[Mlir-commits] [clang] [llvm] [mlir] [IR][CodeGen] Replace constrained FP intrinsics with fp.control/fp.except operand bundles (PR #191613)

Wed Apr 15 11:25:41 PDT 2026

https://github.com/Prince781 updated https://github.com/llvm/llvm-project/pull/191613

From c737776693e12b7d0753e2ce3bac30932d6687d2 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Sat, 11 Apr 2026 18:57:18 -0700
Subject: [PATCH 01/12] [IR][CodeGen] Replace constrained FP intrinsics with
 fp.control/fp.except operand bundles

This patch implements the RFC by Serge Pavlov for replacing constrained FP
intrinsics with fp.control/fp.except operand bundles:
https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021

Summary of changes:

IR / Auto-upgrade:
- Adds new FP arithmetic and comparison intrinsics (llvm.fadd, llvm.fsub,
  llvm.fmul, llvm.fdiv, llvm.frem, llvm.fneg, llvm.fcmp, llvm.fcmps) in
  Intrinsics.td.
- Auto-upgrades existing experimental_constrained_* calls to use
  fp.control/fp.except operand bundles in AutoUpgrade.cpp.

CodeGen / IRBuilder:
- Extends IRBuilderBase::CreateCall to automatically inject fp.control/
  fp.except bundles when building in constrained FP mode with non-default
  rounding or exception behavior settings.
- Extends SelectionDAGBuilder and IRTranslator to lower operations with
  non-default bundles to STRICT_* SDNodes, reusing the existing
  constrained-FP lowering path.
- Simplifies CloneFunction.cpp: inlining into a strictfp function no
  longer needs to convert plain FP instructions to constrained intrinsics.
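
The bundle-injection rule described for CreateCall can be pictured with a
small standalone model. This is only a sketch under stated assumptions: the
Toy* types, roundingName/exceptName, and the bundle value spellings are
hypothetical and are not LLVM's API, and treating round-to-nearest with
ignored exceptions as the "default" settings is an assumption, not something
this message states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
enum class Rounding { ToNearest, TowardZero, Upward, Downward, Dynamic };
enum class ExceptBehavior { Ignore, MayTrap, Strict };

struct ToyBundle {
  std::string Tag;   // "fp.control" or "fp.except"
  std::string Value; // placeholder spelling, not taken from the patch
};

struct ToyCall {
  std::string Callee;
  std::vector<ToyBundle> Bundles;
};

inline std::string roundingName(Rounding R) {
  switch (R) {
  case Rounding::ToNearest:  return "tonearest";
  case Rounding::TowardZero: return "towardzero";
  case Rounding::Upward:     return "upward";
  case Rounding::Downward:   return "downward";
  case Rounding::Dynamic:    return "dynamic";
  }
  return "unknown";
}

inline std::string exceptName(ExceptBehavior E) {
  switch (E) {
  case ExceptBehavior::Ignore:  return "ignore";
  case ExceptBehavior::MayTrap: return "maytrap";
  case ExceptBehavior::Strict:  return "strict";
  }
  return "unknown";
}

struct ToyBuilder {
  bool IsFPConstrained = false;
  Rounding Round = Rounding::ToNearest;           // assumed default
  ExceptBehavior Except = ExceptBehavior::Ignore; // assumed default

  // One entry point for both modes: bundles are attached only in
  // constrained mode, and only for settings that differ from the
  // assumed defaults.
  ToyCall createCall(const std::string &Callee) const {
    assert(!Callee.empty() && "callee name required");
    ToyCall Call{Callee, {}};
    if (!IsFPConstrained)
      return Call;
    if (Round != Rounding::ToNearest)
      Call.Bundles.push_back({"fp.control", roundingName(Round)});
    if (Except != ExceptBehavior::Ignore)
      Call.Bundles.push_back({"fp.except", exceptName(Except)});
    return Call;
  }
};
```

In this model a caller never chooses between a plain and a constrained
entry point; the builder state alone decides whether bundles are attached.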

Clang:
- Eliminates all CreateConstrainedFPCall and experimental_constrained_*
  references in CGBuiltin.cpp, CGExprScalar.cpp, AMDGPU.cpp, ARM.cpp,
  PPC.cpp, SystemZ.cpp, and X86.cpp. CreateCall in constrained mode now
  auto-injects FP bundles for non-default settings.

MLIR:
- Adds new MLIR ops (llvm.intr.experimental.constrained.*) for constrained
  FP arithmetic; they lower to llvm.fadd etc. with fp.control/fp.except
  bundles.
- Fixes import dispatch: llvm.fma and llvm.fmuladd have both regular and
  constrained MLIR handlers. For calls without FP bundles the regular
  handler (FMAOp/FMulAddOp) is now dispatched explicitly before the
  auto-generated .inc dispatch, preventing the constrained handler from
  silently swallowing plain calls.
- Updates ModuleImport to extract rounding mode and exception behavior
  from fp.control/fp.except bundles.
- Updates tests to reflect new intrinsic names and declaration signatures.
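
The import dispatch ordering for llvm.fma and llvm.fmuladd can be sketched
as a standalone model. ToyHandler, ToyCallSite, hasBothHandlers,
generatedDispatch, and dispatch are hypothetical names used only for
illustration; they are not the MLIR importer's API, and generatedDispatch
merely imitates the case where the constrained handler would otherwise win.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; these names do not exist in MLIR.
enum class ToyHandler { Regular, Constrained };

struct ToyCallSite {
  std::string Intrinsic;
  bool HasFPBundles;
};

// Intrinsics that have both a regular and a constrained import handler.
inline bool hasBothHandlers(const std::string &Name) {
  return Name == "llvm.fma" || Name == "llvm.fmuladd";
}

// Stand-in for the auto-generated dispatch: for a dual-handler intrinsic
// it picks the constrained handler regardless of bundles.
inline ToyHandler generatedDispatch(const ToyCallSite &CS) {
  return hasBothHandlers(CS.Intrinsic) ? ToyHandler::Constrained
                                       : ToyHandler::Regular;
}

// The explicit check runs first, so a plain call (no FP bundles) reaches
// the regular handler before the generated dispatch is consulted.
inline ToyHandler dispatch(const ToyCallSite &CS) {
  assert(!CS.Intrinsic.empty() && "intrinsic name required");
  if (hasBothHandlers(CS.Intrinsic) && !CS.HasFPBundles)
    return ToyHandler::Regular;
  return generatedDispatch(CS);
}
```

Without the early check, the model's generatedDispatch would route a plain
llvm.fma call to the constrained handler, which is the failure mode the
bullet above describes.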

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 clang/lib/CodeGen/CGBuiltin.cpp               |  155 +-
 clang/lib/CodeGen/CGExprScalar.cpp            |   21 +-
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp   |    8 +-
 clang/lib/CodeGen/TargetBuiltins/ARM.cpp      |  101 +-
 clang/lib/CodeGen/TargetBuiltins/PPC.cpp      |   78 +-
 clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp  |  105 +-
 clang/lib/CodeGen/TargetBuiltins/X86.cpp      |   65 +-
 llvm/docs/LangRef.rst                         | 2263 ++------
 llvm/docs/ReleaseNotes.md                     |    3 +
 llvm/include/llvm/ADT/FloatingPointMode.h     |   33 +
 llvm/include/llvm/CodeGen/BasicTTIImpl.h      |   11 -
 .../llvm/CodeGen/GlobalISel/IRTranslator.h    |    4 -
 llvm/include/llvm/CodeGen/ISDOpcodes.h        |    5 +
 llvm/include/llvm/IR/ConstrainedOps.def       |  131 +-
 llvm/include/llvm/IR/FPEnv.h                  |   24 +-
 llvm/include/llvm/IR/FloatingPointOps.def     |  121 +
 llvm/include/llvm/IR/IRBuilder.h              |  178 +-
 llvm/include/llvm/IR/InstrTypes.h             |   47 +
 llvm/include/llvm/IR/IntrinsicInst.h          |   53 +-
 llvm/include/llvm/IR/Intrinsics.td            |  275 +-
 llvm/include/llvm/IR/LLVMContext.h            |    4 +-
 llvm/include/llvm/IR/Type.h                   |    2 +
 llvm/include/llvm/IR/VPIntrinsics.def         |   12 -
 llvm/include/llvm/Support/ModRef.h            |    5 +
 llvm/include/module.modulemap                 |    1 -
 llvm/lib/Analysis/ConstantFolding.cpp         |  237 +-
 llvm/lib/Analysis/InstructionSimplify.cpp     |   61 +-
 llvm/lib/Analysis/ValueTracking.cpp           |   12 +-
 llvm/lib/CodeGen/ExpandVectorPredication.cpp  |   14 +-
 llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp  |  252 +-
 .../SelectionDAG/LegalizeFloatTypes.cpp       |   11 +-
 .../SelectionDAG/LegalizeVectorTypes.cpp      |  153 +-
 .../SelectionDAG/SelectionDAGBuilder.cpp      |  283 +-
 .../SelectionDAG/SelectionDAGBuilder.h        |    3 +-
 llvm/lib/CodeGen/TargetLoweringBase.cpp       |   55 +-
 llvm/lib/IR/AutoUpgrade.cpp                   |  265 +-
 llvm/lib/IR/FPEnv.cpp                         |   97 +-
 llvm/lib/IR/IRBuilder.cpp                     |  196 +-
 llvm/lib/IR/Instructions.cpp                  |  284 ++
 llvm/lib/IR/IntrinsicInst.cpp                 |   84 +-
 llvm/lib/IR/Intrinsics.cpp                    |   26 +-
 llvm/lib/IR/LLVMContext.cpp                   |    4 +
 llvm/lib/IR/Type.cpp                          |   33 +-
 llvm/lib/IR/Verifier.cpp                      |  217 +-
 llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp |   13 +-
 .../Target/SPIRV/SPIRVPrepareFunctions.cpp    |  175 +-
 .../InstCombine/InstCombineCalls.cpp          |    5 +-
 llvm/lib/Transforms/Scalar/EarlyCSE.cpp       |   56 +-
 llvm/lib/Transforms/Utils/CloneFunction.cpp   |   82 +-
 llvm/lib/Transforms/Utils/Local.cpp           |    8 +-
 .../CostModel/ARM/mve-intrinsic-cost-kinds.ll |    4 +-
 .../CostModel/X86/intrinsic-cost-kinds.ll     |   16 +-
 llvm/test/Assembler/fp-intrinsics-attr.bc     |  Bin 0 -> 3960 bytes
 llvm/test/Assembler/fp-intrinsics-attr.ll     |  196 +-
 .../Assembler/fp-intrinsics-nondefault.ll     |  257 +
 .../Bitcode/operand-bundles-bc-analyzer.ll    |   19 +
 .../AArch64/arm64-cvt-simd-fptoi-strictfp.ll  |   13 -
 llvm/test/CodeGen/AArch64/arm64-vmul.ll       |   85 +-
 llvm/test/CodeGen/AArch64/floatdp_1source.ll  |   23 +
 llvm/test/CodeGen/AArch64/floatdp_2source.ll  |   69 +
 .../CodeGen/AArch64/fp-intrinsics-fp16.ll     |   42 +-
 .../CodeGen/AArch64/fp-intrinsics-vector.ll   |  127 +-
 llvm/test/CodeGen/AArch64/fp-intrinsics.ll    |  734 +--
 .../CodeGen/AArch64/strict-fp-int-promote.ll  |   13 +-
 llvm/test/CodeGen/AArch64/strict-fp-opt.ll    |   16 +-
 .../sve-streaming-mode-cvt-fp-to-int.ll       |    4 +-
 .../GlobalISel/irtranslator-constrained-fp.ll |    8 +-
 .../AMDGPU/GlobalISel/strict_fma.f16.ll       |   84 +-
 .../AMDGPU/GlobalISel/strict_fma.f32.ll       |  160 +-
 .../AMDGPU/GlobalISel/strict_fma.f64.ll       |  296 +-
 .../AMDGPU/amdgpu-simplify-libcall-pow.ll     |   96 +-
 .../AMDGPU/amdgpu-simplify-libcall-pown.ll    |   51 +-
 .../AMDGPU/amdgpu-simplify-libcall-rootn.ll   |  165 +-
 llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll     |    8 +-
 .../AMDGPU/fsub-as-fneg-src-modifier.ll       |   54 +-
 .../AMDGPU/global_atomic_optimizer_fp_rtn.ll  | 1310 ++---
 .../global_atomics_optimizer_fp_no_rtn.ll     | 1030 ++--
 .../AMDGPU/global_atomics_scan_fadd.ll        | 1818 +++----
 .../AMDGPU/global_atomics_scan_fsub.ll        | 1826 +++----
 llvm/test/CodeGen/AMDGPU/strict_fpext.ll      |   57 +-
 llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll  |  150 +-
 llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll  |  213 +-
 .../AMDGPU/strictfp_f16_abi_promote.ll        |   36 +-
 llvm/test/CodeGen/ARM/fp-intrinsics-vector.ll |   28 +-
 llvm/test/CodeGen/ARM/fp-intrinsics.ll        |  211 +-
 llvm/test/CodeGen/ARM/fp16-fullfp16.ll        |   21 +-
 llvm/test/CodeGen/ARM/strict-fp-ops.ll        |   17 +-
 .../CodeGen/ARM/strictfp_f16_abi_promote.ll   |   21 +-
 llvm/test/CodeGen/Mips/fp-intrinsics.ll       |  991 +++-
 .../PowerPC/cse-despite-rounding-mode.ll      |    9 +-
 .../CodeGen/PowerPC/fp-strict-conv-f128.ll    |  175 +-
 llvm/test/CodeGen/PowerPC/fp-strict-round.ll  |  127 +-
 llvm/test/CodeGen/PowerPC/fp-strict.ll        |  355 +-
 llvm/test/CodeGen/PowerPC/nofpexcept.ll       |    8 +-
 .../ppcf128-constrained-fp-intrinsics.ll      |  271 +-
 .../CodeGen/PowerPC/respect-rounding-mode.ll  |   10 +-
 .../CodeGen/PowerPC/scalar-rounding-ops.ll    |  170 +-
 .../vector-constrained-fp-intrinsics.ll       | 3557 +++++--------
 .../test/CodeGen/RISCV/double-arith-strict.ll |   66 +-
 .../CodeGen/RISCV/double-intrinsics-strict.ll |  767 +--
 llvm/test/CodeGen/RISCV/float-arith-strict.ll |   89 +-
 .../CodeGen/RISCV/float-intrinsics-strict.ll  | 1146 +----
 .../test/CodeGen/RISCV/half-convert-strict.ll |   50 +-
 .../RISCV/rvv/fceil-constrained-sdnode.ll     |  125 +-
 .../RISCV/rvv/ffloor-constrained-sdnode.ll    |  125 +-
 .../fixed-vectors-fceil-constrained-sdnode.ll |  125 +-
 ...fixed-vectors-ffloor-constrained-sdnode.ll |  125 +-
 ...d-vectors-fnearbyint-constrained-sdnode.ll |   99 +-
 ...fixed-vectors-fround-constrained-sdnode.ll |  125 +-
 ...d-vectors-froundeven-constrained-sdnode.ll |  125 +-
 ...fixed-vectors-ftrunc-constrained-sdnode.ll |  125 +-
 ...fixed-vectors-vfmadd-constrained-sdnode.ll |  120 +-
 ...fixed-vectors-vfptoi-constrained-sdnode.ll |   94 +-
 .../rvv/fnearbyint-constrained-sdnode.ll      |  125 +-
 .../RISCV/rvv/fround-constrained-sdnode.ll    |  125 +-
 .../rvv/froundeven-constrained-sdnode.ll      |  125 +-
 .../RISCV/rvv/ftrunc-constrained-sdnode.ll    |  125 +-
 .../RISCV/rvv/rvv-peephole-vmerge-vops.ll     |   12 +-
 .../RISCV/rvv/stores-of-loads-merging.ll      |    1 -
 .../RISCV/rvv/vfmadd-constrained-sdnode.ll    |  330 +-
 .../RISCV/rvv/vfmsub-constrained-sdnode.ll    |  142 +-
 .../RISCV/rvv/vfnmadd-constrained-sdnode.ll   |   24 +-
 .../RISCV/rvv/vfnmsub-constrained-sdnode.ll   |   24 +-
 .../RISCV/rvv/vfptoi-constrained-sdnode.ll    |   90 +-
 .../CodeGen/RISCV/rvv/vmv.v.v-peephole.ll     |    4 +-
 .../RISCV/zfh-half-intrinsics-strict.ll       |  624 ++-
 .../RISCV/zfhmin-half-intrinsics-strict.ll    |  564 +-
 .../llvm-intrinsics/constrained-fmuladd.ll    |    5 +-
 llvm/test/CodeGen/SystemZ/fp-strict-alias.ll  |  185 +-
 llvm/test/CodeGen/SystemZ/fp-strict-cmp-04.ll |  433 +-
 llvm/test/CodeGen/SystemZ/fp-strict-cmp-05.ll |   37 +-
 .../test/CodeGen/SystemZ/fp-strict-conv-08.ll |   67 +-
 .../test/CodeGen/SystemZ/fp-strict-conv-10.ll |   33 +-
 .../test/CodeGen/SystemZ/fp-strict-conv-12.ll |   33 +-
 llvm/test/CodeGen/SystemZ/fp-strict-mul-06.ll |  218 +-
 .../vector-constrained-fp-intrinsics.ll       | 1931 ++-----
 llvm/test/CodeGen/VE/Scalar/cast.ll           |  213 +-
 llvm/test/CodeGen/X86/avx512fp16-frem.ll      |  474 +-
 llvm/test/CodeGen/X86/bfloat-constrained.ll   |   64 +-
 .../CodeGen/X86/float-strict-powi-convert.ll  |   10 +-
 .../CodeGen/X86/fp-intrinsics-flags-x86_64.ll |   17 +-
 llvm/test/CodeGen/X86/fp-intrinsics-flags.ll  |  178 +-
 llvm/test/CodeGen/X86/fp-intrinsics-fma.ll    |   94 +-
 llvm/test/CodeGen/X86/fp-intrinsics.ll        |  784 +--
 .../CodeGen/X86/fp-strict-libcalls-msvc32.ll  |    9 +-
 .../test/CodeGen/X86/fp-strict-scalar-fp16.ll |  112 +-
 .../X86/fp-strict-scalar-fptoint-fp16.ll      |   72 +-
 .../CodeGen/X86/fp-strict-scalar-fptoint.ll   |  170 +-
 .../X86/fp-strict-scalar-inttofp-fp16.ll      |   50 +-
 .../CodeGen/X86/fp-strict-scalar-inttofp.ll   |  176 +-
 .../X86/fp-strict-scalar-round-fp16.ll        |   70 +-
 .../CodeGen/X86/fp-strict-scalar-round.ll     |  104 +-
 llvm/test/CodeGen/X86/fp-strict-scalar.ll     |   32 +-
 llvm/test/CodeGen/X86/fp128-cast-strict.ll    |   76 +-
 .../test/CodeGen/X86/fp128-libcalls-strict.ll |  568 +--
 llvm/test/CodeGen/X86/fp80-strict-libcalls.ll |   59 +-
 llvm/test/CodeGen/X86/fp80-strict-scalar.ll   |   17 +-
 llvm/test/CodeGen/X86/half-constrained.ll     |   31 +-
 llvm/test/CodeGen/X86/half-darwin.ll          |    3 +-
 llvm/test/CodeGen/X86/ldexp-strict.ll         |   10 +-
 llvm/test/CodeGen/X86/llrint-conv.ll          |  142 +-
 llvm/test/CodeGen/X86/lrint-conv-i32.ll       |  109 +-
 llvm/test/CodeGen/X86/lrint-conv-i64.ll       |  114 +-
 llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll  |  225 +
 llvm/test/CodeGen/X86/strict-fsub-combines.ll |   14 +-
 llvm/test/CodeGen/X86/vec-strict-128-fp16.ll  |    3 -
 .../X86/vec-strict-fptoint-128-fp16.ll        |  226 +-
 .../CodeGen/X86/vec-strict-fptoint-128.ll     | 1080 ++--
 .../X86/vec-strict-fptoint-256-fp16.ll        |   44 +-
 .../CodeGen/X86/vec-strict-fptoint-256.ll     |  314 +-
 .../CodeGen/X86/vec-strict-fptoint-512.ll     |  212 +-
 .../CodeGen/X86/vec-strict-inttofp-128.ll     |  445 +-
 .../CodeGen/X86/vec-strict-inttofp-256.ll     |  223 +-
 .../CodeGen/X86/vec-strict-inttofp-512.ll     |  122 +-
 .../vector-constrained-fp-intrinsics-flags.ll |   42 +-
 .../vector-constrained-fp-intrinsics-fma.ll   |   40 +-
 .../X86/vector-constrained-fp-intrinsics.ll   | 4544 +++++------------
 .../CodeGen/X86/vector-half-conversions.ll    |   17 +-
 .../CodeGen/X86/vector-shuffle-combining.ll   |   37 +-
 llvm/test/Feature/fp-intrinsics.ll            |  490 +-
 .../MemorySanitizer/AArch64/arm64-vmul.ll     |  452 +-
 .../AMDGPU/expand-atomic-rmw-fadd.ll          |  151 +-
 .../AMDGPU/expand-atomic-rmw-fmax.ll          |    5 +-
 .../AMDGPU/expand-atomic-rmw-fmin.ll          |    5 +-
 .../AMDGPU/expand-atomic-rmw-fsub.ll          |    5 +-
 .../Transforms/Attributor/nofpclass-log.ll    |  162 +-
 .../Transforms/Attributor/nofpclass-sqrt.ll   |  114 +-
 llvm/test/Transforms/Attributor/nofpclass.ll  |  834 +--
 .../Transforms/EarlyCSE/defaultfp-strictfp.ll |   70 +-
 .../Transforms/EarlyCSE/ebstrict-strictfp.ll  |   54 +-
 .../Transforms/EarlyCSE/mixed-strictfp.ll     |  118 +-
 .../Transforms/EarlyCSE/nonmixed-strictfp.ll  |   84 +-
 .../Transforms/EarlyCSE/round-dyn-strictfp.ll |   58 +-
 .../test/Transforms/EarlyCSE/tfpropagation.ll |   18 +-
 .../solver-constant-strictfpmetadata.ll       |    2 +-
 .../test/Transforms/Inline/inline-strictfp.ll |  116 +-
 .../InstCombine/AMDGPU/amdgcn-intrinsics.ll   |    3 +-
 .../InstCombine/AMDGPU/fmed3-fpext-fold.ll    |   24 +-
 .../Transforms/InstCombine/constrained.ll     |   13 +-
 .../InstCombine/fpclass-check-idioms.ll       |    3 +
 .../InstCombine/fsqrtdiv-transform.ll         |    7 +-
 .../test/Transforms/InstCombine/is_fpclass.ll |   10 +-
 llvm/test/Transforms/InstCombine/ldexp.ll     |    7 +-
 .../InstSimplify/X86/fp-nan-strictfp.ll       |  141 +-
 .../constant-fold-fp-denormal-strict.ll       |   98 +
 .../InstSimplify/constfold-constrained.ll     |   84 +-
 .../InstSimplify/fast-math-strictfp.ll        |  184 +-
 .../Transforms/InstSimplify/fdiv-strictfp.ll  |   20 +-
 .../floating-point-arithmetic-strictfp.ll     |  203 +-
 .../InstSimplify/fp-undef-poison-strictfp.ll  |  256 +-
 llvm/test/Transforms/InstSimplify/ldexp.ll    |   81 +-
 .../Transforms/InstSimplify/strictfp-fadd.ll  |  118 +-
 .../Transforms/InstSimplify/strictfp-fsub.ll  |  194 +-
 .../InstSimplify/strictfp-sqrt-nonneg.ll      |   82 +-
 .../MergeFunc/merge-fp-intrinsics.ll          |    6 +-
 .../Transforms/SCCP/strictfp-phis-fcmp.ll     |   20 +-
 .../Transforms/SCCP/strictfp-phis-fcmps.ll    |   16 +-
 .../Util/libcalls-shrinkwrap-double.ll        |  643 +--
 .../Util/libcalls-shrinkwrap-float.ll         |  505 +-
 .../Util/libcalls-shrinkwrap-long-double.ll   |  505 +-
 .../Util/libcalls-shrinkwrap-strictfp.ll      |   53 +-
 llvm/test/Verifier/fp-intrinsics-pass.ll      |   38 +-
 llvm/test/Verifier/fp-intrinsics.ll           |  188 +-
 .../tools/llvm-reduce/inline-call-sites.ll    |   10 +-
 llvm/unittests/Bitcode/BitReaderTest.cpp      |   23 +-
 llvm/unittests/IR/IRBuilderTest.cpp           |  329 +-
 llvm/unittests/IR/InstructionsTest.cpp        |   54 -
 .../mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td   |   82 +-
 .../include/mlir/Target/LLVMIR/ModuleImport.h |    9 +
 .../LLVMIR/LLVMIRToLLVMTranslation.cpp        |   46 +
 mlir/lib/Target/LLVMIR/ModuleImport.cpp       |   24 +
 mlir/test/Target/LLVMIR/Import/intrinsic.ll   |    4 +-
 .../test/Target/LLVMIR/llvmir-intrinsics.mlir |  183 +-
 233 files changed, 20495 insertions(+), 27700 deletions(-)
 create mode 100644 llvm/include/llvm/IR/FloatingPointOps.def
 create mode 100644 llvm/test/Assembler/fp-intrinsics-attr.bc
 create mode 100644 llvm/test/Assembler/fp-intrinsics-nondefault.ll
 create mode 100644 llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 4d74d681cd320..d06925c6fc656 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -588,98 +588,70 @@ static Value *EmitISOVolatileStore(CodeGenFunction &CGF, const CallExpr *E) {
 }
 
 // Emit a simple mangled intrinsic that has 1 argument and a return type
-// matching the argument type. Depending on mode, this may be a constrained
-// floating-point intrinsic.
+// matching the argument type. When in constrained FP mode, CreateCall
+// automatically injects fp.control/fp.except bundles for non-default settings.
 Value *emitUnaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
                                 const CallExpr *E, unsigned IntrinsicID,
-                                unsigned ConstrainedIntrinsicID) {
+                                unsigned /*ConstrainedIntrinsicID*/) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  if (CGF.Builder.getIsFPConstrained()) {
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
-    return CGF.Builder.CreateConstrainedFPCall(F, { Src0 });
-  } else {
-    Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-    return CGF.Builder.CreateCall(F, Src0);
-  }
+  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+  return CGF.Builder.CreateCall(F, Src0);
 }
 
 // Emit an intrinsic that has 2 operands of the same type as its result.
-// Depending on mode, this may be a constrained floating-point intrinsic.
+// When in constrained FP mode, CreateCall automatically injects fp.control/
+// fp.except bundles for non-default settings.
 static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
                                 const CallExpr *E, unsigned IntrinsicID,
-                                unsigned ConstrainedIntrinsicID) {
+                                unsigned /*ConstrainedIntrinsicID*/) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  if (CGF.Builder.getIsFPConstrained()) {
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
-    return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1 });
-  } else {
-    Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-    return CGF.Builder.CreateCall(F, { Src0, Src1 });
-  }
+  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+  return CGF.Builder.CreateCall(F, { Src0, Src1 });
 }
 
 // Has second type mangled argument.
 static Value *
 emitBinaryExpMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
                                        Intrinsic::ID IntrinsicID,
-                                       Intrinsic::ID ConstrainedIntrinsicID) {
+                                       Intrinsic::ID /*ConstrainedIntrinsicID*/) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  if (CGF.Builder.getIsFPConstrained()) {
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
-                                       {Src0->getType(), Src1->getType()});
-    return CGF.Builder.CreateConstrainedFPCall(F, {Src0, Src1});
-  }
-
   Function *F =
       CGF.CGM.getIntrinsic(IntrinsicID, {Src0->getType(), Src1->getType()});
   return CGF.Builder.CreateCall(F, {Src0, Src1});
 }
 
 // Emit an intrinsic that has 3 operands of the same type as its result.
-// Depending on mode, this may be a constrained floating-point intrinsic.
+// When in constrained FP mode, CreateCall automatically injects fp.control/
+// fp.except bundles for non-default settings.
 static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
                                  const CallExpr *E, unsigned IntrinsicID,
-                                 unsigned ConstrainedIntrinsicID) {
+                                 unsigned /*ConstrainedIntrinsicID*/) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
-
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  if (CGF.Builder.getIsFPConstrained()) {
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Src0->getType());
-    return CGF.Builder.CreateConstrainedFPCall(F, { Src0, Src1, Src2 });
-  } else {
-    Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-    return CGF.Builder.CreateCall(F, { Src0, Src1, Src2 });
-  }
+  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+  return CGF.Builder.CreateCall(F, { Src0, Src1, Src2 });
 }
 
 // Emit an intrinsic that has overloaded integer result and fp operand.
+// When in constrained FP mode, CreateCall automatically injects fp.control/
+// fp.except bundles for non-default settings.
 static Value *
 emitMaybeConstrainedFPToIntRoundBuiltin(CodeGenFunction &CGF, const CallExpr *E,
                                         unsigned IntrinsicID,
-                                        unsigned ConstrainedIntrinsicID) {
+                                        unsigned /*ConstrainedIntrinsicID*/) {
   llvm::Type *ResultType = CGF.ConvertType(E->getType());
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-
-  if (CGF.Builder.getIsFPConstrained()) {
-    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
-                                       {ResultType, Src0->getType()});
-    return CGF.Builder.CreateConstrainedFPCall(F, {Src0});
-  } else {
-    Function *F =
-        CGF.CGM.getIntrinsic(IntrinsicID, {ResultType, Src0->getType()});
-    return CGF.Builder.CreateCall(F, Src0);
-  }
+  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
+  Function *F =
+      CGF.CGM.getIntrinsic(IntrinsicID, {ResultType, Src0->getType()});
+  return CGF.Builder.CreateCall(F, Src0);
 }
 
 static Value *emitFrexpBuiltin(CodeGenFunction &CGF, const CallExpr *E,
@@ -2709,7 +2681,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_acosf128:
     case Builtin::BI__builtin_elementwise_acos:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::acos, Intrinsic::experimental_constrained_acos));
+          *this, E, Intrinsic::acos, 0));
 
     case Builtin::BIasin:
     case Builtin::BIasinf:
@@ -2721,7 +2693,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_asinf128:
     case Builtin::BI__builtin_elementwise_asin:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::asin, Intrinsic::experimental_constrained_asin));
+          *this, E, Intrinsic::asin, 0));
 
     case Builtin::BIatan:
     case Builtin::BIatanf:
@@ -2733,7 +2705,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_atanf128:
     case Builtin::BI__builtin_elementwise_atan:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::atan, Intrinsic::experimental_constrained_atan));
+          *this, E, Intrinsic::atan, 0));
 
     case Builtin::BIatan2:
     case Builtin::BIatan2f:
@@ -2746,7 +2718,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_atan2:
       return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
           *this, E, Intrinsic::atan2,
-          Intrinsic::experimental_constrained_atan2));
+          0));
 
     case Builtin::BIceil:
     case Builtin::BIceilf:
@@ -2759,7 +2731,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_ceil:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::ceil,
-                                   Intrinsic::experimental_constrained_ceil));
+                                   0));
 
     case Builtin::BIcopysign:
     case Builtin::BIcopysignf:
@@ -2783,7 +2755,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_cos:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::cos,
-                                   Intrinsic::experimental_constrained_cos));
+                                   0));
 
     case Builtin::BIcosh:
     case Builtin::BIcoshf:
@@ -2795,7 +2767,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_coshf128:
     case Builtin::BI__builtin_elementwise_cosh:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::cosh, Intrinsic::experimental_constrained_cosh));
+          *this, E, Intrinsic::cosh, 0));
 
     case Builtin::BIexp:
     case Builtin::BIexpf:
@@ -2808,7 +2780,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_exp:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::exp,
-                                   Intrinsic::experimental_constrained_exp));
+                                   0));
 
     case Builtin::BIexp2:
     case Builtin::BIexp2f:
@@ -2821,7 +2793,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_exp2:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::exp2,
-                                   Intrinsic::experimental_constrained_exp2));
+                                   0));
     case Builtin::BI__builtin_exp10:
     case Builtin::BI__builtin_exp10f:
     case Builtin::BI__builtin_exp10f16:
@@ -2856,7 +2828,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_floor:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::floor,
-                                   Intrinsic::experimental_constrained_floor));
+                                   0));
 
     case Builtin::BIfma:
     case Builtin::BIfmaf:
@@ -2869,7 +2841,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_fma:
       return RValue::get(emitTernaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::fma,
-                                   Intrinsic::experimental_constrained_fma));
+                                   0));
 
     case Builtin::BIfmax:
     case Builtin::BIfmaxf:
@@ -2883,7 +2855,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       Builder.getFastMathFlags().setNoSignedZeros();
       return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
           *this, E, Intrinsic::maxnum,
-          Intrinsic::experimental_constrained_maxnum));
+          0));
     }
 
     case Builtin::BIfmin:
@@ -2898,7 +2870,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       Builder.getFastMathFlags().setNoSignedZeros();
       return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
           *this, E, Intrinsic::minnum,
-          Intrinsic::experimental_constrained_minnum));
+          0));
     }
 
     case Builtin::BIfmaximum_num:
@@ -2937,13 +2909,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
       CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
       Value *Arg1 = EmitScalarExpr(E->getArg(0));
       Value *Arg2 = EmitScalarExpr(E->getArg(1));
-      if (Builder.getIsFPConstrained()) {
-        Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_frem,
-                                       Arg1->getType());
-        return RValue::get(Builder.CreateConstrainedFPCall(F, {Arg1, Arg2}));
-      } else {
-        return RValue::get(Builder.CreateFRem(Arg1, Arg2, "fmod"));
-      }
+      return RValue::get(Builder.CreateFRem(Arg1, Arg2, "fmod"));
     }
 
     case Builtin::BIlog:
@@ -2957,7 +2923,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_log:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::log,
-                                   Intrinsic::experimental_constrained_log));
+                                   0));
 
     case Builtin::BIlog10:
     case Builtin::BIlog10f:
@@ -2970,7 +2936,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_log10:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::log10,
-                                   Intrinsic::experimental_constrained_log10));
+                                   0));
 
     case Builtin::BIlog2:
     case Builtin::BIlog2f:
@@ -2983,7 +2949,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_log2:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::log2,
-                                   Intrinsic::experimental_constrained_log2));
+                                   0));
 
     case Builtin::BInearbyint:
     case Builtin::BInearbyintf:
@@ -2995,7 +2961,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_nearbyint:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                 Intrinsic::nearbyint,
-                                Intrinsic::experimental_constrained_nearbyint));
+                                0));
 
     case Builtin::BIpow:
     case Builtin::BIpowf:
@@ -3008,7 +2974,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_pow:
       return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::pow,
-                                   Intrinsic::experimental_constrained_pow));
+                                   0));
 
     case Builtin::BIrint:
     case Builtin::BIrintf:
@@ -3021,7 +2987,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_rint:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::rint,
-                                   Intrinsic::experimental_constrained_rint));
+                                   0));
 
     case Builtin::BIround:
     case Builtin::BIroundf:
@@ -3034,7 +3000,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_round:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::round,
-                                   Intrinsic::experimental_constrained_round));
+                                   0));
 
     case Builtin::BIroundeven:
     case Builtin::BIroundevenf:
@@ -3047,7 +3013,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_roundeven:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::roundeven,
-                                   Intrinsic::experimental_constrained_roundeven));
+                                   0));
 
     case Builtin::BIsin:
     case Builtin::BIsinf:
@@ -3060,7 +3026,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_sin:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::sin,
-                                   Intrinsic::experimental_constrained_sin));
+                                   0));
 
     case Builtin::BIsinh:
     case Builtin::BIsinhf:
@@ -3072,7 +3038,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sinhf128:
     case Builtin::BI__builtin_elementwise_sinh:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::sinh, Intrinsic::experimental_constrained_sinh));
+          *this, E, Intrinsic::sinh, 0));
 
     case Builtin::BI__builtin_sincospi:
     case Builtin::BI__builtin_sincospif:
@@ -3105,7 +3071,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sqrtf128:
     case Builtin::BI__builtin_elementwise_sqrt: {
       llvm::Value *Call = emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::sqrt, Intrinsic::experimental_constrained_sqrt);
+          *this, E, Intrinsic::sqrt, 0);
       SetSqrtFPAccuracy(Call);
       return RValue::get(Call);
     }
@@ -3120,7 +3086,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanf128:
     case Builtin::BI__builtin_elementwise_tan:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::tan, Intrinsic::experimental_constrained_tan));
+          *this, E, Intrinsic::tan, 0));
 
     case Builtin::BItanh:
     case Builtin::BItanhf:
@@ -3132,7 +3098,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanhf128:
     case Builtin::BI__builtin_elementwise_tanh:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::tanh, Intrinsic::experimental_constrained_tanh));
+          *this, E, Intrinsic::tanh, 0));
 
     case Builtin::BItrunc:
     case Builtin::BItruncf:
@@ -3145,7 +3111,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_trunc:
       return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
                                    Intrinsic::trunc,
-                                   Intrinsic::experimental_constrained_trunc));
+                                   0));
 
     case Builtin::BIlround:
     case Builtin::BIlroundf:
@@ -3156,7 +3122,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lroundf128:
       return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
           *this, E, Intrinsic::lround,
-          Intrinsic::experimental_constrained_lround));
+          0));
 
     case Builtin::BIllround:
     case Builtin::BIllroundf:
@@ -3167,7 +3133,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llroundf128:
       return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
           *this, E, Intrinsic::llround,
-          Intrinsic::experimental_constrained_llround));
+          0));
 
     case Builtin::BIlrint:
     case Builtin::BIlrintf:
@@ -3178,7 +3144,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lrintf128:
       return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
           *this, E, Intrinsic::lrint,
-          Intrinsic::experimental_constrained_lrint));
+          0));
 
     case Builtin::BIllrint:
     case Builtin::BIllrintf:
@@ -3189,7 +3155,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llrintf128:
       return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
           *this, E, Intrinsic::llrint,
-          Intrinsic::experimental_constrained_llrint));
+          0));
     case Builtin::BI__builtin_ldexp:
     case Builtin::BI__builtin_ldexpf:
     case Builtin::BI__builtin_ldexpl:
@@ -3198,7 +3164,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_elementwise_ldexp:
       return RValue::get(emitBinaryExpMaybeConstrainedFPBuiltin(
           *this, E, Intrinsic::ldexp,
-          Intrinsic::experimental_constrained_ldexp));
+          0));
     default:
       break;
     }
@@ -3865,15 +3831,6 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
     llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
 
-    if (Builder.getIsFPConstrained()) {
-      // FIXME: llvm.powi has 2 mangling types,
-      // llvm.experimental.constrained.powi has one.
-      CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_powi,
-                                     Src0->getType());
-      return RValue::get(Builder.CreateConstrainedFPCall(F, { Src0, Src1 }));
-    }
-
     Function *F = CGM.getIntrinsic(Intrinsic::powi,
                                    { Src0->getType(), Src1->getType() });
     return RValue::get(Builder.CreateCall(F, { Src0, Src1 }));
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index a8dcf22992983..498d48b7f6071 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -4609,20 +4609,9 @@ static Value* buildFMulAdd(llvm::Instruction *MulOp, Value *Addend,
   if (negAdd)
     Addend = Builder.CreateFNeg(Addend, "neg");
 
-  Value *FMulAdd = nullptr;
-  if (Builder.getIsFPConstrained()) {
-    assert(isa<llvm::ConstrainedFPIntrinsic>(MulOp) &&
-           "Only constrained operation should be created when Builder is in FP "
-           "constrained mode");
-    FMulAdd = Builder.CreateConstrainedFPCall(
-        CGF.CGM.getIntrinsic(llvm::Intrinsic::experimental_constrained_fmuladd,
-                             Addend->getType()),
-        {MulOp0, MulOp1, Addend});
-  } else {
-    FMulAdd = Builder.CreateCall(
-        CGF.CGM.getIntrinsic(llvm::Intrinsic::fmuladd, Addend->getType()),
-        {MulOp0, MulOp1, Addend});
-  }
+  Value *FMulAdd = Builder.CreateCall(
+      CGF.CGM.getIntrinsic(llvm::Intrinsic::fmuladd, Addend->getType()),
+      {MulOp0, MulOp1, Addend});
   MulOp->eraseFromParent();
 
   return FMulAdd;
@@ -4693,7 +4682,7 @@ static Value* tryEmitFMulAdd(const BinOpInfo &op,
 
   if (auto *LHSBinOp = dyn_cast<llvm::CallBase>(LHS)) {
     if (LHSBinOp->getIntrinsicID() ==
-            llvm::Intrinsic::experimental_constrained_fmul &&
+            llvm::Intrinsic::fmul &&
         (LHSBinOp->use_empty() || NegLHS)) {
       // If we looked through fneg, erase it.
       if (NegLHS)
@@ -4703,7 +4692,7 @@ static Value* tryEmitFMulAdd(const BinOpInfo &op,
   }
   if (auto *RHSBinOp = dyn_cast<llvm::CallBase>(RHS)) {
     if (RHSBinOp->getIntrinsicID() ==
-            llvm::Intrinsic::experimental_constrained_fmul &&
+            llvm::Intrinsic::fmul &&
         (RHSBinOp->use_empty() || NegRHS)) {
       // If we looked through fneg, erase it.
       if (NegRHS)
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 0aa1b9dbb8bd3..853ecc7cfe75c 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -40,12 +40,6 @@ emitBinaryExpMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
 
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  if (CGF.Builder.getIsFPConstrained()) {
-    Function *F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID,
-                                       {Src0->getType(), Src1->getType()});
-    return CGF.Builder.CreateConstrainedFPCall(F, {Src0, Src1});
-  }
-
   Function *F =
       CGF.CGM.getIntrinsic(IntrinsicID, {Src0->getType(), Src1->getType()});
   return CGF.Builder.CreateCall(F, {Src0, Src1});
@@ -2201,7 +2195,7 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case Builtin::BIscalbn:
   case Builtin::BI__builtin_scalbn:
     return emitBinaryExpMaybeConstrainedFPBuiltin(
-        *this, E, Intrinsic::ldexp, Intrinsic::experimental_constrained_ldexp);
+        *this, E, Intrinsic::ldexp, 0);
   default:
     return nullptr;
   }
diff --git a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
index 3e0ec2c143428..0259b3c8e54da 100644
--- a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
@@ -343,18 +343,9 @@ translateArmToMsvcIntrin(unsigned BuiltinID) {
 // Depending on mode, this may be a constrained floating-point intrinsic.
 static Value *emitCallMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
                                                 unsigned IntrinsicID,
-                                                unsigned ConstrainedIntrinsicID,
                                                 llvm::Type *Ty,
                                                 ArrayRef<Value *> Args) {
-  Function *F;
-  if (CGF.Builder.getIsFPConstrained())
-    F = CGF.CGM.getIntrinsic(ConstrainedIntrinsicID, Ty);
-  else
-    F = CGF.CGM.getIntrinsic(IntrinsicID, Ty);
-
-  if (CGF.Builder.getIsFPConstrained())
-    return CGF.Builder.CreateConstrainedFPCall(F, Args);
-
+  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Ty);
   return CGF.Builder.CreateCall(F, Args);
 }
 
@@ -430,17 +421,12 @@ Value *CodeGenFunction::EmitNeonCall(Function *F, SmallVectorImpl<Value*> &Ops,
   unsigned j = 0;
   for (Function::const_arg_iterator ai = F->arg_begin(), ae = F->arg_end();
        ai != ae; ++ai, ++j) {
-    if (F->isConstrainedFPIntrinsic())
-      if (ai->getType()->isMetadataTy())
-        continue;
     if (shift > 0 && shift == j)
       Ops[j] = EmitNeonShiftVector(Ops[j], ai->getType(), rightshift);
     else
       Ops[j] = Builder.CreateBitCast(Ops[j], ai->getType(), name);
   }
 
-  if (F->isConstrainedFPIntrinsic())
-    return Builder.CreateConstrainedFPCall(F, Ops, name);
   return Builder.CreateCall(F, Ops, name);
 }
 
@@ -1462,7 +1448,7 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, Ty,
+        *this, Intrinsic::fma, Ty,
         {Ops[1], Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vld1_v:
@@ -1614,9 +1600,7 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, NameHint);
   case NEON::BI__builtin_neon_vrndi_v:
   case NEON::BI__builtin_neon_vrndiq_v:
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_nearbyint
-              : Intrinsic::nearbyint;
+    Int = Intrinsic::nearbyint;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, NameHint);
   case NEON::BI__builtin_neon_vrshr_n_v:
   case NEON::BI__builtin_neon_vrshrq_n_v:
@@ -5814,14 +5798,14 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
   case NEON::BI__builtin_neon_vfmah_f16:
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, HalfTy,
+        *this, Intrinsic::fma, HalfTy,
         {Ops[1], Ops[2], Ops[0]});
   case NEON::BI__builtin_neon_vfmsh_f16: {
     Value *Neg = Builder.CreateFNeg(Ops[1], "vsubh");
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, HalfTy,
+        *this, Intrinsic::fma, HalfTy,
         {Neg, Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vaddd_s64:
@@ -6102,8 +6086,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Ops[1] = Builder.CreateShuffleVector(Ops[1], Ops[1], SV, "lane");
 
     Ops.pop_back();
-    Int = Builder.getIsFPConstrained() ? Intrinsic::experimental_constrained_fma
-                                       : Intrinsic::fma;
+    Int = Intrinsic::fma;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "fmla");
   }
   case NEON::BI__builtin_neon_vfma_laneq_v: {
@@ -6118,7 +6101,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
       Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
       Value *Result;
       Result = emitCallMaybeConstrainedFPBuiltin(
-          *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma,
+          *this, Intrinsic::fma,
           DoubleTy, {Ops[1], Ops[2], Ops[0]});
       return Builder.CreateBitCast(Result, Ty);
     }
@@ -6133,7 +6116,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Ops[2] = Builder.CreateShuffleVector(Ops[2], Ops[2], SV, "lane");
 
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, Ty,
+        *this, Intrinsic::fma, Ty,
         {Ops[2], Ops[1], Ops[0]});
   }
   case NEON::BI__builtin_neon_vfmaq_laneq_v: {
@@ -6143,7 +6126,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
     Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, Ty,
+        *this, Intrinsic::fma, Ty,
         {Ops[2], Ops[1], Ops[0]});
   }
   case NEON::BI__builtin_neon_vfmah_lane_f16:
@@ -6155,7 +6138,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     llvm::Type *Ty = ConvertType(E->getCallReturnType(getContext()));
     Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
     return emitCallMaybeConstrainedFPBuiltin(
-        *this, Intrinsic::fma, Intrinsic::experimental_constrained_fma, Ty,
+        *this, Intrinsic::fma, Ty,
         {Ops[1], Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vmull_v:
@@ -6257,86 +6240,60 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Int = usgn ? Intrinsic::aarch64_neon_uqrshrn : Intrinsic::aarch64_neon_sqrshrn;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vqrshrn_n");
   case NEON::BI__builtin_neon_vrndah_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_round
-              : Intrinsic::round;
+    Int = Intrinsic::round;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrnda");
   }
   case NEON::BI__builtin_neon_vrnda_v:
   case NEON::BI__builtin_neon_vrndaq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_round
-              : Intrinsic::round;
+    Int = Intrinsic::round;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrnda");
   }
   case NEON::BI__builtin_neon_vrndih_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_nearbyint
-              : Intrinsic::nearbyint;
+    Int = Intrinsic::nearbyint;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndi");
   }
   case NEON::BI__builtin_neon_vrndmh_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_floor
-              : Intrinsic::floor;
+    Int = Intrinsic::floor;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndm");
   }
   case NEON::BI__builtin_neon_vrndm_v:
   case NEON::BI__builtin_neon_vrndmq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_floor
-              : Intrinsic::floor;
+    Int = Intrinsic::floor;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndm");
   }
   case NEON::BI__builtin_neon_vrndnh_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_roundeven
-              : Intrinsic::roundeven;
+    Int = Intrinsic::roundeven;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndn");
   }
   case NEON::BI__builtin_neon_vrndn_v:
   case NEON::BI__builtin_neon_vrndnq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_roundeven
-              : Intrinsic::roundeven;
+    Int = Intrinsic::roundeven;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndn");
   }
   case NEON::BI__builtin_neon_vrndns_f32: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_roundeven
-              : Intrinsic::roundeven;
+    Int = Intrinsic::roundeven;
     return EmitNeonCall(CGM.getIntrinsic(Int, FloatTy), Ops, "vrndn");
   }
   case NEON::BI__builtin_neon_vrndph_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_ceil
-              : Intrinsic::ceil;
+    Int = Intrinsic::ceil;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndp");
   }
   case NEON::BI__builtin_neon_vrndp_v:
   case NEON::BI__builtin_neon_vrndpq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_ceil
-              : Intrinsic::ceil;
+    Int = Intrinsic::ceil;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndp");
   }
   case NEON::BI__builtin_neon_vrndxh_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_rint
-              : Intrinsic::rint;
+    Int = Intrinsic::rint;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndx");
   }
   case NEON::BI__builtin_neon_vrndx_v:
   case NEON::BI__builtin_neon_vrndxq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_rint
-              : Intrinsic::rint;
+    Int = Intrinsic::rint;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndx");
   }
   case NEON::BI__builtin_neon_vrndh_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_trunc
-              : Intrinsic::trunc;
+    Int = Intrinsic::trunc;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndz");
   }
   case NEON::BI__builtin_neon_vrnd32x_f32:
@@ -6369,9 +6326,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
   }
   case NEON::BI__builtin_neon_vrnd_v:
   case NEON::BI__builtin_neon_vrndq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_trunc
-              : Intrinsic::trunc;
+    Int = Intrinsic::trunc;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndz");
   }
   case NEON::BI__builtin_neon_vcvt_f64_v:
@@ -6516,16 +6471,12 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vpminnm");
   }
   case NEON::BI__builtin_neon_vsqrth_f16: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_sqrt
-              : Intrinsic::sqrt;
+    Int = Intrinsic::sqrt;
     return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vsqrt");
   }
   case NEON::BI__builtin_neon_vsqrt_v:
   case NEON::BI__builtin_neon_vsqrtq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_sqrt
-              : Intrinsic::sqrt;
+    Int = Intrinsic::sqrt;
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vsqrt");
   }
diff --git a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
index 721071308c251..5e0bc06cbb398 100644
--- a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
@@ -507,14 +507,8 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
   case PPC::BI__builtin_vsx_xvsqrtdp: {
     llvm::Type *ResultType = ConvertType(E->getType());
     Value *X = EmitScalarExpr(E->getArg(0));
-    if (Builder.getIsFPConstrained()) {
-      llvm::Function *F = CGM.getIntrinsic(
-          Intrinsic::experimental_constrained_sqrt, ResultType);
-      return Builder.CreateConstrainedFPCall(F, X);
-    } else {
-      llvm::Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
-      return Builder.CreateCall(F, X);
-    }
+    llvm::Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
+    return Builder.CreateCall(F, X);
   }
   // Count leading zeros
   case PPC::BI__builtin_altivec_vclzb:
@@ -760,32 +754,21 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     if (BuiltinID == PPC::BI__builtin_vsx_xvrdpim ||
         BuiltinID == PPC::BI__builtin_vsx_xvrspim)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_floor
-               : Intrinsic::floor;
+      ID = Intrinsic::floor;
     else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpi ||
              BuiltinID == PPC::BI__builtin_vsx_xvrspi)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_round
-               : Intrinsic::round;
+      ID = Intrinsic::round;
     else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpic ||
              BuiltinID == PPC::BI__builtin_vsx_xvrspic)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_rint
-               : Intrinsic::rint;
+      ID = Intrinsic::rint;
     else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpip ||
              BuiltinID == PPC::BI__builtin_vsx_xvrspip)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_ceil
-               : Intrinsic::ceil;
+      ID = Intrinsic::ceil;
     else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpiz ||
              BuiltinID == PPC::BI__builtin_vsx_xvrspiz)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_trunc
-               : Intrinsic::trunc;
+      ID = Intrinsic::trunc;
     llvm::Function *F = CGM.getIntrinsic(ID, ResultType);
-    return Builder.getIsFPConstrained() ? Builder.CreateConstrainedFPCall(F, X)
-                                        : Builder.CreateCall(F, X);
+    return Builder.CreateCall(F, X);
   }
 
   // Absolute value
@@ -864,44 +847,23 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     Value *Y = EmitScalarExpr(E->getArg(1));
     Value *Z = EmitScalarExpr(E->getArg(2));
-    llvm::Function *F;
-    if (Builder.getIsFPConstrained())
-      F = CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, ResultType);
-    else
-      F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
+    llvm::Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
     switch (BuiltinID) {
       case PPC::BI__builtin_vsx_xvmaddadp:
       case PPC::BI__builtin_vsx_xvmaddasp:
-        if (Builder.getIsFPConstrained())
-          return Builder.CreateConstrainedFPCall(F, {X, Y, Z});
-        else
-          return Builder.CreateCall(F, {X, Y, Z});
+        return Builder.CreateCall(F, {X, Y, Z});
       case PPC::BI__builtin_vsx_xvnmaddadp:
       case PPC::BI__builtin_vsx_xvnmaddasp:
-        if (Builder.getIsFPConstrained())
-          return Builder.CreateFNeg(
-              Builder.CreateConstrainedFPCall(F, {X, Y, Z}), "neg");
-        else
-          return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
+        return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
       case PPC::BI__builtin_vsx_xvmsubadp:
       case PPC::BI__builtin_vsx_xvmsubasp:
-        if (Builder.getIsFPConstrained())
-          return Builder.CreateConstrainedFPCall(
-              F, {X, Y, Builder.CreateFNeg(Z, "neg")});
-        else
-          return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
+        return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
       case PPC::BI__builtin_ppc_fnmsub:
       case PPC::BI__builtin_ppc_fnmsubs:
       case PPC::BI__builtin_vsx_xvnmsubadp:
       case PPC::BI__builtin_vsx_xvnmsubasp:
-        if (Builder.getIsFPConstrained())
-          return Builder.CreateFNeg(
-              Builder.CreateConstrainedFPCall(
-                  F, {X, Y, Builder.CreateFNeg(Z, "neg")}),
-              "neg");
-        else
-          return Builder.CreateCall(
-              CGM.getIntrinsic(Intrinsic::ppc_fnmsub, ResultType), {X, Y, Z});
+        return Builder.CreateCall(
+            CGM.getIntrinsic(Intrinsic::ppc_fnmsub, ResultType), {X, Y, Z});
       }
     llvm_unreachable("Unknown FMA operation");
     return nullptr; // Suppress no-return warning
@@ -1270,37 +1232,37 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
   case PPC::BI__builtin_ppc_fric:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::rint,
-                           Intrinsic::experimental_constrained_rint))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frim:
   case PPC::BI__builtin_ppc_frims:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::floor,
-                           Intrinsic::experimental_constrained_floor))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frin:
   case PPC::BI__builtin_ppc_frins:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::round,
-                           Intrinsic::experimental_constrained_round))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frip:
   case PPC::BI__builtin_ppc_frips:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::ceil,
-                           Intrinsic::experimental_constrained_ceil))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_friz:
   case PPC::BI__builtin_ppc_frizs:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::trunc,
-                           Intrinsic::experimental_constrained_trunc))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_fsqrt:
   case PPC::BI__builtin_ppc_fsqrts:
     return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
                            *this, E, Intrinsic::sqrt,
-                           Intrinsic::experimental_constrained_sqrt))
+                           0))
         .getScalarVal();
   case PPC::BI__builtin_ppc_test_data_class: {
     Value *Op0 = EmitScalarExpr(E->getArg(0));
diff --git a/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp b/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
index a7c25b29d1dba..4346771ad4804 100644
--- a/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
@@ -128,13 +128,8 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
   case SystemZ::BI__builtin_s390_vfsqdb: {
     llvm::Type *ResultType = ConvertType(E->getType());
     Value *X = EmitScalarExpr(E->getArg(0));
-    if (Builder.getIsFPConstrained()) {
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_sqrt, ResultType);
-      return Builder.CreateConstrainedFPCall(F, { X });
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
-      return Builder.CreateCall(F, X);
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
+    return Builder.CreateCall(F, X);
   }
   case SystemZ::BI__builtin_s390_vfmasb:
   case SystemZ::BI__builtin_s390_vfmadb: {
@@ -142,13 +137,8 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     Value *Y = EmitScalarExpr(E->getArg(1));
     Value *Z = EmitScalarExpr(E->getArg(2));
-    if (Builder.getIsFPConstrained()) {
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, ResultType);
-      return Builder.CreateConstrainedFPCall(F, {X, Y, Z});
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
-      return Builder.CreateCall(F, {X, Y, Z});
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
+    return Builder.CreateCall(F, {X, Y, Z});
   }
   case SystemZ::BI__builtin_s390_vfmssb:
   case SystemZ::BI__builtin_s390_vfmsdb: {
@@ -156,13 +146,8 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     Value *Y = EmitScalarExpr(E->getArg(1));
     Value *Z = EmitScalarExpr(E->getArg(2));
-    if (Builder.getIsFPConstrained()) {
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, ResultType);
-      return Builder.CreateConstrainedFPCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
-      return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
+    return Builder.CreateCall(F, {X, Y, Builder.CreateFNeg(Z, "neg")});
   }
   case SystemZ::BI__builtin_s390_vfnmasb:
   case SystemZ::BI__builtin_s390_vfnmadb: {
@@ -170,13 +155,8 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     Value *Y = EmitScalarExpr(E->getArg(1));
     Value *Z = EmitScalarExpr(E->getArg(2));
-    if (Builder.getIsFPConstrained()) {
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, ResultType);
-      return Builder.CreateFNeg(Builder.CreateConstrainedFPCall(F, {X, Y,  Z}), "neg");
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
-      return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
+    return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, Z}), "neg");
   }
   case SystemZ::BI__builtin_s390_vfnmssb:
   case SystemZ::BI__builtin_s390_vfnmsdb: {
@@ -184,15 +164,9 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Value *X = EmitScalarExpr(E->getArg(0));
     Value *Y = EmitScalarExpr(E->getArg(1));
     Value *Z = EmitScalarExpr(E->getArg(2));
-    if (Builder.getIsFPConstrained()) {
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, ResultType);
-      Value *NegZ = Builder.CreateFNeg(Z, "sub");
-      return Builder.CreateFNeg(Builder.CreateConstrainedFPCall(F, {X, Y, NegZ}));
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
-      Value *NegZ = Builder.CreateFNeg(Z, "neg");
-      return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, NegZ}));
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::fma, ResultType);
+    Value *NegZ = Builder.CreateFNeg(Z, "neg");
+    return Builder.CreateFNeg(Builder.CreateCall(F, {X, Y, NegZ}));
   }
   case SystemZ::BI__builtin_s390_vflpsb:
   case SystemZ::BI__builtin_s390_vflpdb: {
@@ -218,42 +192,29 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     // Check whether this instance can be represented via a LLVM standard
     // intrinsic.  We only support some combinations of M4 and M5.
     Intrinsic::ID ID = Intrinsic::not_intrinsic;
-    Intrinsic::ID CI;
     switch (M4.getZExtValue()) {
     default: break;
     case 0:  // IEEE-inexact exception allowed
       switch (M5.getZExtValue()) {
       default: break;
-      case 0: ID = Intrinsic::rint;
-              CI = Intrinsic::experimental_constrained_rint; break;
+      case 0: ID = Intrinsic::rint; break;
       }
       break;
     case 4:  // IEEE-inexact exception suppressed
       switch (M5.getZExtValue()) {
       default: break;
-      case 0: ID = Intrinsic::nearbyint;
-              CI = Intrinsic::experimental_constrained_nearbyint; break;
-      case 1: ID = Intrinsic::round;
-              CI = Intrinsic::experimental_constrained_round; break;
-      case 4: ID = Intrinsic::roundeven;
-              CI = Intrinsic::experimental_constrained_roundeven; break;
-      case 5: ID = Intrinsic::trunc;
-              CI = Intrinsic::experimental_constrained_trunc; break;
-      case 6: ID = Intrinsic::ceil;
-              CI = Intrinsic::experimental_constrained_ceil; break;
-      case 7: ID = Intrinsic::floor;
-              CI = Intrinsic::experimental_constrained_floor; break;
+      case 0: ID = Intrinsic::nearbyint; break;
+      case 1: ID = Intrinsic::round; break;
+      case 4: ID = Intrinsic::roundeven; break;
+      case 5: ID = Intrinsic::trunc; break;
+      case 6: ID = Intrinsic::ceil; break;
+      case 7: ID = Intrinsic::floor; break;
       }
       break;
     }
     if (ID != Intrinsic::not_intrinsic) {
-      if (Builder.getIsFPConstrained()) {
-        Function *F = CGM.getIntrinsic(CI, ResultType);
-        return Builder.CreateConstrainedFPCall(F, X);
-      } else {
-        Function *F = CGM.getIntrinsic(ID, ResultType);
-        return Builder.CreateCall(F, X);
-      }
+      Function *F = CGM.getIntrinsic(ID, ResultType);
+      return Builder.CreateCall(F, X);
     }
     switch (BuiltinID) { // FIXME: constrained version?
       case SystemZ::BI__builtin_s390_vfisb: ID = Intrinsic::s390_vfisb; break;
@@ -275,20 +236,13 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     // Check whether this instance can be represented via a LLVM standard
     // intrinsic.  We only support some values of M4.
     Intrinsic::ID ID = Intrinsic::not_intrinsic;
-    Intrinsic::ID CI;
     switch (M4.getZExtValue()) {
     default: break;
-    case 4: ID = Intrinsic::maxnum;
-            CI = Intrinsic::experimental_constrained_maxnum; break;
+    case 4: ID = Intrinsic::maxnum; break;
     }
     if (ID != Intrinsic::not_intrinsic) {
-      if (Builder.getIsFPConstrained()) {
-        Function *F = CGM.getIntrinsic(CI, ResultType);
-        return Builder.CreateConstrainedFPCall(F, {X, Y});
-      } else {
-        Function *F = CGM.getIntrinsic(ID, ResultType);
-        return Builder.CreateCall(F, {X, Y});
-      }
+      Function *F = CGM.getIntrinsic(ID, ResultType);
+      return Builder.CreateCall(F, {X, Y});
     }
     switch (BuiltinID) {
       case SystemZ::BI__builtin_s390_vfmaxsb: ID = Intrinsic::s390_vfmaxsb; break;
@@ -309,20 +263,13 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     // Check whether this instance can be represented via a LLVM standard
     // intrinsic.  We only support some values of M4.
     Intrinsic::ID ID = Intrinsic::not_intrinsic;
-    Intrinsic::ID CI;
     switch (M4.getZExtValue()) {
     default: break;
-    case 4: ID = Intrinsic::minnum;
-            CI = Intrinsic::experimental_constrained_minnum; break;
+    case 4: ID = Intrinsic::minnum; break;
     }
     if (ID != Intrinsic::not_intrinsic) {
-      if (Builder.getIsFPConstrained()) {
-        Function *F = CGM.getIntrinsic(CI, ResultType);
-        return Builder.CreateConstrainedFPCall(F, {X, Y});
-      } else {
-        Function *F = CGM.getIntrinsic(ID, ResultType);
-        return Builder.CreateCall(F, {X, Y});
-      }
+      Function *F = CGM.getIntrinsic(ID, ResultType);
+      return Builder.CreateCall(F, {X, Y});
     }
     switch (BuiltinID) {
       case SystemZ::BI__builtin_s390_vfminsb: ID = Intrinsic::s390_vfminsb; break;
diff --git a/clang/lib/CodeGen/TargetBuiltins/X86.cpp b/clang/lib/CodeGen/TargetBuiltins/X86.cpp
index 9645ed87b8ef3..ba3f648c45498 100644
--- a/clang/lib/CodeGen/TargetBuiltins/X86.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/X86.cpp
@@ -83,32 +83,6 @@ static Value *emitX86RoundImmediate(CodeGenFunction &CGF, Value *X,
   unsigned RoundingMode = RoundingControl & RoundingMask;
 
   Intrinsic::ID ID = Intrinsic::not_intrinsic;
-  LLVMContext &Ctx = CGF.CGM.getLLVMContext();
-  if (CGF.Builder.getIsFPConstrained()) {
-
-    Value *ExceptMode =
-        MetadataAsValue::get(Ctx, MDString::get(Ctx, "fpexcept.ignore"));
-
-    switch (RoundingMode) {
-    case 0b00:
-      ID = Intrinsic::experimental_constrained_roundeven;
-      break;
-    case 0b01:
-      ID = Intrinsic::experimental_constrained_floor;
-      break;
-    case 0b10:
-      ID = Intrinsic::experimental_constrained_ceil;
-      break;
-    case 0b11:
-      ID = Intrinsic::experimental_constrained_trunc;
-      break;
-    default:
-      llvm_unreachable("Invalid rounding mode");
-    }
-
-    Function *F = CGF.CGM.getIntrinsic(ID, X->getType());
-    return CGF.Builder.CreateCall(F, {X, ExceptMode});
-  }
 
   switch (RoundingMode) {
   case 0b00:
@@ -448,15 +422,8 @@ static Value *EmitX86FMAExpr(CodeGenFunction &CGF, const CallExpr *E,
     Res = CGF.Builder.CreateCall(Intr, {A, B, C, Ops.back() });
   } else {
     llvm::Type *Ty = A->getType();
-    Function *FMA;
-    if (CGF.Builder.getIsFPConstrained()) {
-      CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-      FMA = CGF.CGM.getIntrinsic(Intrinsic::experimental_constrained_fma, Ty);
-      Res = CGF.Builder.CreateConstrainedFPCall(FMA, {A, B, C});
-    } else {
-      FMA = CGF.CGM.getIntrinsic(Intrinsic::fma, Ty);
-      Res = CGF.Builder.CreateCall(FMA, {A, B, C});
-    }
+    Function *FMA = CGF.CGM.getIntrinsic(Intrinsic::fma, Ty);
+    Res = CGF.Builder.CreateCall(FMA, {A, B, C});
   }
 
   // Handle any required masking.
@@ -533,11 +500,6 @@ static Value *EmitScalarFMAExpr(CodeGenFunction &CGF, const CallExpr *E,
     }
     Res = CGF.Builder.CreateCall(CGF.CGM.getIntrinsic(IID),
                                  {Ops[0], Ops[1], Ops[2], Ops[4]});
-  } else if (CGF.Builder.getIsFPConstrained()) {
-    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-    Function *FMA = CGF.CGM.getIntrinsic(
-        Intrinsic::experimental_constrained_fma, Ops[0]->getType());
-    Res = CGF.Builder.CreateConstrainedFPCall(FMA, Ops.slice(0, 3));
   } else {
     Function *FMA = CGF.CGM.getIntrinsic(Intrinsic::fma, Ops[0]->getType());
     Res = CGF.Builder.CreateCall(FMA, Ops.slice(0, 3));
@@ -2322,16 +2284,8 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
       return Builder.CreateCall(CGM.getIntrinsic(IID), Ops);
     }
     Value *A = Builder.CreateExtractElement(Ops[1], (uint64_t)0);
-    Function *F;
-    if (Builder.getIsFPConstrained()) {
-      CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
-      F = CGM.getIntrinsic(Intrinsic::experimental_constrained_sqrt,
-                           A->getType());
-      A = Builder.CreateConstrainedFPCall(F, A);
-    } else {
-      F = CGM.getIntrinsic(Intrinsic::sqrt, A->getType());
-      A = Builder.CreateCall(F, A);
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::sqrt, A->getType());
+    A = Builder.CreateCall(F, A);
     Value *Src = Builder.CreateExtractElement(Ops[2], (uint64_t)0);
     A = EmitX86ScalarSelect(*this, Ops[3], A, Src);
     return Builder.CreateInsertElement(Ops[0], A, (uint64_t)0);
@@ -2360,15 +2314,8 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
       }
       return Builder.CreateCall(CGM.getIntrinsic(IID), Ops);
     }
-    if (Builder.getIsFPConstrained()) {
-      CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
-      Function *F = CGM.getIntrinsic(Intrinsic::experimental_constrained_sqrt,
-                                     Ops[0]->getType());
-      return Builder.CreateConstrainedFPCall(F, Ops[0]);
-    } else {
-      Function *F = CGM.getIntrinsic(Intrinsic::sqrt, Ops[0]->getType());
-      return Builder.CreateCall(F, Ops[0]);
-    }
+    Function *F = CGM.getIntrinsic(Intrinsic::sqrt, Ops[0]->getType());
+    return Builder.CreateCall(F, Ops[0]);
   }
 
   case X86::BI__builtin_ia32_pmuludq128:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 6f34005f3e945..05028a3790320 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3279,6 +3279,170 @@ A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
 When present, the operand bundle must contain exactly one value of token type.
 See the :doc:`ConvergentOperations` document for details.
 
+.. _ob_fp:
+
+Floating-point Operand Bundles
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These operand bundles are attached to call instructions that perform
+floating-point operations requiring non-default control of the
+:ref:`floating-point environment <floatenv>` — rounding mode, denormal
+handling, or exception behavior.  There are two independent bundle tags:
+``fp.control`` and ``fp.except``.  Either or both may appear on a call;
+their absence means "use the target default."
+
+.. _fpcontrolbundle:
+
+``fp.control`` Bundle
+"""""""""""""""""""""
+
+An operand bundle tagged ``"fp.control"`` carries one or more metadata string
+operands describing the FP control modes in effect for the call.  Two kinds
+of mode are supported: rounding mode and denormal behavior.
+
+**Rounding mode** is a single metadata string.  At most one rounding-mode
+operand may appear.  If no rounding-mode operand is present, the operation
+uses the hardware control register's current value (dynamic rounding).  In
+the :ref:`default FP environment <floatenv>`, that is round-to-nearest,
+ties-to-even.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 10 40
+
+   * - Value
+     - Meaning
+   * - ``"rte"``
+     - Round to nearest, ties to even (IEEE default)
+   * - ``"rtz"``
+     - Round toward zero (truncation)
+   * - ``"rtp"``
+     - Round toward positive infinity (ceiling)
+   * - ``"rtn"``
+     - Round toward negative infinity (floor)
+   * - ``"rmm"``
+     - Round to nearest, ties away from zero
+   * - ``"dyn"``
+     - Rounding mode is read from the hardware control register at runtime
+
+Examples:
+
+.. code-block:: llvm
+
+    ; Add two floats, rounding toward zero instead of the default round-to-nearest.
+    %r = call float @llvm.fadd.f32(float %a, float %b)
+             [ "fp.control"(metadata !"rtz") ]
+
+    ; Divide, rounding toward positive infinity, with strict exception tracking.
+    %r = call float @llvm.fdiv.f32(float %a, float %b)
+             [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
+
+    ; Multiply with an explicitly dynamic rounding mode (optimizer must assume
+    ; rounding mode can change between calls).
+    %r = call float @llvm.fmul.f32(float %a, float %b)
+             [ "fp.control"(metadata !"dyn") ]
+
+**Denormal behavior** is specified separately for inputs and outputs using
+operands of the form ``"denorm.in=<mode>"`` and ``"denorm.out=<mode>"``.
+Each applies to all floating-point types unless overridden for ``float``
+specifically with ``"denorm.f32.in=<mode>"`` or ``"denorm.f32.out=<mode>"``.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 10 40
+
+   * - Mode
+     - Meaning
+   * - ``"ieee"``
+     - Preserve denormals (IEEE behavior)
+   * - ``"zero"``
+     - Flush denormals to zero, preserving sign (±0.0)
+   * - ``"pzero"``
+     - Flush denormals to positive zero (+0.0)
+   * - ``"dyn"``
+     - Denormal mode is read from a hardware register at runtime
+
+Examples:
+
+.. code-block:: llvm
+
+    ; Flush denormal inputs to +0.0 before truncating.
+    call float @llvm.trunc.f32(float %x)
+             [ "fp.control"(metadata !"denorm.in=pzero") ]
+
+    ; Globally flush denormal inputs to signed zero, but override for f32
+    ; inputs to use IEEE behavior (preserve denormals).
+    call float @llvm.trunc.f32(float %x)
+             [ "fp.control"(metadata !"denorm.in=zero", metadata !"denorm.f32.in=ieee") ]
+
+    ; Combine a rounding mode with denormal flushing.
+    %r = call float @llvm.fadd.f32(float %a, float %b)
+             [ "fp.control"(metadata !"rtz", metadata !"denorm.in=zero",
+                            metadata !"denorm.out=zero") ]
+
+.. _fpexceptbundle:
+
+``fp.except`` Bundle
+""""""""""""""""""""
+
+An operand bundle tagged ``"fp.except"`` describes how the call interacts
+with the hardware FP exception status flags.  It contains a single metadata
+string operand.  When this bundle is absent on a call that does not have
+``memory(inaccessiblemem: readwrite)`` semantics (e.g. a plain
+:ref:`fadd <i_fadd>` instruction or an :ref:`llvm.fadd <int_fadd>` intrinsic
+without an ``fp.except`` bundle), the optimizer assumes exceptions need not be
+preserved.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 10 40
+
+   * - Value
+     - Meaning
+   * - ``"ignore"``
+     - FP exception flags are not observed.  The optimizer may freely
+       eliminate, reorder, or transform the operation in ways that would
+       change what exceptions are raised.
+   * - ``"maytrap"``
+     - The operation may raise a hardware exception (signal), but the
+       program does not read the sticky exception flags.  The optimizer must
+       not suppress the operation or introduce new exceptions, but has some
+       freedom to reorder with respect to flag reads.
+   * - ``"strict"``
+     - FP exceptions are precisely tracked.  The program may read the sticky
+       exception-status flags and must see exactly the exceptions raised by
+       this operation in program order.  The optimizer must not eliminate,
+       duplicate, or reorder the call with respect to any other
+       exception-observable operation.
+
+When ``fp.except`` is present with any value, the call is given
+``memory(inaccessiblemem: readwrite)`` and ``willreturn`` attributes, making
+it non-speculatable.  Calls with ``"ignore"`` may still be CSE'd if they are
+otherwise identical; ``"strict"`` calls may not.
+
+Examples:
+
+.. code-block:: llvm
+
+    ; Divide with exceptions ignored; the optimizer need not preserve exception flags.
+    %r = call float @llvm.fdiv.f32(float %a, float %b)
+             [ "fp.except"(metadata !"ignore") ]
+
+    ; Multiply rounding toward zero; program traps on overflow but doesn't
+    ; poll the status flags.
+    %r = call float @llvm.fmul.f32(float %a, float %b)
+             [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"maytrap") ]
+
+    ; Add with strict exception tracking — the Inexact or Overflow flag must
+    ; be raised exactly as IEEE 754 requires, in program order.
+    %r = call float @llvm.fadd.f32(float %a, float %b)
+             [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"strict") ]
+
+    ; Signaling compare — raises Invalid if either operand is a NaN.
+    ; With "strict", the exception is precisely preserved.
+    %c = call i1 @llvm.fcmps.f32(float %x, float %y, metadata !"oeq")
+             [ "fp.except"(metadata !"strict") ]
+
 .. _deactivationsymbol:
 
 Deactivation Symbol Operand Bundles
@@ -4054,8 +4218,9 @@ round-to-nearest rounding mode, and subnormals are assumed to be preserved.
 Running LLVM code in an environment where these assumptions are not met
 typically leads to undefined behavior. The ``strictfp`` and
 :ref:`denormal_fpenv <denormal_fpenv>` attributes as well as
-:ref:`Constrained Floating-Point Intrinsics <constrainedfp>` can be
-used to weaken LLVM's assumptions and ensure defined behavior in
+:ref:`FP arithmetic intrinsics <fpintrin>` with
+:ref:`floating-point operand bundles<ob_fp>` can be used to
+weaken LLVM's assumptions and ensure defined behavior in
 non-default floating-point environments; see their respective
 documentation for details.
 
@@ -4109,8 +4274,8 @@ Floating-point math operations are allowed to treat all NaNs as if they were
 quiet NaNs. For example, "pow(1.0, SNaN)" may be simplified to 1.0.
 
 Code that requires different behavior than this should use the
-:ref:`Constrained Floating-Point Intrinsics <constrainedfp>`.
-In particular, constrained intrinsics rule out the "Unchanged NaN propagation"
+:ref:`FP arithmetic intrinsics <fpintrin>` with ``fp.except`` operand bundles.
+In particular, those intrinsics rule out the "Unchanged NaN propagation"
 case; they are guaranteed to return a QNaN.
 
 Unfortunately, due to hard-or-impossible-to-fix issues, LLVM violates its own
@@ -10459,6 +10624,9 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.fneg <int_fneg>` intrinsic, which accepts an optional
+``fp.control`` operand bundle for non-default denormal handling.
+
 Example:
 """"""""
 
@@ -10564,6 +10732,10 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.fadd <int_fadd>` intrinsic, which accepts
+:ref:`fp.control and fp.except <ob_fp>` operand bundles for non-default FP
+environment control.
+
 Example:
 """"""""
 
@@ -10660,6 +10832,10 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.fsub <int_fsub>` intrinsic, which accepts
+:ref:`fp.control and fp.except <ob_fp>` operand bundles for non-default FP
+environment control.
+
 Example:
 """"""""
 
@@ -10757,6 +10933,10 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.fmul <int_fmul>` intrinsic, which accepts
+:ref:`fp.control and fp.except <ob_fp>` operand bundles for non-default FP
+environment control.
+
 Example:
 """"""""
 
@@ -10895,6 +11075,10 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.fdiv <int_fdiv>` intrinsic, which accepts
+:ref:`fp.control and fp.except <ob_fp>` operand bundles for non-default FP
+environment control.
+
 Example:
 """"""""
 
@@ -11051,6 +11235,10 @@ This instruction can also take any number of :ref:`fast-math
 flags <fastmath>`, which are optimization hints to enable otherwise
 unsafe floating-point optimizations:
 
+See also the :ref:`llvm.frem <int_frem>` intrinsic, which accepts
+:ref:`fp.control and fp.except <ob_fp>` operand bundles for non-default FP
+environment control.
+
 Example:
 """"""""
 
@@ -13409,6 +13597,8 @@ The '``fcmp``' instruction takes three operands. The first operand is
 the condition code indicating the kind of comparison to perform. It is
 not a value, just a keyword. The possible condition codes are:
 
+.. _fcmp_md_cc:
+
 #. ``false``: no comparison, always returns false
 #. ``oeq``: ordered and equal
 #. ``ogt``: ordered and greater than
@@ -13436,6 +13626,8 @@ They must have identical types.
 Semantics:
 """"""""""
 
+.. _fcmp_md_cc_sem:
+
 The '``fcmp``' instruction compares ``op1`` and ``op2`` according to the
 condition code given as ``cond``. If the operands are vectors, then the
 vectors are compared element by element. Each comparison performed
@@ -13479,6 +13671,10 @@ only flags that have any effect on its semantics are those that allow
 assumptions to be made about the values of input arguments; namely
 ``nnan``, ``ninf``, and ``reassoc``. See :ref:`fastmath` for more information.
 
+See also the :ref:`llvm.fcmp <int_fcmp>` intrinsic (quiet comparison with
+optional :ref:`fp.control and fp.except <ob_fp>` bundles) and the
+:ref:`llvm.fcmps <int_fcmps>` intrinsic (signaling comparison).
+
 Example:
 """"""""
 
@@ -18406,8 +18602,8 @@ This function returns the same values as the libm ``rint`` functions
 would, and handles error conditions in the same way. Since LLVM assumes the
 :ref:`default floating-point environment <floatenv>`, the rounding mode is
 assumed to be set to "nearest", so halfway cases are rounded to the even
-integer. Use :ref:`Constrained Floating-Point Intrinsics <constrainedfp>`
-to avoid that assumption.
+integer. Attach an :ref:`fp.control <fpcontrolbundle>` operand bundle to the
+call to avoid that assumption.
 
 .. _int_nearbyint:
 
@@ -18448,8 +18644,8 @@ This function returns the same values as the libm ``nearbyint``
 functions would, and handles error conditions in the same way. Since LLVM
 assumes the :ref:`default floating-point environment <floatenv>`, the rounding
 mode is assumed to be set to "nearest", so halfway cases are rounded to the even
-integer. Use :ref:`Constrained Floating-Point Intrinsics <constrainedfp>` to
-avoid that assumption.
+integer. Attach an :ref:`fp.control <fpcontrolbundle>` operand bundle to the
+call to avoid that assumption.
 
 .. _int_round:
 
@@ -28276,1973 +28472,462 @@ It does not read any memory and can be speculated.
 
 
 
-.. _constrainedfp:
-
-Constrained Floating-Point Intrinsics
--------------------------------------
-
-These intrinsics are used to provide special handling of floating-point
-operations when specific rounding mode or floating-point exception behavior is
-required.  By default, LLVM optimization passes assume that the rounding mode is
-round-to-nearest and that floating-point exceptions will not be monitored.
-Constrained FP intrinsics are used to support non-default rounding modes and
-accurately preserve exception behavior without compromising LLVM's ability to
-optimize FP code when the default behavior is used.
-
-If any FP operation in a function is constrained then they all must be
-constrained. This is required for correct LLVM IR. Optimizations that
-move code around can create miscompiles if mixing of constrained and normal
-operations is done. The correct way to mix constrained and less constrained
-operations is to use the rounding mode and exception handling metadata to
-mark constrained intrinsics as having LLVM's default behavior.
-
-Each of these intrinsics corresponds to a normal floating-point operation. The
-data arguments and the return value are the same as the corresponding FP
-operation.
-
-The rounding mode argument is a metadata string specifying what
-assumptions, if any, the optimizer can make when transforming constant
-values. Some constrained FP intrinsics omit this argument. If required
-by the intrinsic, this argument must be one of the following strings:
-
-::
-
-      "round.dynamic"
-      "round.tonearest"
-      "round.downward"
-      "round.upward"
-      "round.towardzero"
-      "round.tonearestaway"
-
-If this argument is "round.dynamic" optimization passes must assume that the
-rounding mode is unknown and may change at runtime.  No transformations that
-depend on rounding mode may be performed in this case.
-
-The other possible values for the rounding mode argument correspond to the
-similarly named IEEE rounding modes.  If the argument is any of these values
-optimization passes may perform transformations as long as they are consistent
-with the specified rounding mode.
-
-For example, 'x-0'->'x' is not a valid transformation if the rounding mode is
-"round.downward" or "round.dynamic" because if the value of 'x' is +0 then
-'x-0' should evaluate to '-0' when rounding downward.  However, this
-transformation is legal for all other rounding modes.
-
-For values other than "round.dynamic" optimization passes may assume that the
-actual runtime rounding mode (as defined in a target-specific manner) matches
-the specified rounding mode, but this is not guaranteed.  Using a specific
-non-dynamic rounding mode which does not match the actual rounding mode at
-runtime results in undefined behavior.
-
-The exception behavior argument is a metadata string describing the floating
-point exception semantics that required for the intrinsic. This argument
-must be one of the following strings:
-
-::
-
-      "fpexcept.ignore"
-      "fpexcept.maytrap"
-      "fpexcept.strict"
-
-If this argument is "fpexcept.ignore" optimization passes may assume that the
-exception status flags will not be read and that floating-point exceptions will
-be masked.  This allows transformations to be performed that may change the
-exception semantics of the original code.  For example, FP operations may be
-speculatively executed in this case whereas they must not be for either of the
-other possible values of this argument.
-
-If the exception behavior argument is "fpexcept.maytrap" optimization passes
-must avoid transformations that may raise exceptions that would not have been
-raised by the original code (such as speculatively executing FP operations), but
-passes are not required to preserve all exceptions that are implied by the
-original code.  For example, exceptions may be potentially hidden by constant
-folding.
-
-If the exception behavior argument is "fpexcept.strict" all transformations must
-strictly preserve the floating-point exception semantics of the original code.
-Any FP exception that would have been raised by the original code must be raised
-by the transformed code, and the transformed code must not raise any FP
-exceptions that would not have been raised by the original code.  This is the
-exception behavior argument that will be used if the code being compiled reads
-the FP exception status flags, but this mode can also be used with code that
-unmasks FP exceptions.
-
-The number and order of floating-point exceptions is NOT guaranteed.  For
-example, a series of FP operations that each may raise exceptions may be
-vectorized into a single instruction that raises each unique exception a single
-time.
-
-Proper :ref:`function attributes <fnattrs>` usage is required for the
-constrained intrinsics to function correctly.
-
-All function *calls* done in a function that uses constrained floating
-point intrinsics must have the ``strictfp`` attribute either on the
-calling instruction or on the declaration or definition of the function
-being called.
-
-All function *definitions* that use constrained floating point intrinsics
-must have the ``strictfp`` attribute.
-
-'``llvm.experimental.constrained.fadd``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.fadd``' intrinsic returns the sum of its
-two arguments.
-
+.. _fpintrin:
 
-Arguments:
-""""""""""
+FP Arithmetic Intrinsics
+------------------------
 
-The first two arguments to the '``llvm.experimental.constrained.fadd``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
+These intrinsics are intrinsic-form equivalents of the standard floating-point
+instructions (:ref:`fadd <i_fadd>`, :ref:`fsub <i_fsub>`, :ref:`fmul <i_fmul>`,
+:ref:`fdiv <i_fdiv>`, :ref:`frem <i_frem>`, :ref:`fneg <i_fneg>`,
+and :ref:`fcmp <i_fcmp>`).  Unlike the plain instructions, they may carry
+:ref:`fp.control and fp.except <ob_fp>` operand bundles to express per-call FP
+environment requirements.
 
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
+When no operand bundles are present, an intrinsic behaves identically to the
+corresponding instruction.  In particular, fast-math flags attached to the call
+have the same meaning as they do on the corresponding instruction.
 
-Semantics:
-""""""""""
+When an ``fp.except`` bundle is present, the call is treated as potentially
+raising an FP exception and must be given ``memory(inaccessiblemem: readwrite)``
+semantics; such calls are not speculatable and must be treated conservatively by
+DCE and CSE.  ``llvm.fcmps`` is always non-speculatable regardless of whether an
+``fp.except`` bundle is present, because a signaling comparison can raise an
+Invalid Operation exception whenever a NaN operand is encountered.
 
-The value produced is the floating-point sum of the two value arguments and has
-the same type as the arguments.
+All FP arithmetic intrinsics accept an optional ``fp.control``
+:ref:`operand bundle <ob_fp>` specifying the rounding mode and denormal handling
+for that operation.
 
+.. _int_fadd:
 
-'``llvm.experimental.constrained.fsub``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.fadd.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fadd`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <type>
-      @llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.fadd.f16(half %op1, half %op2)
+      declare bfloat    @llvm.fadd.bf16(bfloat %op1, bfloat %op2)
+      declare float     @llvm.fadd.f32(float %op1, float %op2)
+      declare double    @llvm.fadd.f64(double %op1, double %op2)
+      declare x86_fp80  @llvm.fadd.f80(x86_fp80 %op1, x86_fp80 %op2)
+      declare fp128     @llvm.fadd.f128(fp128 %op1, fp128 %op2)
+      declare ppc_fp128 @llvm.fadd.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fsub``' intrinsic returns the difference
-of its two arguments.
-
+Intrinsic form of the :ref:`fadd <i_fadd>` instruction.  Returns the
+floating-point sum of its two operands.
 
 Arguments:
 """"""""""
 
-The first two arguments to the '``llvm.experimental.constrained.fsub``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
+The arguments and return value are floating-point values, or vectors of them,
+of the same type.  Optional ``fp.control`` and ``fp.except`` operand bundles
+are described in :ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The value produced is the floating-point difference of the two value arguments
-and has the same type as the arguments.
+Equivalent to the :ref:`fadd <i_fadd>` instruction.  When ``fp.except`` is
+absent the call is ``memory(none)`` and speculatable.  When ``fp.except`` is
+present the call is ``memory(inaccessiblemem: readwrite) willreturn`` and may
+not be reordered past other exception-observable operations.
 
+Example:
+""""""""
 
-'``llvm.experimental.constrained.fmul``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::
+
+      ; Add in round-toward-zero mode, preserving any FP exception.
+      %r = call float @llvm.fadd.f32(float %a, float %b)
+               [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"strict") ]
+
+.. _int_fsub:
+
+'``llvm.fsub.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fsub`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <type>
-      @llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.fsub.f16(half %op1, half %op2)
+      declare bfloat    @llvm.fsub.bf16(bfloat %op1, bfloat %op2)
+      declare float     @llvm.fsub.f32(float %op1, float %op2)
+      declare double    @llvm.fsub.f64(double %op1, double %op2)
+      declare x86_fp80  @llvm.fsub.f80(x86_fp80 %op1, x86_fp80 %op2)
+      declare fp128     @llvm.fsub.f128(fp128 %op1, fp128 %op2)
+      declare ppc_fp128 @llvm.fsub.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fmul``' intrinsic returns the product of
-its two arguments.
-
+Intrinsic form of the :ref:`fsub <i_fsub>` instruction.  Returns the
+floating-point difference of its two operands.
 
 Arguments:
 """"""""""
 
-The first two arguments to the '``llvm.experimental.constrained.fmul``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
+The arguments and the return value must have the same floating-point or
+floating-point vector type.  The optional ``fp.control`` and ``fp.except``
+operand bundles are described in :ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The value produced is the floating-point product of the two value arguments and
-has the same type as the arguments.
+Equivalent to the :ref:`fsub <i_fsub>` instruction.  When ``fp.except`` is
+absent the call is ``memory(none)`` and speculatable.  When ``fp.except`` is
+present the call is ``memory(inaccessiblemem: readwrite) willreturn`` and may
+not be reordered past other exception-observable operations.
+
+Example:
+""""""""
 
+::
 
-'``llvm.experimental.constrained.fdiv``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+      ; Subtract with downward rounding and strict exception tracking.
+      %r = call double @llvm.fsub.f64(double %a, double %b)
+               [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"strict") ]
+
+.. _int_fmul:
+
+'``llvm.fmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fmul`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <type>
-      @llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.fmul.f16(half %op1, half %op2)
+      declare bfloat    @llvm.fmul.bf16(bfloat %op1, bfloat %op2)
+      declare float     @llvm.fmul.f32(float %op1, float %op2)
+      declare double    @llvm.fmul.f64(double %op1, double %op2)
+      declare x86_fp80  @llvm.fmul.f80(x86_fp80 %op1, x86_fp80 %op2)
+      declare fp128     @llvm.fmul.f128(fp128 %op1, fp128 %op2)
+      declare ppc_fp128 @llvm.fmul.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fdiv``' intrinsic returns the quotient of
-its two arguments.
-
+Intrinsic form of the :ref:`fmul <i_fmul>` instruction.  Returns the
+floating-point product of its two operands.
 
 Arguments:
 """"""""""
 
-The first two arguments to the '``llvm.experimental.constrained.fdiv``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
+The arguments and the return value must have the same floating-point or
+floating-point vector type.  The optional ``fp.control`` and ``fp.except``
+operand bundles are described in :ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The value produced is the floating-point quotient of the two value arguments and
-has the same type as the arguments.
+Equivalent to the :ref:`fmul <i_fmul>` instruction.  When ``fp.except`` is
+absent the call is ``memory(none)`` and speculatable.  When ``fp.except`` is
+present the call is ``memory(inaccessiblemem: readwrite) willreturn`` and may
+not be reordered past other exception-observable operations.
+
+Example:
+""""""""
+
+::
 
+      ; Multiply with upward rounding, allowing the operation to trap.
+      %r = call float @llvm.fmul.f32(float %a, float %b)
+               [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"maytrap") ]
 
-'``llvm.experimental.constrained.frem``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. _int_fdiv:
+
+'``llvm.fdiv.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fdiv`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <type>
-      @llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.fdiv.f16(half %op1, half %op2)
+      declare bfloat    @llvm.fdiv.bf16(bfloat %op1, bfloat %op2)
+      declare float     @llvm.fdiv.f32(float %op1, float %op2)
+      declare double    @llvm.fdiv.f64(double %op1, double %op2)
+      declare x86_fp80  @llvm.fdiv.f80(x86_fp80 %op1, x86_fp80 %op2)
+      declare fp128     @llvm.fdiv.f128(fp128 %op1, fp128 %op2)
+      declare ppc_fp128 @llvm.fdiv.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder
-from the division of its two arguments.
-
+Intrinsic form of the :ref:`fdiv <i_fdiv>` instruction.  Returns the
+floating-point quotient of its two operands.
 
 Arguments:
 """"""""""
 
-The first two arguments to the '``llvm.experimental.constrained.frem``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.  The rounding mode argument has no effect, since
-the result of frem is never rounded, but the argument is included for
-consistency with the other constrained floating-point intrinsics.
+The arguments and the return value must have the same floating-point or
+floating-point vector type.  The optional ``fp.control`` and ``fp.except``
+operand bundles are described in :ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The value produced is the floating-point remainder from the division of the two
-value arguments and has the same type as the arguments.  The remainder has the
-same sign as the dividend.
-
-'``llvm.experimental.constrained.fma``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Equivalent to the :ref:`fdiv <i_fdiv>` instruction.  When ``fp.except`` is
+absent the call is ``memory(none)`` and speculatable.  When ``fp.except`` is
+present the call is ``memory(inaccessiblemem: readwrite) willreturn`` and may
+not be reordered past other exception-observable operations.
 
-Syntax:
-"""""""
+Example:
+""""""""
 
 ::
 
-      declare <type>
-      @llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.fma``' intrinsic returns the result of a
-fused-multiply-add operation on its arguments.
-
-Arguments:
-""""""""""
-
-The first three arguments to the '``llvm.experimental.constrained.fma``'
-intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
-<t_vector>` of floating-point values. All arguments must have identical types.
+      ; Divide with upward rounding, preserving any FP exception raised.
+      %r = call double @llvm.fdiv.f64(double %a, double %b)
+               [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
 
-The fourth and fifth arguments specify the rounding mode and exception behavior
-as described above.
+.. _int_frem:
 
-Semantics:
-""""""""""
-
-The result produced is the product of the first two arguments added to the third
-argument computed with infinite precision, and then rounded to the target
-precision.
-
-'``llvm.experimental.constrained.fptoui``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.frem.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.frem`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.fptoui(<type> <value>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.frem.f16(half %op1, half %op2)
+      declare bfloat    @llvm.frem.bf16(bfloat %op1, bfloat %op2)
+      declare float     @llvm.frem.f32(float %op1, float %op2)
+      declare double    @llvm.frem.f64(double %op1, double %op2)
+      declare x86_fp80  @llvm.frem.f80(x86_fp80 %op1, x86_fp80 %op2)
+      declare fp128     @llvm.frem.f128(fp128 %op1, fp128 %op2)
+      declare ppc_fp128 @llvm.frem.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fptoui``' intrinsic converts a
-floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.
+Intrinsic form of the :ref:`frem <i_frem>` instruction.  Returns the
+floating-point remainder of dividing the first operand by the second.
 
 Arguments:
 """"""""""
 
-The first argument to the '``llvm.experimental.constrained.fptoui``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
-<t_vector>` of floating point values.
-
-The second argument specifies the exception behavior as described above.
+The arguments and the return value must have the same floating-point or
+floating-point vector type.  The optional ``fp.control`` and ``fp.except``
+operand bundles are described in :ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The result produced is an unsigned integer converted from the floating
-point argument. The value is truncated, so it is rounded towards zero.
+Equivalent to the :ref:`frem <i_frem>` instruction.  When ``fp.except`` is
+absent the call is ``memory(none)`` and speculatable.  When ``fp.except`` is
+present the call is ``memory(inaccessiblemem: readwrite) willreturn`` and may
+not be reordered past other exception-observable operations.
 
-'``llvm.experimental.constrained.fptosi``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
+Example:
+""""""""
 
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.fptosi(<type> <value>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.fptosi``' intrinsic converts
-:ref:`floating-point <t_floating>` ``value`` to type ``ty2``.
-
-Arguments:
-""""""""""
-
-The first argument to the '``llvm.experimental.constrained.fptosi``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
-<t_vector>` of floating point values.
-
-The second argument specifies the exception behavior as described above.
+      ; Compute remainder with strict exception tracking.
+      %r = call float @llvm.frem.f32(float %a, float %b)
+               [ "fp.except"(metadata !"strict") ]
 
-Semantics:
-""""""""""
+.. _int_fneg:
 
-The result produced is a signed integer converted from the floating
-point argument. The value is truncated, so it is rounded towards zero.
-
-'``llvm.experimental.constrained.uitofp``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.fneg.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fneg`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.uitofp(<type> <value>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare half      @llvm.fneg.f16(half %op1)
+      declare bfloat    @llvm.fneg.bf16(bfloat %op1)
+      declare float     @llvm.fneg.f32(float %op1)
+      declare double    @llvm.fneg.f64(double %op1)
+      declare x86_fp80  @llvm.fneg.f80(x86_fp80 %op1)
+      declare fp128     @llvm.fneg.f128(fp128 %op1)
+      declare ppc_fp128 @llvm.fneg.ppcf128(ppc_fp128 %op1)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.uitofp``' intrinsic converts an
-unsigned integer ``value`` to a floating-point of type ``ty2``.
+Intrinsic form of the :ref:`fneg <i_fneg>` instruction.  Returns the
+floating-point negation of its operand.
 
 Arguments:
 """"""""""
 
-The first argument to the '``llvm.experimental.constrained.uitofp``'
-intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
-<t_vector>` of integer values.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
+The argument and return value are floating-point numbers of the same type.
+An optional ``fp.control`` bundle may specify denormal handling; ``fp.except``
+is not applicable because ``fneg`` cannot raise FP exceptions.
 
 Semantics:
 """"""""""
 
-An inexact floating-point exception will be raised if rounding is required.
-Any result produced is a floating point value converted from the input
-integer argument.
-
-'``llvm.experimental.constrained.sitofp``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Equivalent to the :ref:`fneg <i_fneg>` instruction.  Always ``memory(none)``
+and speculatable.
 
-Syntax:
-"""""""
+Example:
+""""""""
 
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.sitofp(<type> <value>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.sitofp``' intrinsic converts a
-signed integer ``value`` to a floating-point of type ``ty2``.
-
-Arguments:
-""""""""""
-
-The first argument to the '``llvm.experimental.constrained.sitofp``'
-intrinsic must be an :ref:`integer <t_integer>` or :ref:`vector
-<t_vector>` of integer values.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
+      ; Negate with denormal inputs flushed to positive zero.
+      %r = call float @llvm.fneg.f32(float %x)
+               [ "fp.control"(metadata !"denorm.in=pzero") ]
 
-An inexact floating-point exception will be raised if rounding is required.
-Any result produced is a floating point value converted from the input
-integer argument.
+.. _int_fcmp:
 
-'``llvm.experimental.constrained.fptrunc``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.fcmp.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fcmp`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.fptrunc(<type> <value>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
+      declare i1        @llvm.fcmp.f16(half %op1, half %op2, metadata %cc)
+      declare i1        @llvm.fcmp.bf16(bfloat %op1, bfloat %op2, metadata %cc)
+      declare i1        @llvm.fcmp.f32(float %op1, float %op2, metadata %cc)
+      declare i1        @llvm.fcmp.f64(double %op1, double %op2, metadata %cc)
+      declare i1        @llvm.fcmp.f80(x86_fp80 %op1, x86_fp80 %op2, metadata %cc)
+      declare i1        @llvm.fcmp.f128(fp128 %op1, fp128 %op2, metadata %cc)
+      declare i1        @llvm.fcmp.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2, metadata %cc)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``
-to type ``ty2``.
+Intrinsic form of the :ref:`fcmp <i_fcmp>` instruction.  Performs a **quiet**
+floating-point comparison: raises an FP Invalid Operation exception only if an
+operand is a signaling NaN (sNaN).  For vector types the return type is a
+vector of ``i1`` with the same number of elements as the operands.
 
 Arguments:
 """"""""""
 
-The first argument to the '``llvm.experimental.constrained.fptrunc``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
-<t_vector>` of floating point values. This argument must be larger in size
-than the result.
+The two operands must be floating-point numbers of the same type.  The third
+argument is a metadata string giving the comparison predicate; valid values are
+:ref:`the same as for the fcmp instruction <fcmp_md_cc>`: ``"oeq"``,
+``"ogt"``, ``"oge"``, ``"olt"``, ``"ole"``, ``"one"``, ``"ord"``, ``"ueq"``,
+``"ugt"``, ``"uge"``, ``"ult"``, ``"ule"``, ``"une"``, ``"uno"``, ``"true"``,
+and ``"false"``.
 
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
+Optional operand bundles ``fp.control`` and ``fp.except`` are described in
+:ref:`Floating-point Operand Bundles <ob_fp>`.
 
 Semantics:
 """"""""""
 
-The result produced is a floating point value truncated to be smaller in size
-than the argument.
+The comparison semantics :ref:`follow those of the fcmp instruction
+<fcmp_md_cc_sem>`.  When ``fp.except`` is absent the call is ``memory(none)``
+and speculatable.  When ``fp.except`` is present the call is
+``memory(inaccessiblemem: readwrite) willreturn``.
 
-'``llvm.experimental.constrained.fpext``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
+Example:
+""""""""
 
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.fpext(<type> <value>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.fpext``' intrinsic extends a
-floating-point ``value`` to a larger floating-point value.
-
-Arguments:
-""""""""""
-
-The first argument to the '``llvm.experimental.constrained.fpext``'
-intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
-<t_vector>` of floating point values. This argument must be smaller in size
-than the result.
-
-The second argument specifies the exception behavior as described above.
+      ; Quiet ordered-equal compare, preserving any Invalid exception from sNaN.
+      %c = call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"oeq")
+               [ "fp.except"(metadata !"strict") ]
 
-Semantics:
-""""""""""
+.. _int_fcmps:
 
-The result produced is a floating point value extended to be larger in size
-than the argument. All restrictions that apply to the fpext instruction also
-apply to this intrinsic.
-
-'``llvm.experimental.constrained.fcmp``' and '``llvm.experimental.constrained.fcmps``' Intrinsics
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.fcmps.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
+This is an overloaded intrinsic. You can use ``llvm.fcmps`` on any
+floating-point or vector of floating-point type. Not all targets support
+all types however.
+
 ::
 
-      declare <ty2>
-      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
-                                          metadata <condition code>,
-                                          metadata <exception behavior>)
-      declare <ty2>
-      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
-                                           metadata <condition code>,
-                                           metadata <exception behavior>)
+      declare i1        @llvm.fcmps.f16(half %op1, half %op2, metadata %cc)
+      declare i1        @llvm.fcmps.bf16(bfloat %op1, bfloat %op2, metadata %cc)
+      declare i1        @llvm.fcmps.f32(float %op1, float %op2, metadata %cc)
+      declare i1        @llvm.fcmps.f64(double %op1, double %op2, metadata %cc)
+      declare i1        @llvm.fcmps.f80(x86_fp80 %op1, x86_fp80 %op2, metadata %cc)
+      declare i1        @llvm.fcmps.f128(fp128 %op1, fp128 %op2, metadata %cc)
+      declare i1        @llvm.fcmps.ppcf128(ppc_fp128 %op1, ppc_fp128 %op2, metadata %cc)
 
 Overview:
 """""""""
 
-The '``llvm.experimental.constrained.fcmp``' and
-'``llvm.experimental.constrained.fcmps``' intrinsics return a boolean
-value or vector of boolean values based on comparison of its arguments.
-
-If the arguments are floating-point scalars, then the result type is a
-boolean (:ref:`i1 <t_integer>`).
-
-If the arguments are floating-point vectors, then the result type is a
-vector of boolean with the same number of elements as the arguments being
-compared.
-
-The '``llvm.experimental.constrained.fcmp``' intrinsic performs a quiet
-comparison operation while the '``llvm.experimental.constrained.fcmps``'
-intrinsic performs a signaling comparison operation.
+Performs a **signaling** floating-point comparison.  Unlike
+:ref:`llvm.fcmp <int_fcmp>`, a signaling comparison raises an FP Invalid
+Operation exception whenever either operand is a NaN, including a quiet NaN
+(qNaN).  There is no corresponding plain instruction; the closest equivalent is
+:ref:`fcmp <i_fcmp>`.  For vector types the return type is a vector of ``i1``
+with the same number of elements as the operands.
 
 Arguments:
 """"""""""
 
-The first two arguments to the '``llvm.experimental.constrained.fcmp``'
-and '``llvm.experimental.constrained.fcmps``' intrinsics must be
-:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
-of floating-point values. Both arguments must have identical types.
+Same as :ref:`llvm.fcmp <int_fcmp>`.
 
-The third argument is the condition code indicating the kind of comparison
-to perform. It must be a metadata string with one of the following values:
+Semantics:
+""""""""""
 
-.. _fcmp_md_cc:
+``llvm.fcmps`` is always ``memory(inaccessiblemem: readwrite) willreturn``,
+regardless of whether an ``fp.except`` bundle is present, because it raises an
+exception whenever an operand is a NaN.  The ``fp.except`` bundle controls
+whether that exception must be preserved in the surrounding code.
 
-- "``oeq``": ordered and equal
-- "``ogt``": ordered and greater than
-- "``oge``": ordered and greater than or equal
-- "``olt``": ordered and less than
-- "``ole``": ordered and less than or equal
-- "``one``": ordered and not equal
-- "``ord``": ordered (no nans)
-- "``ueq``": unordered or equal
-- "``ugt``": unordered or greater than
-- "``uge``": unordered or greater than or equal
-- "``ult``": unordered or less than
-- "``ule``": unordered or less than or equal
-- "``une``": unordered or not equal
-- "``uno``": unordered (either nans)
+Example:
+""""""""
 
-*Ordered* means that neither argument is a NAN while *unordered* means
-that either argument may be a NAN.
+::
 
-The fourth argument specifies the exception behavior as described above.
+      ; Signaling compare: preserve the Invalid exception raised on any NaN.
+      %r = call i1 @llvm.fcmps.f32(float %a, float %b, metadata !"oeq")
+               [ "fp.except"(metadata !"strict") ]
 
-Semantics:
-""""""""""
 
-``op1`` and ``op2`` are compared according to the condition code given
-as the third argument. If the arguments are vectors, then the
-vectors are compared element by element. Each comparison performed
-always yields an :ref:`i1 <t_integer>` result, as follows:
-
-.. _fcmp_md_cc_sem:
-
-- "``oeq``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is equal to ``op2``.
-- "``ogt``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is greater than ``op2``.
-- "``oge``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is greater than or equal to ``op2``.
-- "``olt``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is less than ``op2``.
-- "``ole``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is less than or equal to ``op2``.
-- "``one``": yields ``true`` if both arguments are not a NAN and ``op1``
-  is not equal to ``op2``.
-- "``ord``": yields ``true`` if both arguments are not a NAN.
-- "``ueq``": yields ``true`` if either argument is a NAN or ``op1`` is
-  equal to ``op2``.
-- "``ugt``": yields ``true`` if either argument is a NAN or ``op1`` is
-  greater than ``op2``.
-- "``uge``": yields ``true`` if either argument is a NAN or ``op1`` is
-  greater than or equal to ``op2``.
-- "``ult``": yields ``true`` if either argument is a NAN or ``op1`` is
-  less than ``op2``.
-- "``ule``": yields ``true`` if either argument is a NAN or ``op1`` is
-  less than or equal to ``op2``.
-- "``une``": yields ``true`` if either argument is a NAN or ``op1`` is
-  not equal to ``op2``.
-- "``uno``": yields ``true`` if either argument is a NAN.
-
-The quiet comparison operation performed by
-'``llvm.experimental.constrained.fcmp``' will only raise an exception
-if either argument is a SNAN.  The signaling comparison operation
-performed by '``llvm.experimental.constrained.fcmps``' will raise an
-exception if either argument is a NAN (QNAN or SNAN). Such an exception
-does not preclude a result being produced (e.g., exception might only
-set a flag), therefore the distinction between ordered and unordered
-comparisons is also relevant for the
-'``llvm.experimental.constrained.fcmps``' intrinsic.
-
-'``llvm.experimental.constrained.fmuladd``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>,
-                                             <type> <op3>,
-                                             metadata <rounding mode>,
-                                             metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.fmuladd``' intrinsic represents
-multiply-add expressions that can be fused if the code generator determines
-that (a) the target instruction set has support for a fused operation,
-and (b) that the fused operation is more efficient than the equivalent,
-separate pair of mul and add instructions.
-
-Arguments:
-""""""""""
-
-The first three arguments to the '``llvm.experimental.constrained.fmuladd``'
-intrinsic must be floating-point or vector of floating-point values.
-All three arguments must have identical types.
-
-The fourth and fifth arguments specify the rounding mode and exception behavior
-as described above.
-
-Semantics:
-""""""""""
-
-The expression:
-
-::
-
-      %0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c,
-                                                                 metadata <rounding mode>,
-                                                                 metadata <exception behavior>)
-
-is equivalent to the expression:
-
-::
-
-      %0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b,
-                                                              metadata <rounding mode>,
-                                                              metadata <exception behavior>)
-      %1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c,
-                                                              metadata <rounding mode>,
-                                                              metadata <exception behavior>)
-
-except that it is unspecified whether rounding will be performed between the
-multiplication and addition steps. Fusion is not guaranteed, even if the target
-platform supports it.
-If a fused multiply-add is required, the corresponding
-:ref:`llvm.experimental.constrained.fma <int_fma>` intrinsic function should be
-used instead.
-This never sets errno, just as '``llvm.experimental.constrained.fma.*``'.
-
-Constrained libm-equivalent Intrinsics
---------------------------------------
-
-In addition to the basic floating-point operations for which constrained
-intrinsics are described above, there are constrained versions of various
-operations which provide equivalent behavior to a corresponding libm function.
-These intrinsics allow the precise behavior of these operations with respect to
-rounding mode and exception behavior to be controlled.
-
-As with the basic constrained floating-point intrinsics, the rounding mode
-and exception behavior arguments only control the behavior of the optimizer.
-They do not change the runtime floating-point environment.
-
-
-'``llvm.experimental.constrained.sqrt``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.sqrt(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.sqrt``' intrinsic returns the square root
-of the specified value, returning the same value as the libm '``sqrt``'
-functions would, but without setting ``errno``.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the nonnegative square root of the specified value.
-If the value is less than negative zero, a floating-point exception occurs
-and the return value is architecture specific.
-
-
-'``llvm.experimental.constrained.pow``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.pow``' intrinsic returns the first argument
-raised to the (positive or negative) power specified by the second argument.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers of the
-same type.  The second argument specifies the power to which the first argument
-should be raised.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the first value raised to the second power,
-returning the same values as the libm ``pow`` functions would, and
-handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.powi``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.powi``' intrinsic returns the first argument
-raised to the (positive or negative) power specified by the second argument. The
-order of evaluation of multiplications is not defined. When a vector of
-floating-point type is used, the second argument remains a scalar integer value.
-
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.  The second argument is a 32-bit signed integer specifying the power to
-which the first argument should be raised.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the first value raised to the second power with an
-unspecified sequence of rounding operations.
-
-
-'``llvm.experimental.constrained.ldexp``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type0>
-      @llvm.experimental.constrained.ldexp(<type0> <op1>, <type1> <op2>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.ldexp``' performs the ldexp function.
-
-
-Arguments:
-""""""""""
-
-The first argument and the return value are :ref:`floating-point
-<t_floating>` or :ref:`vector <t_vector>` of floating-point values of
-the same type. The second argument is an integer with the same number
-of elements.
-
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function multiplies the first argument by 2 raised to the second
-argument's power. If the first argument is NaN or infinite, the same
-value is returned. If the result underflows a zero with the same sign
-is returned. If the result overflows, the result is an infinity with
-the same sign.
-
-
-'``llvm.experimental.constrained.sin``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.sin(<type> <op1>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.sin``' intrinsic returns the sine of the
-first argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the sine of the specified argument, returning the
-same values as the libm ``sin`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.cos``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.cos(<type> <op1>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.cos``' intrinsic returns the cosine of the
-first argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the cosine of the specified argument, returning the
-same values as the libm ``cos`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.tan``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.tan(<type> <op1>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.tan``' intrinsic returns the tangent of the
-first argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the tangent of the specified argument, returning the
-same values as the libm ``tan`` functions would, and handles error
-conditions in the same way.
-
-'``llvm.experimental.constrained.asin``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.asin(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.asin``' intrinsic returns the arcsine of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the arcsine of the specified operand, returning the
-same values as the libm ``asin`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.acos``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.acos(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.acos``' intrinsic returns the arccosine of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the arccosine of the specified operand, returning the
-same values as the libm ``acos`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.atan``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.atan(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.atan``' intrinsic returns the arctangent of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the arctangent of the specified operand, returning the
-same values as the libm ``atan`` functions would, and handles error
-conditions in the same way.
-
-'``llvm.experimental.constrained.atan2``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.atan2(<type> <op1>,
-                                           <type> <op2>,
-                                           metadata <rounding mode>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.atan2``' intrinsic returns the arctangent
-of ``<op1>`` divided by ``<op2>`` accounting for the quadrant.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers of the
-same type.
-
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the quadrant-specific arctangent using the specified
-operands, returning the same values as the libm ``atan2`` functions would, and
-handles error conditions in the same way.
-
-'``llvm.experimental.constrained.sinh``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.sinh(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.sinh``' intrinsic returns the hyperbolic sine of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the hyperbolic sine of the specified operand, returning the
-same values as the libm ``sinh`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.cosh``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.cosh(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.cosh``' intrinsic returns the hyperbolic cosine of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the hyperbolic cosine of the specified operand, returning the
-same values as the libm ``cosh`` functions would, and handles error
-conditions in the same way.
-
-
-'``llvm.experimental.constrained.tanh``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.tanh(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.tanh``' intrinsic returns the hyperbolic tangent of the
-first operand.
-
-Arguments:
-""""""""""
-
-The first argument and the return type are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the hyperbolic tangent of the specified operand, returning the
-same values as the libm ``tanh`` functions would, and handles error
-conditions in the same way.
-
-'``llvm.experimental.constrained.exp``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.exp(<type> <op1>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.exp``' intrinsic computes the base-e
-exponential of the specified value.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``exp`` functions
-would, and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.exp2``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.exp2(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.exp2``' intrinsic computes the base-2
-exponential of the specified value.
-
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``exp2`` functions
-would, and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.log``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.log(<type> <op1>,
-                                         metadata <rounding mode>,
-                                         metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.log``' intrinsic computes the base-e
-logarithm of the specified value.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``log`` functions
-would, and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.log10``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.log10(<type> <op1>,
-                                           metadata <rounding mode>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.log10``' intrinsic computes the base-10
-logarithm of the specified value.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``log10`` functions
-would, and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.log2``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.log2(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.log2``' intrinsic computes the base-2
-logarithm of the specified value.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``log2`` functions
-would, and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.rint``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.rint(<type> <op1>,
-                                          metadata <rounding mode>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.rint``' intrinsic returns the first
-argument rounded to the nearest integer. It may raise an inexact floating-point
-exception if the argument is not an integer.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``rint`` functions
-would, and handles error conditions in the same way.  The rounding mode is
-described, not determined, by the rounding mode argument.  The actual rounding
-mode is determined by the runtime floating-point environment.  The rounding
-mode argument is only intended as information to the compiler.
-
-
-'``llvm.experimental.constrained.lrint``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <inttype>
-      @llvm.experimental.constrained.lrint(<fptype> <op1>,
-                                           metadata <rounding mode>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.lrint``' intrinsic returns the first
-argument rounded to the nearest integer. An inexact floating-point exception
-will be raised if the argument is not an integer. If the rounded value is too
-large to fit into the result type, an invalid exception is raised, and the
-return value is a non-deterministic value (equivalent to `freeze poison`).
-
-Arguments:
-""""""""""
-
-The first argument is a floating-point number. The return value is an
-integer type. Not all types are supported on all targets. The supported
-types are the same as the ``llvm.lrint`` intrinsic and the ``lrint``
-libm functions.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``lrint`` functions
-would, and handles error conditions in the same way.
-
-The rounding mode is described, not determined, by the rounding mode
-argument.  The actual rounding mode is determined by the runtime floating-point
-environment.  The rounding mode argument is only intended as information
-to the compiler.
-
-If the runtime floating-point environment is using the default rounding mode
-then the results will be the same as the ``llvm.lrint`` intrinsic.
-
-
-'``llvm.experimental.constrained.llrint``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <inttype>
-      @llvm.experimental.constrained.llrint(<fptype> <op1>,
-                                            metadata <rounding mode>,
-                                            metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.llrint``' intrinsic returns the first
-argument rounded to the nearest integer. An inexact floating-point exception
-will be raised if the argument is not an integer. If the rounded value is too
-large to fit into the result type, an invalid exception is raised, and the
-return value is a non-deterministic value (equivalent to `freeze poison`).
-
-Arguments:
-""""""""""
-
-The first argument is a floating-point number. The return value is an
-integer type. Not all types are supported on all targets. The supported
-types are the same as the ``llvm.llrint`` intrinsic and the ``llrint``
-libm functions.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``llrint`` functions
-would, and handles error conditions in the same way.
-
-The rounding mode is described, not determined, by the rounding mode
-argument.  The actual rounding mode is determined by the runtime floating-point
-environment.  The rounding mode argument is only intended as information
-to the compiler.
-
-If the runtime floating-point environment is using the default rounding mode
-then the results will be the same as the ``llvm.llrint`` intrinsic.
-
-
-'``llvm.experimental.constrained.nearbyint``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.nearbyint(<type> <op1>,
-                                               metadata <rounding mode>,
-                                               metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
-argument rounded to the nearest integer. It will not raise an inexact
-floating-point exception if the argument is not an integer.
-
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second and third arguments specify the rounding mode and exception
-behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``nearbyint`` functions
-would, and handles error conditions in the same way.  The rounding mode is
-described, not determined, by the rounding mode argument.  The actual rounding
-mode is determined by the runtime floating-point environment.  The rounding
-mode argument is only intended as information to the compiler.
-
-
-'``llvm.experimental.constrained.maxnum``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
+.. _int_noalias_scope_decl:
 
-::
-
-      declare <type>
-      @llvm.experimental.constrained.maxnum(<type> <op1>, <type> <op2>
-                                            metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.maxnum``' intrinsic returns the maximum
-of the two arguments.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers
-of the same type.
-
-The third argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function follows the IEEE 754-2008 semantics for maxNum.
-
-
-'``llvm.experimental.constrained.minnum``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.minnum(<type> <op1>, <type> <op2>
-                                            metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.minnum``' intrinsic returns the minimum
-of the two arguments.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers
-of the same type.
-
-The third argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function follows the IEEE 754-2008 semantics for minNum.
-
-
-'``llvm.experimental.constrained.maximum``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.maximum(<type> <op1>, <type> <op2>
-                                             metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.maximum``' intrinsic returns the maximum
-of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers
-of the same type.
-
-The third argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function follows semantics specified in the draft of IEEE 754-2019.
-
-
-'``llvm.experimental.constrained.minimum``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.minimum(<type> <op1>, <type> <op2>
-                                             metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.minimum``' intrinsic returns the minimum
-of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.
-
-Arguments:
-""""""""""
-
-The first two arguments and the return value are floating-point numbers
-of the same type.
-
-The third argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function follows semantics specified in the draft of IEEE 754-2019.
-
-
-'``llvm.experimental.constrained.ceil``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.ceil(<type> <op1>,
-                                          metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.ceil``' intrinsic returns the ceiling of the
-first argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``ceil`` functions
-would and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.floor``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.floor(<type> <op1>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.floor``' intrinsic returns the floor of the
-first argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``floor`` functions
-would and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.round``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.round(<type> <op1>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.round``' intrinsic returns the first
-argument rounded to the nearest integer.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``round`` functions
-would and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.roundeven``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.roundeven(<type> <op1>,
-                                               metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.roundeven``' intrinsic returns the first
-argument rounded to the nearest integer in floating-point format, rounding
-halfway cases to even (that is, to the nearest value that is an even integer),
-regardless of the current rounding direction.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function implements IEEE 754 operation ``roundToIntegralTiesToEven``. It
-also behaves in the same way as C standard function ``roundeven`` and can signal
-the invalid operation exception for a SNAN argument.
-
-
-'``llvm.experimental.constrained.lround``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <inttype>
-      @llvm.experimental.constrained.lround(<fptype> <op1>,
-                                            metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.lround``' intrinsic returns the first
-argument rounded to the nearest integer with ties away from zero.  It will
-raise an inexact floating-point exception if the argument is not an integer.
-If the rounded value is too large to fit into the result type, an invalid
-exception is raised, and the return value is a non-deterministic value
-(equivalent to `freeze poison`).
-
-Arguments:
-""""""""""
-
-The first argument is a floating-point number. The return value is an
-integer type. Not all types are supported on all targets. The supported
-types are the same as the ``llvm.lround`` intrinsic and the ``lround``
-libm functions.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``lround`` functions
-would and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.llround``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <inttype>
-      @llvm.experimental.constrained.llround(<fptype> <op1>,
-                                             metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.llround``' intrinsic returns the first
-argument rounded to the nearest integer with ties away from zero. It will
-raise an inexact floating-point exception if the argument is not an integer.
-If the rounded value is too large to fit into the result type, an invalid
-exception is raised, and the return value is a non-deterministic value
-(equivalent to `freeze poison`).
-
-Arguments:
-""""""""""
-
-The first argument is a floating-point number. The return value is an
-integer type. Not all types are supported on all targets. The supported
-types are the same as the ``llvm.llround`` intrinsic and the ``llround``
-libm functions.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``llround`` functions
-would and handles error conditions in the same way.
-
-
-'``llvm.experimental.constrained.trunc``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-
-::
-
-      declare <type>
-      @llvm.experimental.constrained.trunc(<type> <op1>,
-                                           metadata <exception behavior>)
-
-Overview:
-"""""""""
-
-The '``llvm.experimental.constrained.trunc``' intrinsic returns the first
-argument rounded to the nearest integer not larger in magnitude than the
-argument.
-
-Arguments:
-""""""""""
-
-The first argument and the return value are floating-point numbers of the same
-type.
-
-The second argument specifies the exception behavior as described above.
-
-Semantics:
-""""""""""
-
-This function returns the same values as the libm ``trunc`` functions
-would and handles error conditions in the same way.
+Alias Scope Intrinsics
+----------------------
 
 .. _int_experimental_noalias_scope_decl:
 
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 8ccd186e5207d..72d8503e6b1b4 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -68,6 +68,9 @@ Changes to the LLVM IR
 
 * The `"nooutline"` attribute is now writen as `nooutline`. Existing IR and
   bitcode will be automatically updated.
+* Updated semantics of `llvm.type.checked.load.relative` to match that of
+  `llvm.load.relative`.
+* Added the `fp.control` and `fp.except` floating-point operand bundles.
 
 Changes to LLVM infrastructure
 ------------------------------
diff --git a/llvm/include/llvm/ADT/FloatingPointMode.h b/llvm/include/llvm/ADT/FloatingPointMode.h
index da1fd22d85e0b..877d3fc9c6ae6 100644
--- a/llvm/include/llvm/ADT/FloatingPointMode.h
+++ b/llvm/include/llvm/ADT/FloatingPointMode.h
@@ -412,6 +412,39 @@ LLVM_ABI bool cannotOrderStrictlyLess(FPClassTest LHS, FPClassTest RHS,
 LLVM_ABI bool cannotOrderStrictlyLessEq(FPClassTest LHS, FPClassTest RHS,
                                         bool OrderedZeroSign = false);
 
+/// If the specified string represents a denormal mode as used in operand
+/// bundles, returns the corresponding mode; otherwise returns std::nullopt.
+inline std::optional<DenormalMode::DenormalModeKind>
+parseDenormalKindFromOperandBundle(StringRef Str) {
+  if (Str == "ieee")
+    return DenormalMode::IEEE;
+  if (Str == "zero")
+    return DenormalMode::PreserveSign;
+  if (Str == "pzero")
+    return DenormalMode::PositiveZero;
+  if (Str == "dyn")
+    return DenormalMode::Dynamic;
+  return std::nullopt;
+}
+
+/// Converts the specified denormal mode into a string suitable for use in an
+/// operand bundle, or std::nullopt if the mode has no bundle spelling.
+inline std::optional<StringRef>
+printDenormalForOperandBundle(DenormalMode::DenormalModeKind Mode) {
+  switch (Mode) {
+  case DenormalMode::IEEE:
+    return "ieee";
+  case DenormalMode::PreserveSign:
+    return "zero";
+  case DenormalMode::PositiveZero:
+    return "pzero";
+  case DenormalMode::Dynamic:
+    return "dyn";
+  default:
+    return std::nullopt;
+  }
+}
+
 } // namespace llvm
 
 #endif // LLVM_ADT_FLOATINGPOINTMODE_H
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 7dbd8bc658161..e156b0431fef3 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2473,9 +2473,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     case Intrinsic::fmuladd:
       ISD = ISD::FMA;
       break;
-    case Intrinsic::experimental_constrained_fmuladd:
-      ISD = ISD::STRICT_FMA;
-      break;
     // FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free.
     case Intrinsic::lifetime_start:
     case Intrinsic::lifetime_end:
@@ -2814,14 +2811,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
              thisT()->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy,
                                              CostKind);
     }
-    case Intrinsic::experimental_constrained_fmuladd: {
-      IntrinsicCostAttributes FMulAttrs(
-        Intrinsic::experimental_constrained_fmul, RetTy, Tys);
-      IntrinsicCostAttributes FAddAttrs(
-        Intrinsic::experimental_constrained_fadd, RetTy, Tys);
-      return thisT()->getIntrinsicInstrCost(FMulAttrs, CostKind) +
-             thisT()->getIntrinsicInstrCost(FAddAttrs, CostKind);
-    }
     case Intrinsic::smin:
     case Intrinsic::smax:
     case Intrinsic::umin:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
index 7815ad686cbaa..1c79aff55c5ba 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h
@@ -39,7 +39,6 @@ class BasicBlock;
 class CallInst;
 class CallLowering;
 class Constant;
-class ConstrainedFPIntrinsic;
 class DataLayout;
 class DbgDeclareInst;
 class DbgValueInst;
@@ -268,9 +267,6 @@ class IRTranslator : public MachineFunctionPass {
   bool translateSimpleIntrinsic(const CallInst &CI, Intrinsic::ID ID,
                                 MachineIRBuilder &MIRBuilder);
 
-  bool translateConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI,
-                                       MachineIRBuilder &MIRBuilder);
-
   bool translateKnownIntrinsic(const CallInst &CI, Intrinsic::ID ID,
                                MachineIRBuilder &MIRBuilder);
 
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 8a8a9ee71ca02..9b92e09e2808c 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1671,6 +1671,11 @@ LLVM_ABI bool isVPBinaryOp(unsigned Opcode);
 /// Whether this is a vector-predicated reduction opcode.
 LLVM_ABI bool isVPReduction(unsigned Opcode);
 
+/// Returns true if \p Opcode is a floating-point operation, i.e. a node that
+/// computes an FP result, converts between FP precisions, compares FP values,
+/// or otherwise performs floating-point arithmetic.
+LLVM_ABI bool isFPOpcode(unsigned Opcode);
+
 /// The operand position of the vector mask.
 LLVM_ABI std::optional<unsigned> getVPMaskIdx(unsigned Opcode);
 
diff --git a/llvm/include/llvm/IR/ConstrainedOps.def b/llvm/include/llvm/IR/ConstrainedOps.def
index 30a82bf633d57..fa6df284ff21c 100644
--- a/llvm/include/llvm/IR/ConstrainedOps.def
+++ b/llvm/include/llvm/IR/ConstrainedOps.def
@@ -1,4 +1,4 @@
-//===- llvm/IR/ConstrainedOps.def - Constrained intrinsics ------*- C++ -*-===//
+//===- llvm/IR/ConstrainedOps.def - Strict FP SDNode list -------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -6,12 +6,17 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// Defines properties of constrained intrinsics, in particular corresponding
-// floating point operations and DAG nodes.
+// Defines the set of STRICT_* SelectionDAG nodes corresponding to floating-
+// point operations, used to generate switch-case tables.
+//
+// DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)
+//   Expands to case ISD::STRICT_##DAGN when DAG_INSTRUCTION is defined.
+//
+// CMP_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)
+//   Like DAG_INSTRUCTION but for comparisons (EqOpc = ISD::SETCC).
 //
 //===----------------------------------------------------------------------===//
 
-// DAG_FUNCTION defers to DAG_INSTRUCTION if its defined, otherwise FUNCTION.
 #ifndef DAG_FUNCTION
 #ifdef DAG_INSTRUCTION
 #define DAG_FUNCTION(N,A,R,I,D) DAG_INSTRUCTION(N,A,R,I,D)
@@ -24,90 +29,68 @@
 #define INSTRUCTION(N,A,R,I)
 #endif
 
-// DAG_INSTRUCTION is treated like an INSTRUCTION if the DAG node isn't used.
 #ifndef DAG_INSTRUCTION
 #define DAG_INSTRUCTION(N,A,R,I,D) INSTRUCTION(N,A,R,I)
 #endif
 
-// In most cases intrinsic function is handled similar to instruction.
 #ifndef FUNCTION
 #define FUNCTION(N,A,R,I) INSTRUCTION(N,A,R,I)
 #endif
 
-// Compare instruction have a DAG node so they are treated like DAG_INSTRUCTION.
 #ifndef CMP_INSTRUCTION
 #define CMP_INSTRUCTION(N,A,R,I,D) DAG_INSTRUCTION(N,A,R,I,D)
 #endif
 
-// Arguments of the entries are:
-// - instruction or intrinsic function name.
-// - Number of original instruction/intrinsic arguments.
-// - 1 if the corresponding constrained intrinsic has rounding mode argument.
-// - name of the constrained intrinsic to represent this instruction/function.
-// - DAG node corresponding to the constrained intrinsic without prefix STRICT_.
-
-// These are definitions for instructions, that are converted into constrained
-// intrinsics.
-//
-DAG_INSTRUCTION(FAdd,         2, 1, experimental_constrained_fadd,       FADD)
-DAG_INSTRUCTION(FSub,         2, 1, experimental_constrained_fsub,       FSUB)
-DAG_INSTRUCTION(FMul,         2, 1, experimental_constrained_fmul,       FMUL)
-DAG_INSTRUCTION(FDiv,         2, 1, experimental_constrained_fdiv,       FDIV)
-DAG_INSTRUCTION(FRem,         2, 1, experimental_constrained_frem,       FREM)
-DAG_INSTRUCTION(FPExt,        1, 0, experimental_constrained_fpext,      FP_EXTEND)
-DAG_INSTRUCTION(SIToFP,       1, 1, experimental_constrained_sitofp,     SINT_TO_FP)
-DAG_INSTRUCTION(UIToFP,       1, 1, experimental_constrained_uitofp,     UINT_TO_FP)
-DAG_INSTRUCTION(FPToSI,       1, 0, experimental_constrained_fptosi,     FP_TO_SINT)
-DAG_INSTRUCTION(FPToUI,       1, 0, experimental_constrained_fptoui,     FP_TO_UINT)
-DAG_INSTRUCTION(FPTrunc,      1, 1, experimental_constrained_fptrunc,    FP_ROUND)
-
-// These are definitions for compare instructions (signaling and quiet version).
-// Both of these match to FCmp / SETCC.
-CMP_INSTRUCTION(FCmp,         2, 0, experimental_constrained_fcmp,       FSETCC)
-CMP_INSTRUCTION(FCmp,         2, 0, experimental_constrained_fcmps,      FSETCCS)
+DAG_INSTRUCTION(FAdd,         2, 1, fadd,            FADD)
+DAG_INSTRUCTION(FSub,         2, 1, fsub,            FSUB)
+DAG_INSTRUCTION(FMul,         2, 1, fmul,            FMUL)
+DAG_INSTRUCTION(FDiv,         2, 1, fdiv,            FDIV)
+DAG_INSTRUCTION(FRem,         2, 1, frem,            FREM)
+DAG_INSTRUCTION(FPExt,        1, 0, fpext,           FP_EXTEND)
+DAG_INSTRUCTION(SIToFP,       1, 1, sitofp,          SINT_TO_FP)
+DAG_INSTRUCTION(UIToFP,       1, 1, uitofp,          UINT_TO_FP)
+DAG_INSTRUCTION(FPToSI,       1, 0, fptosi,          FP_TO_SINT)
+DAG_INSTRUCTION(FPToUI,       1, 0, fptoui,          FP_TO_UINT)
+DAG_INSTRUCTION(FPTrunc,      1, 1, fptrunc,         FP_ROUND)
 
-// Theses are definitions for intrinsic functions, that are converted into
-// constrained intrinsics.
-//
-DAG_FUNCTION(acos,            1, 1, experimental_constrained_acos,       FACOS)
-DAG_FUNCTION(asin,            1, 1, experimental_constrained_asin,       FASIN)
-DAG_FUNCTION(atan,            1, 1, experimental_constrained_atan,       FATAN)
-DAG_FUNCTION(atan2,           2, 1, experimental_constrained_atan2,      FATAN2)
-DAG_FUNCTION(ceil,            1, 0, experimental_constrained_ceil,       FCEIL)
-DAG_FUNCTION(cos,             1, 1, experimental_constrained_cos,        FCOS)
-DAG_FUNCTION(cosh,            1, 1, experimental_constrained_cosh,       FCOSH)
-DAG_FUNCTION(exp,             1, 1, experimental_constrained_exp,        FEXP)
-DAG_FUNCTION(exp2,            1, 1, experimental_constrained_exp2,       FEXP2)
-DAG_FUNCTION(floor,           1, 0, experimental_constrained_floor,      FFLOOR)
-DAG_FUNCTION(fma,             3, 1, experimental_constrained_fma,        FMA)
-DAG_FUNCTION(log,             1, 1, experimental_constrained_log,        FLOG)
-DAG_FUNCTION(log10,           1, 1, experimental_constrained_log10,      FLOG10)
-DAG_FUNCTION(log2,            1, 1, experimental_constrained_log2,       FLOG2)
-DAG_FUNCTION(lrint,           1, 1, experimental_constrained_lrint,      LRINT)
-DAG_FUNCTION(llrint,          1, 1, experimental_constrained_llrint,     LLRINT)
-DAG_FUNCTION(lround,          1, 0, experimental_constrained_lround,     LROUND)
-DAG_FUNCTION(llround,         1, 0, experimental_constrained_llround,    LLROUND)
-DAG_FUNCTION(maxnum,          2, 0, experimental_constrained_maxnum,     FMAXNUM)
-DAG_FUNCTION(minnum,          2, 0, experimental_constrained_minnum,     FMINNUM)
-DAG_FUNCTION(maximum,         2, 0, experimental_constrained_maximum,    FMAXIMUM)
-DAG_FUNCTION(minimum,         2, 0, experimental_constrained_minimum,    FMINIMUM)
-DAG_FUNCTION(nearbyint,       1, 1, experimental_constrained_nearbyint,  FNEARBYINT)
-DAG_FUNCTION(pow,             2, 1, experimental_constrained_pow,        FPOW)
-DAG_FUNCTION(powi,            2, 1, experimental_constrained_powi,       FPOWI)
-DAG_FUNCTION(ldexp,           2, 1, experimental_constrained_ldexp,      FLDEXP)
-DAG_FUNCTION(rint,            1, 1, experimental_constrained_rint,       FRINT)
-DAG_FUNCTION(round,           1, 0, experimental_constrained_round,      FROUND)
-DAG_FUNCTION(roundeven,       1, 0, experimental_constrained_roundeven,  FROUNDEVEN)
-DAG_FUNCTION(sin,             1, 1, experimental_constrained_sin,        FSIN)
-DAG_FUNCTION(sinh,            1, 1, experimental_constrained_sinh,       FSINH)
-DAG_FUNCTION(sqrt,            1, 1, experimental_constrained_sqrt,       FSQRT)
-DAG_FUNCTION(tan,             1, 1, experimental_constrained_tan,        FTAN)
-DAG_FUNCTION(tanh,            1, 1, experimental_constrained_tanh,       FTANH)
-DAG_FUNCTION(trunc,           1, 0, experimental_constrained_trunc,      FTRUNC)
+CMP_INSTRUCTION(FCmp,         2, 0, fcmp,            FSETCC)
+CMP_INSTRUCTION(FCmps,        2, 0, fcmps,           FSETCCS)
 
-// This is definition for fmuladd intrinsic function, that is converted into
-// constrained FMA or FMUL + FADD intrinsics.
-FUNCTION(fmuladd,         3, 1, experimental_constrained_fmuladd)
+DAG_FUNCTION(acos,            1, 1, acos,            FACOS)
+DAG_FUNCTION(asin,            1, 1, asin,            FASIN)
+DAG_FUNCTION(atan,            1, 1, atan,            FATAN)
+DAG_FUNCTION(atan2,           2, 1, atan2,           FATAN2)
+DAG_FUNCTION(ceil,            1, 0, ceil,            FCEIL)
+DAG_FUNCTION(cos,             1, 1, cos,             FCOS)
+DAG_FUNCTION(cosh,            1, 1, cosh,            FCOSH)
+DAG_FUNCTION(exp,             1, 1, exp,             FEXP)
+DAG_FUNCTION(exp2,            1, 1, exp2,            FEXP2)
+DAG_FUNCTION(floor,           1, 0, floor,           FFLOOR)
+DAG_FUNCTION(fma,             3, 1, fma,             FMA)
+DAG_FUNCTION(log,             1, 1, log,             FLOG)
+DAG_FUNCTION(log10,           1, 1, log10,           FLOG10)
+DAG_FUNCTION(log2,            1, 1, log2,            FLOG2)
+DAG_FUNCTION(lrint,           1, 1, lrint,           LRINT)
+DAG_FUNCTION(llrint,          1, 1, llrint,          LLRINT)
+DAG_FUNCTION(lround,          1, 0, lround,          LROUND)
+DAG_FUNCTION(llround,         1, 0, llround,         LLROUND)
+DAG_FUNCTION(maxnum,          2, 0, maxnum,          FMAXNUM)
+DAG_FUNCTION(minnum,          2, 0, minnum,          FMINNUM)
+DAG_FUNCTION(maximum,         2, 0, maximum,         FMAXIMUM)
+DAG_FUNCTION(minimum,         2, 0, minimum,         FMINIMUM)
+DAG_FUNCTION(nearbyint,       1, 1, nearbyint,       FNEARBYINT)
+DAG_FUNCTION(pow,             2, 1, pow,             FPOW)
+DAG_FUNCTION(powi,            2, 1, powi,            FPOWI)
+DAG_FUNCTION(ldexp,           2, 1, ldexp,           FLDEXP)
+DAG_FUNCTION(rint,            1, 1, rint,            FRINT)
+DAG_FUNCTION(round,           1, 0, round,           FROUND)
+DAG_FUNCTION(roundeven,       1, 0, roundeven,       FROUNDEVEN)
+DAG_FUNCTION(sin,             1, 1, sin,             FSIN)
+DAG_FUNCTION(sinh,            1, 1, sinh,            FSINH)
+DAG_FUNCTION(sqrt,            1, 1, sqrt,            FSQRT)
+DAG_FUNCTION(tan,             1, 1, tan,             FTAN)
+DAG_FUNCTION(tanh,            1, 1, tanh,            FTANH)
+DAG_FUNCTION(trunc,           1, 0, trunc,           FTRUNC)
 
 #undef INSTRUCTION
 #undef FUNCTION
diff --git a/llvm/include/llvm/IR/FPEnv.h b/llvm/include/llvm/IR/FPEnv.h
index 38395b15c8c09..864fdb424ea40 100644
--- a/llvm/include/llvm/IR/FPEnv.h
+++ b/llvm/include/llvm/IR/FPEnv.h
@@ -49,32 +49,44 @@ enum ExceptionBehavior : uint8_t {
 /// metadata.
 LLVM_ABI std::optional<RoundingMode> convertStrToRoundingMode(StringRef);
 
+/// Returns a valid RoundingMode enumerator given a string that is used as a
+/// rounding mode specifier in operand bundles.
+std::optional<RoundingMode> convertBundleToRoundingMode(StringRef);
+
 /// For any RoundingMode enumerator, returns a string valid as input in
 /// constrained intrinsic rounding mode metadata.
 LLVM_ABI std::optional<StringRef> convertRoundingModeToStr(RoundingMode);
 
+/// For any RoundingMode enumerator, returns a string to be used in operand
+/// bundles.
+std::optional<StringRef> convertRoundingModeToBundle(RoundingMode);
+
 /// Returns a valid ExceptionBehavior enumerator when given a string
 /// valid as input in constrained intrinsic exception behavior metadata.
 LLVM_ABI std::optional<fp::ExceptionBehavior>
     convertStrToExceptionBehavior(StringRef);
 
+/// Returns a valid ExceptionBehavior enumerator given a string from the operand
+/// bundle argument.
+std::optional<fp::ExceptionBehavior>
+    convertBundleToExceptionBehavior(StringRef);
+
 /// For any ExceptionBehavior enumerator, returns a string valid as
 /// input in constrained intrinsic exception behavior metadata.
 LLVM_ABI std::optional<StringRef>
     convertExceptionBehaviorToStr(fp::ExceptionBehavior);
 
+/// Returns a string representing the given exception behavior for use in
+/// operand bundles.
+std::optional<StringRef>
+    convertExceptionBehaviorToBundle(fp::ExceptionBehavior);
+
 /// Returns true if the exception handling behavior and rounding mode
 /// match what is used in the default floating point environment.
 inline bool isDefaultFPEnvironment(fp::ExceptionBehavior EB, RoundingMode RM) {
   return EB == fp::ebIgnore && RM == RoundingMode::NearestTiesToEven;
 }
 
-/// Returns constrained intrinsic id to represent the given instruction in
-/// strictfp function. If the instruction is already a constrained intrinsic or
-/// does not have a constrained intrinsic counterpart, the function returns
-/// zero.
-LLVM_ABI Intrinsic::ID getConstrainedIntrinsicID(const Instruction &Instr);
-
 /// Returns true if the rounding mode RM may be QRM at compile time or
 /// at run time.
 inline bool canRoundingModeBe(RoundingMode RM, RoundingMode QRM) {
diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def
new file mode 100644
index 0000000000000..017c5266413fb
--- /dev/null
+++ b/llvm/include/llvm/IR/FloatingPointOps.def
@@ -0,0 +1,121 @@
+//===- llvm/IR/FloatingPointOps.def - FP intrinsics -------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Defines the set of intrinsics classified as floating-point operations.
+//
+// Two macro forms:
+//
+//   FP_OP(intrinsic_name, isd_opcode_name)
+//     - An FP intrinsic that lowers to a single ISD opcode.
+//     - isFloatingPointOperation() uses column 1 (Intrinsic::NAME).
+//     - ISD::isFPOpcode() uses column 2 (ISD::ISD_NAME).
+//
+//   FP_INTRINSIC(intrinsic_name)
+//     - An FP intrinsic whose ISD opcode is already covered by another FP_OP
+//       entry, or maps to a shared opcode (e.g. SETCC) not exclusively FP.
+//     - isFloatingPointOperation() uses it; ISD::isFPOpcode() does not.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FP_OP
+#define FP_OP(N, D)
+#endif
+
+#ifndef FP_INTRINSIC
+#define FP_INTRINSIC(N)
+#endif
+
+// Basic FP arithmetic
+FP_OP(fadd,             FADD)
+FP_OP(fsub,             FSUB)
+FP_OP(fmul,             FMUL)
+FP_OP(fdiv,             FDIV)
+FP_OP(frem,             FREM)
+FP_OP(fneg,             FNEG)
+FP_OP(fabs,             FABS)
+
+// FMA family
+FP_OP(fma,              FMA)
+FP_OP(fmuladd,          FMULADD)
+
+// FP type conversions (float <-> float)
+FP_OP(fptrunc,          FP_ROUND)
+FP_OP(fpext,            FP_EXTEND)
+FP_INTRINSIC(fptrunc_round) // also lowers to FP_ROUND, covered above
+
+// Integer/FP conversions
+FP_OP(sitofp,           SINT_TO_FP)
+FP_OP(uitofp,           UINT_TO_FP)
+FP_OP(fptosi,           FP_TO_SINT)
+FP_OP(fptoui,           FP_TO_UINT)
+
+// FP comparison (SETCC is shared with integer; kept as FP_INTRINSIC)
+FP_INTRINSIC(fcmp)
+
+// Math / transcendental
+FP_OP(sqrt,             FSQRT)
+// cbrt: no llvm.cbrt intrinsic; FCBRT is ISD-only (handled in isFPOpcode)
+FP_OP(sin,              FSIN)
+FP_OP(cos,              FCOS)
+FP_OP(tan,              FTAN)
+FP_OP(asin,             FASIN)
+FP_OP(acos,             FACOS)
+FP_OP(atan,             FATAN)
+FP_OP(atan2,            FATAN2)
+FP_OP(sinh,             FSINH)
+FP_OP(cosh,             FCOSH)
+FP_OP(tanh,             FTANH)
+FP_OP(pow,              FPOW)
+FP_OP(powi,             FPOWI)
+FP_OP(ldexp,            FLDEXP)
+FP_OP(frexp,            FFREXP)
+FP_OP(log,              FLOG)
+FP_OP(log2,             FLOG2)
+FP_OP(log10,            FLOG10)
+FP_OP(exp,              FEXP)
+FP_OP(exp2,             FEXP2)
+FP_OP(exp10,            FEXP10)
+
+// FP rounding
+FP_OP(ceil,             FCEIL)
+FP_OP(trunc,            FTRUNC)
+FP_OP(rint,             FRINT)
+FP_OP(nearbyint,        FNEARBYINT)
+FP_OP(round,            FROUND)
+FP_OP(roundeven,        FROUNDEVEN)
+FP_OP(floor,            FFLOOR)
+FP_OP(lround,           LROUND)
+FP_OP(llround,          LLROUND)
+FP_OP(lrint,            LRINT)
+FP_OP(llrint,           LLRINT)
+
+// FP min/max
+FP_OP(minnum,           FMINNUM)
+FP_OP(maxnum,           FMAXNUM)
+FP_OP(minimum,          FMINIMUM)
+FP_OP(maximum,          FMAXIMUM)
+FP_OP(minimumnum,       FMINIMUMNUM)
+FP_OP(maximumnum,       FMAXIMUMNUM)
+
+// FP special / decomposition
+FP_OP(copysign,         FCOPYSIGN)
+FP_OP(canonicalize,     FCANONICALIZE)
+FP_OP(is_fpclass,       IS_FPCLASS)
+FP_OP(sincos,           FSINCOS)
+FP_OP(modf,             FMODF)
+
+// Vector FP reductions
+FP_OP(vector_reduce_fadd,    VECREDUCE_FADD)
+FP_OP(vector_reduce_fmul,    VECREDUCE_FMUL)
+FP_OP(vector_reduce_fmax,    VECREDUCE_FMAX)
+FP_OP(vector_reduce_fmin,    VECREDUCE_FMIN)
+FP_OP(vector_reduce_fmaximum, VECREDUCE_FMAXIMUM)
+FP_OP(vector_reduce_fminimum, VECREDUCE_FMINIMUM)
+
+#undef FP_OP
+#undef FP_INTRINSIC
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index bdec42516b8cd..1c31bf60fb7bb 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -1028,6 +1028,16 @@ class IRBuilderBase {
                                      FMFSource FMFSource = {},
                                      const Twine &Name = "");
 
+  /// Create a call to intrinsic \p ID with \p Args, mangled using \p Types and
+  /// with operand bundles.
+  /// If \p FMFSource is provided, copy fast-math-flags from that instruction to
+  /// the intrinsic.
+  CallInst *CreateIntrinsic(Intrinsic::ID ID, ArrayRef<Type *> Types,
+                            ArrayRef<Value *> Args,
+                            ArrayRef<OperandBundleDef> OpBundles,
+                            Instruction *FMFSource = nullptr,
+                            const Twine &Name = "");
+
   /// Create a call to non-overloaded intrinsic \p ID with \p Args. If
   /// \p FMFSource is provided, copy fast-math-flags from that instruction to
   /// the intrinsic.
@@ -1039,24 +1049,12 @@ class IRBuilderBase {
   /// Create call to the minnum intrinsic.
   Value *CreateMinNum(Value *LHS, Value *RHS, FMFSource FMFSource = {},
                       const Twine &Name = "") {
-    if (IsFPConstrained) {
-      return CreateConstrainedFPUnroundedBinOp(
-          Intrinsic::experimental_constrained_minnum, LHS, RHS, FMFSource,
-          Name);
-    }
-
     return CreateBinaryIntrinsic(Intrinsic::minnum, LHS, RHS, FMFSource, Name);
   }
 
   /// Create call to the maxnum intrinsic.
   Value *CreateMaxNum(Value *LHS, Value *RHS, FMFSource FMFSource = {},
                       const Twine &Name = "") {
-    if (IsFPConstrained) {
-      return CreateConstrainedFPUnroundedBinOp(
-          Intrinsic::experimental_constrained_maxnum, LHS, RHS, FMFSource,
-          Name);
-    }
-
     return CreateBinaryIntrinsic(Intrinsic::maxnum, LHS, RHS, FMFSource, Name);
   }
 
@@ -1092,7 +1090,6 @@ class IRBuilderBase {
   /// Create call to the ldexp intrinsic.
   Value *CreateLdexp(Value *Src, Value *Exp, FMFSource FMFSource = {},
                      const Twine &Name = "") {
-    assert(!IsFPConstrained && "TODO: Support strictfp");
     return CreateIntrinsic(Intrinsic::ldexp, {Src->getType(), Exp->getType()},
                            {Src, Exp}, FMFSource, Name);
   }
@@ -1100,12 +1097,6 @@ class IRBuilderBase {
   /// Create call to the fma intrinsic.
   Value *CreateFMA(Value *Factor1, Value *Factor2, Value *Summand,
                    FMFSource FMFSource = {}, const Twine &Name = "") {
-    if (IsFPConstrained) {
-      return CreateConstrainedFPIntrinsic(
-          Intrinsic::experimental_constrained_fma, {Factor1->getType()},
-          {Factor1, Factor2, Summand}, FMFSource, Name);
-    }
-
     return CreateIntrinsic(Intrinsic::fma, {Factor1->getType()},
                            {Factor1, Factor2, Summand}, FMFSource, Name);
   }
@@ -1414,6 +1405,15 @@ class IRBuilderBase {
     return MetadataAsValue::get(Context, ExceptMDS);
   }
 
+  /// Return true if the current constrained rounding mode or exception
+  /// behavior differs from the defaults (Dynamic rounding, ebStrict exception).
+  /// When false, plain IR instructions inside a strictfp function are
+  /// semantically equivalent to the intrinsic form with no bundles.
+  bool hasNonDefaultFPConstraints() const {
+    return DefaultConstrainedRounding != RoundingMode::Dynamic ||
+           DefaultConstrainedExcept != fp::ebStrict;
+  }
+
   Value *getConstrainedFPPredicate(CmpInst::Predicate Predicate) {
     assert(CmpInst::isFPPredicate(Predicate) &&
            Predicate != CmpInst::FCMP_FALSE &&
@@ -1644,9 +1644,9 @@ class IRBuilderBase {
 
   Value *CreateFAddFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fadd,
-                                      L, R, FMFSource, Name, FPMD);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fadd, {L->getType()}, {L, R},
+                             FMFSource, Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FAdd, L, R, FMFSource.get(FMF)))
@@ -1663,9 +1663,9 @@ class IRBuilderBase {
 
   Value *CreateFSubFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fsub,
-                                      L, R, FMFSource, Name, FPMD);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fsub, {L->getType()}, {L, R},
+                             FMFSource, Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FSub, L, R, FMFSource.get(FMF)))
@@ -1682,9 +1682,9 @@ class IRBuilderBase {
 
   Value *CreateFMulFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fmul,
-                                      L, R, FMFSource, Name, FPMD);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fmul, {L->getType()}, {L, R},
+                             FMFSource, Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FMul, L, R, FMFSource.get(FMF)))
@@ -1701,9 +1701,9 @@ class IRBuilderBase {
 
   Value *CreateFDivFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_fdiv,
-                                      L, R, FMFSource, Name, FPMD);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fdiv, {L->getType()}, {L, R},
+                             FMFSource, Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FDiv, L, R, FMFSource.get(FMF)))
@@ -1720,9 +1720,9 @@ class IRBuilderBase {
 
   Value *CreateFRemFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPBinOp(Intrinsic::experimental_constrained_frem,
-                                      L, R, FMFSource, Name, FPMD);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::frem, {L->getType()}, {L, R},
+                             FMFSource, Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FRem, L, R, FMFSource.get(FMF)))
@@ -1787,28 +1787,6 @@ class IRBuilderBase {
     return Accum;
   }
 
-  /// This function is like @ref CreateIntrinsic for constrained fp
-  /// intrinsics. It sets the rounding mode and exception behavior of
-  /// the created intrinsic call according to \p Rounding and \p
-  /// Except and it sets \p FPMathTag as the 'fpmath' metadata, using
-  /// defaults if a value equals nullopt/null.
-  LLVM_ABI CallInst *CreateConstrainedFPIntrinsic(
-      Intrinsic::ID ID, ArrayRef<Type *> Types, ArrayRef<Value *> Args,
-      FMFSource FMFSource, const Twine &Name, MDNode *FPMathTag = nullptr,
-      std::optional<RoundingMode> Rounding = std::nullopt,
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
-  LLVM_ABI CallInst *CreateConstrainedFPBinOp(
-      Intrinsic::ID ID, Value *L, Value *R, FMFSource FMFSource = {},
-      const Twine &Name = "", MDNode *FPMathTag = nullptr,
-      std::optional<RoundingMode> Rounding = std::nullopt,
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
-  LLVM_ABI CallInst *CreateConstrainedFPUnroundedBinOp(
-      Intrinsic::ID ID, Value *L, Value *R, FMFSource FMFSource = {},
-      const Twine &Name = "", MDNode *FPMathTag = nullptr,
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
   Value *CreateNeg(Value *V, const Twine &Name = "", bool HasNSW = false) {
     return CreateSub(Constant::getNullValue(V->getType()), V, Name,
                      /*HasNUW=*/0, HasNSW);
@@ -2122,24 +2100,24 @@ class IRBuilderBase {
   }
 
   Value *CreateFPToUI(Value *V, Type *DestTy, const Twine &Name = "") {
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(Intrinsic::experimental_constrained_fptoui,
-                                     V, DestTy, nullptr, Name);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fptoui, {DestTy, V->getType()}, {V},
+                             {}, Name);
     return CreateCast(Instruction::FPToUI, V, DestTy, Name);
   }
 
   Value *CreateFPToSI(Value *V, Type *DestTy, const Twine &Name = "") {
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(Intrinsic::experimental_constrained_fptosi,
-                                     V, DestTy, nullptr, Name);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fptosi, {DestTy, V->getType()}, {V},
+                             {}, Name);
     return CreateCast(Instruction::FPToSI, V, DestTy, Name);
   }
 
   Value *CreateUIToFP(Value *V, Type *DestTy, const Twine &Name = "",
                       bool IsNonNeg = false) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(Intrinsic::experimental_constrained_uitofp,
-                                     V, DestTy, nullptr, Name);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::uitofp, {DestTy, V->getType()}, {V},
+                             {}, Name);
     if (Value *Folded = Folder.FoldCast(Instruction::UIToFP, V, DestTy))
       return Folded;
     Instruction *I = Insert(new UIToFPInst(V, DestTy), Name);
@@ -2149,9 +2127,9 @@ class IRBuilderBase {
   }
 
   Value *CreateSIToFP(Value *V, Type *DestTy, const Twine &Name = ""){
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(Intrinsic::experimental_constrained_sitofp,
-                                     V, DestTy, nullptr, Name);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::sitofp, {DestTy, V->getType()}, {V},
+                             {}, Name);
     return CreateCast(Instruction::SIToFP, V, DestTy, Name);
   }
 
@@ -2162,10 +2140,9 @@ class IRBuilderBase {
 
   Value *CreateFPTruncFMF(Value *V, Type *DestTy, FMFSource FMFSource,
                           const Twine &Name = "", MDNode *FPMathTag = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(
-          Intrinsic::experimental_constrained_fptrunc, V, DestTy, FMFSource,
-          Name, FPMathTag);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fptrunc, {DestTy, V->getType()}, {V},
+                             FMFSource, Name);
     return CreateCast(Instruction::FPTrunc, V, DestTy, Name, FPMathTag,
                       FMFSource);
   }
@@ -2177,9 +2154,9 @@ class IRBuilderBase {
 
   Value *CreateFPExtFMF(Value *V, Type *DestTy, FMFSource FMFSource,
                         const Twine &Name = "", MDNode *FPMathTag = nullptr) {
-    if (IsFPConstrained)
-      return CreateConstrainedFPCast(Intrinsic::experimental_constrained_fpext,
-                                     V, DestTy, FMFSource, Name, FPMathTag);
+    if (IsFPConstrained && hasNonDefaultFPConstraints())
+      return CreateIntrinsic(Intrinsic::fpext, {DestTy, V->getType()}, {V},
+                             FMFSource, Name);
     return CreateCast(Instruction::FPExt, V, DestTy, Name, FPMathTag,
                       FMFSource);
   }
@@ -2300,12 +2277,6 @@ class IRBuilderBase {
     return CreateCast(CastOp, V, DestTy, Name, FPMathTag);
   }
 
-  LLVM_ABI CallInst *CreateConstrainedFPCast(
-      Intrinsic::ID ID, Value *V, Type *DestTy, FMFSource FMFSource = {},
-      const Twine &Name = "", MDNode *FPMathTag = nullptr,
-      std::optional<RoundingMode> Rounding = std::nullopt,
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
   // Provided to resolve 'CreateIntCast(Ptr, Ptr, "...")', giving a
   // compile time error, instead of converting the string to bool for the
   // isSigned parameter.
@@ -2485,11 +2456,6 @@ class IRBuilderBase {
                                    FMFSource FMFSource, bool IsSignaling);
 
 public:
-  LLVM_ABI CallInst *CreateConstrainedFPCmp(
-      Intrinsic::ID ID, CmpInst::Predicate P, Value *L, Value *R,
-      const Twine &Name = "",
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
   //===--------------------------------------------------------------------===//
   // Instruction creation methods: Other Instructions
   //===--------------------------------------------------------------------===//
@@ -2511,24 +2477,13 @@ class IRBuilderBase {
   CallInst *CreateCall(FunctionType *FTy, Value *Callee,
                        ArrayRef<Value *> Args = {}, const Twine &Name = "",
                        MDNode *FPMathTag = nullptr) {
-    CallInst *CI = CallInst::Create(FTy, Callee, Args, DefaultOperandBundles);
-    if (IsFPConstrained)
-      setConstrainedFPCallAttr(CI);
-    if (isa<FPMathOperator>(CI))
-      setFPAttrs(CI, FPMathTag, FMF);
-    return Insert(CI, Name);
+    return CreateCall(FTy, Callee, Args, DefaultOperandBundles, Name,
+                      FPMathTag);
   }
 
   CallInst *CreateCall(FunctionType *FTy, Value *Callee, ArrayRef<Value *> Args,
                        ArrayRef<OperandBundleDef> OpBundles,
-                       const Twine &Name = "", MDNode *FPMathTag = nullptr) {
-    CallInst *CI = CallInst::Create(FTy, Callee, Args, OpBundles);
-    if (IsFPConstrained)
-      setConstrainedFPCallAttr(CI);
-    if (isa<FPMathOperator>(CI))
-      setFPAttrs(CI, FPMathTag, FMF);
-    return Insert(CI, Name);
-  }
+                       const Twine &Name = "", MDNode *FPMathTag = nullptr);
 
   CallInst *CreateCall(FunctionCallee Callee, ArrayRef<Value *> Args = {},
                        const Twine &Name = "", MDNode *FPMathTag = nullptr) {
@@ -2543,11 +2498,6 @@ class IRBuilderBase {
                       OpBundles, Name, FPMathTag);
   }
 
-  LLVM_ABI CallInst *CreateConstrainedFPCall(
-      Function *Callee, ArrayRef<Value *> Args, const Twine &Name = "",
-      std::optional<RoundingMode> Rounding = std::nullopt,
-      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
-
   LLVM_ABI Value *CreateSelectWithUnknownProfile(Value *C, Value *True,
                                                  Value *False,
                                                  StringRef PassName,
@@ -2791,6 +2741,24 @@ class IRBuilderBase {
   /// assumption on the provided pointer.
   LLVM_ABI CallInst *CreateDereferenceableAssumption(Value *PtrValue,
                                                      Value *SizeValue);
+
+  /// Create an operand bundle in the provided bundle set to represent the
+  /// given FP rounding mode.
+  ///
+  /// If the rounding mode is not defined, adds the default rounding mode,
+  /// stored in this builder object.
+  void
+  createFPRoundingBundle(SmallVectorImpl<OperandBundleDef> &Bundles,
+                         std::optional<RoundingMode> Rounding = std::nullopt);
+
+  /// Create an operand bundle in the provided bundle set to represent FP
+  /// exception behavior.
+  ///
+  /// If the exception behavior is not defined, adds the default behavior,
+  /// stored in this builder object.
+  void createFPExceptionBundle(
+      SmallVectorImpl<OperandBundleDef> &Bundles,
+      std::optional<fp::ExceptionBehavior> Except = std::nullopt);
 };
 
 /// This provides a uniform API for creating instructions and inserting
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index 61dc5ebef1b1d..394c507a1150a 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -25,6 +25,7 @@
 #include "llvm/IR/CallingConv.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/FMF.h"
+#include "llvm/IR/FPEnv.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/LLVMContext.h"
@@ -1097,6 +1098,28 @@ template <typename InputTy> class OperandBundleDefT {
 using OperandBundleDef = OperandBundleDefT<Value *>;
 using ConstOperandBundleDef = OperandBundleDefT<const Value *>;
 
+std::optional<StringRef> getBundleOperandByPrefix(OperandBundleUse Bundle,
+                                                  StringRef Prefix);
+void addOperandToBundleTag(LLVMContext &Ctx,
+                           SmallVectorImpl<OperandBundleDef> &Bundles,
+                           StringRef Tag, size_t PrefixSize, StringRef Val);
+
+void addFPRoundingBundle(LLVMContext &Ctx,
+                         SmallVectorImpl<OperandBundleDef> &Bundles,
+                         RoundingMode Rounding);
+void addFPExceptionBundle(LLVMContext &Ctx,
+                          SmallVectorImpl<OperandBundleDef> &Bundles,
+                          fp::ExceptionBehavior Except);
+std::optional<DenormalMode::DenormalModeKind>
+getDenormModeBundle(const OperandBundleUse &Control, bool Input,
+                    const fltSemantics *FPSem);
+void addInputDenormBundle(LLVMContext &Ctx,
+                          SmallVectorImpl<OperandBundleDef> &Bundles,
+                          DenormalMode::DenormalModeKind Mode);
+void addOutputDenormBundle(LLVMContext &Ctx,
+                           SmallVectorImpl<OperandBundleDef> &Bundles,
+                           DenormalMode::DenormalModeKind Mode);
+
 //===----------------------------------------------------------------------===//
 //                               CallBase Class
 //===----------------------------------------------------------------------===//
@@ -1156,6 +1179,8 @@ class CallBase : public Instruction {
   /// number of extra operands.
   LLVM_ABI unsigned getNumSubclassExtraOperandsDynamic() const;
 
+  MemoryEffects getFloatingPointMemoryEffects() const;
+
 public:
   using Instruction::getContext;
 
@@ -2166,6 +2191,28 @@ class CallBase : public Instruction {
     return false;
   }
 
+  /// Return rounding mode specified for this call.
+  RoundingMode getRoundingMode() const;
+
+  /// Return exception behavior specified for this call.
+  fp::ExceptionBehavior getExceptionBehavior() const;
+
+  /// Return input denormal mode specified by operand bundles.
+  std::optional<DenormalMode::DenormalModeKind>
+  getInputDenormMode(const fltSemantics *FPSem = nullptr) const;
+
+  /// Return output denormal mode specified by operand bundles.
+  std::optional<DenormalMode::DenormalModeKind>
+  getOutputDenormMode(const fltSemantics *FPSem = nullptr) const;
+
+  /// Return input denormal mode specified by operand bundles.
+  std::optional<DenormalMode::DenormalModeKind>
+  getInputDenormModeFromBundle(const fltSemantics *FPSem = nullptr) const;
+
+  /// Return output denormal mode specified by operand bundles.
+  std::optional<DenormalMode::DenormalModeKind>
+  getOutputDenormModeFromBundle(const fltSemantics *FPSem = nullptr) const;
+
   /// Used to keep track of an operand bundle.  See the main comment on
   /// OperandBundleUser above.
   struct BundleOpInfo {
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 78a0cd569d5bd..5f3f0b57f0c61 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -135,6 +135,14 @@ class IntrinsicInst : public CallInst {
   /// course of IR transformations
   LLVM_ABI static bool mayLowerToFunctionCall(Intrinsic::ID IID);
 
+  /// Check if \p IID represents a function that may access the FP environment
+  /// and may have FP operand bundles.
+  ///
+  /// Access to the FP environment means that in a strict FP environment the
+  /// function has a read/write memory effect, which is used to maintain proper
+  /// instruction ordering.
+  static bool isFloatingPointOperation(Intrinsic::ID IID);
+
   /// Methods for support type inquiry through isa, cast, and dyn_cast:
   static bool classof(const CallInst *I) {
     auto *F = dyn_cast_or_null<Function>(I->getCalledOperand());
@@ -640,11 +648,6 @@ class VPIntrinsic : public IntrinsicInst {
     return getFunctionalIntrinsicIDForVP(getIntrinsicID());
   }
 
-  // Equivalent non-predicated constrained ID
-  std::optional<unsigned> getConstrainedIntrinsicID() const {
-    return getConstrainedIntrinsicIDForVP(getIntrinsicID());
-  }
-
   // Equivalent non-predicated opcode
   LLVM_ABI static std::optional<unsigned>
   getFunctionalOpcodeForVP(Intrinsic::ID ID);
@@ -653,9 +656,6 @@ class VPIntrinsic : public IntrinsicInst {
   LLVM_ABI static std::optional<Intrinsic::ID>
   getFunctionalIntrinsicIDForVP(Intrinsic::ID ID);
 
-  // Equivalent non-predicated constrained ID
-  LLVM_ABI static std::optional<Intrinsic::ID>
-  getConstrainedIntrinsicIDForVP(Intrinsic::ID ID);
 };
 
 /// This represents vector predication reduction intrinsics.
@@ -728,43 +728,6 @@ class VPBinOpIntrinsic : public VPIntrinsic {
 };
 
 
-/// This is the common base class for constrained floating point intrinsics.
-class ConstrainedFPIntrinsic : public IntrinsicInst {
-public:
-  LLVM_ABI unsigned getNonMetadataArgCount() const;
-  LLVM_ABI std::optional<RoundingMode> getRoundingMode() const;
-  LLVM_ABI std::optional<fp::ExceptionBehavior> getExceptionBehavior() const;
-  LLVM_ABI bool isDefaultFPEnvironment() const;
-
-  // Methods for support type inquiry through isa, cast, and dyn_cast:
-  LLVM_ABI static bool classof(const IntrinsicInst *I);
-  static bool classof(const Value *V) {
-    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
-  }
-};
-
-/// Constrained floating point compare intrinsics.
-class ConstrainedFPCmpIntrinsic : public ConstrainedFPIntrinsic {
-public:
-  LLVM_ABI FCmpInst::Predicate getPredicate() const;
-  bool isSignaling() const {
-    return getIntrinsicID() == Intrinsic::experimental_constrained_fcmps;
-  }
-
-  // Methods for support type inquiry through isa, cast, and dyn_cast:
-  static bool classof(const IntrinsicInst *I) {
-    switch (I->getIntrinsicID()) {
-    case Intrinsic::experimental_constrained_fcmp:
-    case Intrinsic::experimental_constrained_fcmps:
-      return true;
-    default:
-      return false;
-    }
-  }
-  static bool classof(const Value *V) {
-    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
-  }
-};
 
 /// This class represents min/max intrinsics.
 class MinMaxIntrinsic : public IntrinsicInst {
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 80ef58f51e7bb..57ea66efbbd76 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1138,6 +1138,40 @@ def int_experimental_memset_pattern
 // rounding modes and FP exception handling.
 
 let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrNoCreateUndefOrPoison] in {
+  // Intrinsic versions of the basic FP arithmetic operations, allowing operand
+  // bundles (e.g. "fp.control") to carry per-call FP environment flags, such
+  // as flush-to-zero denormal mode, that cannot be attached to plain IR
+  // instructions (fadd/fsub/fmul/fneg).
+  def int_fadd : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_fsub : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_fmul : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_fneg : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>]>;
+
+  // fdiv and frem: same binary pattern as fadd/fsub/fmul
+  def int_fdiv : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>, LLVMMatchType<0>]>;
+  def int_frem : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
+                           [LLVMMatchType<0>, LLVMMatchType<0>]>;
+
+  // fptrunc/fpext: unary; result type differs from operand type
+  def int_fptrunc : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_anyfloat_ty]>;
+  def int_fpext   : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_anyfloat_ty]>;
+
+  // Integer/FP conversions: result type differs from operand type
+  def int_sitofp : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_anyint_ty]>;
+  def int_uitofp : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_anyint_ty]>;
+  def int_fptosi : DefaultAttrsIntrinsic<[llvm_anyint_ty],   [llvm_anyfloat_ty]>;
+  def int_fptoui : DefaultAttrsIntrinsic<[llvm_anyint_ty],   [llvm_anyfloat_ty]>;
+
+  // fcmp: produces i1 (or vector of i1); predicate passed as metadata string
+  def int_fcmp : DefaultAttrsIntrinsic<
+      [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
+      [llvm_anyfloat_ty, LLVMMatchType<0>, llvm_metadata_ty]>;
+
   def int_fma  : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
                            [LLVMMatchType<0>, LLVMMatchType<0>,
                             LLVMMatchType<0>]>;
@@ -1212,6 +1246,17 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrNoCreateUndefOrPoison] in
   def int_frexp : DefaultAttrsIntrinsic<[llvm_anyfloat_ty, llvm_anyint_ty], [LLVMMatchType<0>]>;
 }
 
+// fcmps: signaling FP compare — raises Invalid exception for any NaN operand
+// (not just signaling NaN, unlike non-signaling fcmp).  Uses the fp.except
+// operand bundle to control exception observability; without a bundle the
+// exception is always observable (strict behaviour).
+// Note: defined outside the IntrNoMem let-block above so that
+// IntrInaccessibleMemOnly and IntrWillReturn are not overridden.
+def int_fcmps : Intrinsic<
+    [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
+    [llvm_anyfloat_ty, LLVMMatchType<0>, llvm_metadata_ty],
+    [IntrInaccessibleMemOnly, IntrWillReturn, IntrNoCreateUndefOrPoison]>;
+
 // TODO: Move all of these into the IntrNoCreateUndefOrPoison case above.
 let IntrProperties = [IntrNoMem, IntrSpeculatable] in {
   // These functions do not read memory, but are sensitive to the
@@ -1289,232 +1334,10 @@ def int_is_fpclass
                             [llvm_anyfloat_ty, llvm_i32_ty],
                             [IntrNoMem, IntrSpeculatable, IntrNoCreateUndefOrPoison, ImmArg<ArgIndex<1>>]>;
 
-//===--------------- Constrained Floating Point Intrinsics ----------------===//
-//
-
-/// IntrStrictFP - The intrinsic is allowed to be used in an alternate
-/// floating point environment.
-def IntrStrictFP : IntrinsicProperty;
-
-let IntrProperties = [IntrInaccessibleMemOnly, IntrStrictFP] in {
-  def int_experimental_constrained_fadd : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_fsub : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_fmul : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_fdiv : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_frem : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fma : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fmuladd : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ LLVMMatchType<0>,
-                                                         LLVMMatchType<0>,
-                                                         LLVMMatchType<0>,
-                                                         llvm_metadata_ty,
-                                                         llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fptosi : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                    [ llvm_anyfloat_ty,
-                                                      llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fptoui : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                    [ llvm_anyfloat_ty,
-                                                      llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_sitofp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ llvm_anyint_ty,
-                                                         llvm_metadata_ty,
-                                                         llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_uitofp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ llvm_anyint_ty,
-                                                         llvm_metadata_ty,
-                                                         llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fptrunc : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ llvm_anyfloat_ty,
-                                                         llvm_metadata_ty,
-                                                         llvm_metadata_ty ]>;
-
-  def int_experimental_constrained_fpext : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                     [ llvm_anyfloat_ty,
-                                                       llvm_metadata_ty ]>;
-
-  // These intrinsics are sensitive to the rounding mode so we need constrained
-  // versions of each of them.  When strict rounding and exception control are
-  // not required the non-constrained versions of these intrinsics should be
-  // used.
-  def int_experimental_constrained_sqrt : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_powi : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_i32_ty,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_ldexp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_anyint_ty,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_asin  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_acos  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_atan  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_atan2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_sin  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_cos  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_tan  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_sinh  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_cosh  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_tanh  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_pow  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_log  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_log10: DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_log2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_exp  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_exp2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_rint  : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                     [ LLVMMatchType<0>,
-                                                       llvm_metadata_ty,
-                                                       llvm_metadata_ty ]>;
-  def int_experimental_constrained_nearbyint : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                         [ LLVMMatchType<0>,
-                                                           llvm_metadata_ty,
-                                                           llvm_metadata_ty ]>;
-  def int_experimental_constrained_lrint : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                     [ llvm_anyfloat_ty,
-                                                       llvm_metadata_ty,
-                                                       llvm_metadata_ty ]>;
-  def int_experimental_constrained_llrint : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                      [ llvm_anyfloat_ty,
-                                                        llvm_metadata_ty,
-                                                        llvm_metadata_ty ]>;
-  def int_experimental_constrained_maxnum : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                      [ LLVMMatchType<0>,
-                                                        LLVMMatchType<0>,
-                                                        llvm_metadata_ty ]>;
-  def int_experimental_constrained_minnum : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                      [ LLVMMatchType<0>,
-                                                        LLVMMatchType<0>,
-                                                        llvm_metadata_ty ]>;
-  def int_experimental_constrained_maximum : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ LLVMMatchType<0>,
-                                                         LLVMMatchType<0>,
-                                                         llvm_metadata_ty ]>;
-  def int_experimental_constrained_minimum : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                       [ LLVMMatchType<0>,
-                                                         LLVMMatchType<0>,
-                                                         llvm_metadata_ty ]>;
-  def int_experimental_constrained_ceil : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                    [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_floor : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                     [ LLVMMatchType<0>,
-                                                       llvm_metadata_ty ]>;
-  def int_experimental_constrained_lround : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                      [ llvm_anyfloat_ty,
-                                                        llvm_metadata_ty ]>;
-  def int_experimental_constrained_llround : DefaultAttrsIntrinsic<[ llvm_anyint_ty ],
-                                                       [ llvm_anyfloat_ty,
-                                                         llvm_metadata_ty ]>;
-  def int_experimental_constrained_round : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                     [ LLVMMatchType<0>,
-                                                      llvm_metadata_ty ]>;
-  def int_experimental_constrained_roundeven : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                         [ LLVMMatchType<0>,
-                                                           llvm_metadata_ty ]>;
-  def int_experimental_constrained_trunc : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
-                                                     [ LLVMMatchType<0>,
-                                                       llvm_metadata_ty ]>;
-
-  // Constrained floating-point comparison (quiet and signaling variants).
-  // Third operand is the predicate represented as a metadata string.
-  def int_experimental_constrained_fcmp
-      : DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> ],
-                  [ llvm_anyfloat_ty, LLVMMatchType<0>,
-                    llvm_metadata_ty, llvm_metadata_ty ]>;
-  def int_experimental_constrained_fcmps
-      : DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> ],
-                  [ llvm_anyfloat_ty, LLVMMatchType<0>,
-                    llvm_metadata_ty, llvm_metadata_ty ]>;
-}
-// FIXME: Consider maybe adding intrinsics for sitofp, uitofp.
-
+// Constrained Floating Point Intrinsics (experimental_constrained_*) have been
+// removed. They are auto-upgraded to new FP intrinsics with fp.control /
+// fp.except operand bundles on IR load. See lib/IR/AutoUpgrade.cpp.
+// experimental_constrained_fcmps is upgraded to llvm.fcmps.
 
 //===------------------------- Expect Intrinsics --------------------------===//
 //
diff --git a/llvm/include/llvm/IR/LLVMContext.h b/llvm/include/llvm/IR/LLVMContext.h
index 646a04673dbd0..20a2807fe9613 100644
--- a/llvm/include/llvm/IR/LLVMContext.h
+++ b/llvm/include/llvm/IR/LLVMContext.h
@@ -99,7 +99,9 @@ class LLVMContext {
     OB_convergencectrl = 9,        // "convergencectrl"
     OB_align = 10,                 // "align"
     OB_deactivation_symbol = 11,   // "deactivation-symbol"
-    OB_LastBundleID = OB_deactivation_symbol
+    OB_fp_control = 12,            // "fp.control"
+    OB_fp_except = 13,             // "fp.except"
+    OB_LastBundleID = OB_fp_except
   };
 
   /// getMDKindID - Return a unique non-zero ID for the specified metadata kind.
diff --git a/llvm/include/llvm/IR/Type.h b/llvm/include/llvm/IR/Type.h
index 4217d797cdf28..18fae5df8a2c2 100644
--- a/llvm/include/llvm/IR/Type.h
+++ b/llvm/include/llvm/IR/Type.h
@@ -196,6 +196,8 @@ class Type {
     return getTypeID() == PPC_FP128TyID;
   }
 
+  LLVM_ABI const fltSemantics *hasFltSemantics() const;
+
   LLVM_ABI const fltSemantics &getFltSemantics() const;
 
   /// Return true if this is X86 AMX.
diff --git a/llvm/include/llvm/IR/VPIntrinsics.def b/llvm/include/llvm/IR/VPIntrinsics.def
index 0b0c744487b92..b68fb30b16c34 100644
--- a/llvm/include/llvm/IR/VPIntrinsics.def
+++ b/llvm/include/llvm/IR/VPIntrinsics.def
@@ -95,12 +95,6 @@
 #define VP_PROPERTY_FUNCTIONAL_OPC(OPC)
 #endif
 
-// If operation can have rounding or fp exceptions, maps to corresponding
-// constrained fp intrinsic.
-#ifndef VP_PROPERTY_CONSTRAINEDFP
-#define VP_PROPERTY_CONSTRAINEDFP(INTRINID)
-#endif
-
 // The intrinsic and/or SDNode has the same function as this ISD Opcode.
 // \p SDOPC      The opcode of the instruction with the same function.
 #ifndef VP_PROPERTY_FUNCTIONAL_SDOPC
@@ -316,7 +310,6 @@ END_REGISTER_VP(vp_usub_sat, VP_USUBSAT)
 #define HELPER_REGISTER_BINARY_FP_VP(OPSUFFIX, VPSD, IROPC, SDOPC)             \
   BEGIN_REGISTER_VP(vp_##OPSUFFIX, 2, 3, VPSD, -1)                             \
   VP_PROPERTY_FUNCTIONAL_OPC(IROPC)                                            \
-  VP_PROPERTY_CONSTRAINEDFP(experimental_constrained_##OPSUFFIX)         \
   VP_PROPERTY_FUNCTIONAL_SDOPC(SDOPC)                                          \
   VP_PROPERTY_BINARYOP                                                         \
   END_REGISTER_VP(vp_##OPSUFFIX, VPSD)
@@ -358,14 +351,12 @@ END_REGISTER_VP(vp_sqrt, VP_SQRT)
 
 // llvm.vp.fma(x,y,z,mask,vlen)
 BEGIN_REGISTER_VP(vp_fma, 3, 4, VP_FMA, -1)
-VP_PROPERTY_CONSTRAINEDFP(experimental_constrained_fma)
 VP_PROPERTY_FUNCTIONAL_INTRINSIC(fma)
 VP_PROPERTY_FUNCTIONAL_SDOPC(FMA)
 END_REGISTER_VP(vp_fma, VP_FMA)
 
 // llvm.vp.fmuladd(x,y,z,mask,vlen)
 BEGIN_REGISTER_VP(vp_fmuladd, 3, 4, VP_FMULADD, -1)
-VP_PROPERTY_CONSTRAINEDFP(experimental_constrained_fmuladd)
 VP_PROPERTY_FUNCTIONAL_INTRINSIC(fmuladd)
 VP_PROPERTY_FUNCTIONAL_SDOPC(FMAD)
 END_REGISTER_VP(vp_fmuladd, VP_FMULADD)
@@ -472,7 +463,6 @@ END_REGISTER_VP(vp_llrint, VP_LLRINT)
   BEGIN_REGISTER_VP(vp_##OPSUFFIX, 1, 2, VPSD, -1)                             \
   VP_PROPERTY_FUNCTIONAL_OPC(IROPC)                                            \
   VP_PROPERTY_FUNCTIONAL_SDOPC(SDOPC)                                          \
-  VP_PROPERTY_CONSTRAINEDFP(experimental_constrained_##OPSUFFIX)  \
   END_REGISTER_VP(vp_##OPSUFFIX, VPSD)
 
 // llvm.vp.fptoui(x,mask,vlen)
@@ -540,7 +530,6 @@ END_REGISTER_VP_SDNODE(VP_SETCC)
 BEGIN_REGISTER_VP_INTRINSIC(vp_fcmp, 3, 4)
 HELPER_MAP_VPID_TO_VPSD(vp_fcmp, VP_SETCC)
 VP_PROPERTY_FUNCTIONAL_OPC(FCmp)
-VP_PROPERTY_CONSTRAINEDFP(experimental_constrained_fcmp)
 END_REGISTER_VP_INTRINSIC(vp_fcmp)
 
 // llvm.vp.icmp(x,y,cc,mask,vlen)
@@ -756,7 +745,6 @@ END_REGISTER_VP(experimental_vp_reverse, EXPERIMENTAL_VP_REVERSE)
 #undef END_REGISTER_VP_SDNODE
 #undef HELPER_MAP_VPID_TO_VPSD
 #undef VP_PROPERTY_BINARYOP
-#undef VP_PROPERTY_CONSTRAINEDFP
 #undef VP_PROPERTY_FUNCTIONAL_INTRINSIC
 #undef VP_PROPERTY_FUNCTIONAL_OPC
 #undef VP_PROPERTY_FUNCTIONAL_SDOPC
diff --git a/llvm/include/llvm/Support/ModRef.h b/llvm/include/llvm/Support/ModRef.h
index 83091c617f629..4289b29cb1be6 100644
--- a/llvm/include/llvm/Support/ModRef.h
+++ b/llvm/include/llvm/Support/ModRef.h
@@ -295,6 +295,11 @@ template <typename LocationEnum> class MemoryEffectsBase {
     return true;
   }
 
+  /// Whether this function may read or write inaccessible memory.
+  bool doesAccessInaccessibleMem() const {
+    return isModOrRefSet(getModRef(Location::InaccessibleMem));
+  }
+
   /// Whether this function only (at most) accesses errno memory.
   bool onlyAccessesErrnoMem() const {
     return getWithoutLoc(Location::ErrnoMem).doesNotAccessMemory();
diff --git a/llvm/include/module.modulemap b/llvm/include/module.modulemap
index cf3ef4b08965d..9f83613760609 100644
--- a/llvm/include/module.modulemap
+++ b/llvm/include/module.modulemap
@@ -274,7 +274,6 @@ module LLVM_IR {
   extern module LLVM_Extern_IR_Intrinsics_Enum "module.extern.modulemap"
 
   // These are intended for (repeated) textual inclusion.
-  textual header "llvm/IR/ConstrainedOps.def"
   textual header "llvm/IR/DebugInfoFlags.def"
   textual header "llvm/IR/FunctionProperties.def"
   textual header "llvm/IR/Instruction.def"
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index cb8ecc3872c0c..7495e49ff2634 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -24,6 +24,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/Analysis/TargetFolder.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
@@ -1362,6 +1363,11 @@ static ConstantFP *flushDenormalConstantFP(ConstantFP *CFP,
   if (!APF.isDenormal())
     return CFP;
 
+  if (auto *CB = dyn_cast_or_null<CallBase>(Inst)) {
+    auto Mode = IsOutput ? CB->getOutputDenormMode() : CB->getInputDenormMode();
+    return flushDenormalConstant(CFP->getType(), APF, *Mode);
+  }
+
   DenormalMode Mode = getInstrDenormalMode(Inst, CFP->getType());
   return flushDenormalConstant(CFP->getType(), APF,
                                IsOutput ? Mode.Output : Mode.Input);
@@ -1946,24 +1952,16 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
   case Intrinsic::rint:
   case Intrinsic::canonicalize:
 
-  // Constrained intrinsics can be folded if FP environment is known
-  // to compiler.
-  case Intrinsic::experimental_constrained_fma:
-  case Intrinsic::experimental_constrained_fmuladd:
-  case Intrinsic::experimental_constrained_fadd:
-  case Intrinsic::experimental_constrained_fsub:
-  case Intrinsic::experimental_constrained_fmul:
-  case Intrinsic::experimental_constrained_fdiv:
-  case Intrinsic::experimental_constrained_frem:
-  case Intrinsic::experimental_constrained_ceil:
-  case Intrinsic::experimental_constrained_floor:
-  case Intrinsic::experimental_constrained_round:
-  case Intrinsic::experimental_constrained_roundeven:
-  case Intrinsic::experimental_constrained_trunc:
-  case Intrinsic::experimental_constrained_nearbyint:
-  case Intrinsic::experimental_constrained_rint:
-  case Intrinsic::experimental_constrained_fcmp:
-  case Intrinsic::experimental_constrained_fcmps:
+  // New FP intrinsics (llvm.fadd, llvm.fsub, etc.) carry rounding mode and
+  // exception behavior via operand bundles. They can be folded when those
+  // bundles permit (see mayFoldNewFPIntrinsic).
+  case Intrinsic::fadd:
+  case Intrinsic::fsub:
+  case Intrinsic::fmul:
+  case Intrinsic::fdiv:
+  case Intrinsic::frem:
+  case Intrinsic::fcmp:
+  case Intrinsic::fcmps:
 
   case Intrinsic::experimental_cttz_elts:
     return true;
@@ -2313,10 +2311,10 @@ static bool getConstIntOrUndef(Value *Op, const APInt *&C) {
 ///
 /// \param CI Constrained intrinsic call.
 /// \param St Exception flags raised during constant evaluation.
-static bool mayFoldConstrained(ConstrainedFPIntrinsic *CI,
+static bool mayFoldConstrained(const CallBase *CI,
                                APFloat::opStatus St) {
-  std::optional<RoundingMode> ORM = CI->getRoundingMode();
-  std::optional<fp::ExceptionBehavior> EB = CI->getExceptionBehavior();
+  RoundingMode ORM = CI->getRoundingMode();
+  fp::ExceptionBehavior EB = CI->getExceptionBehavior();
 
   // If the operation does not change exception status flags, it is safe
   // to fold.
@@ -2330,7 +2328,7 @@ static bool mayFoldConstrained(ConstrainedFPIntrinsic *CI,
 
   // If FP exceptions are ignored, fold the call, even if such exception is
   // raised.
-  if (EB && *EB != fp::ExceptionBehavior::ebStrict)
+  if (EB != fp::ExceptionBehavior::ebStrict)
     return true;
 
   // Leave the calculation for runtime so that exception flags be correctly set
@@ -2338,17 +2336,51 @@ static bool mayFoldConstrained(ConstrainedFPIntrinsic *CI,
   return false;
 }
 
+/// Like mayFoldConstrained, but for new FP intrinsics (llvm.fadd etc.) that
+/// carry rounding mode and exception behavior via operand bundles.
+static bool mayFoldNewFPIntrinsic(const CallBase *CI, APFloat::opStatus St) {
+  // If the operation produced no exception, the result does not depend on
+  // rounding mode or exception behavior.
+  if (St == APFloat::opStatus::opOK)
+    return true;
+
+  // If evaluation raised an FP exception, the result can depend on the rounding
+  // mode.  If the rounding mode is dynamic (unknown at compile time), folding is
+  // not safe.
+  if (CI->getRoundingMode() == RoundingMode::Dynamic)
+    return false;
+
+  // If exception behavior is not strict (ebIgnore / ebMayTrap), fold even if
+  // an exception was raised.
+  if (CI->getExceptionBehavior() != fp::ExceptionBehavior::ebStrict)
+    return true;
+
+  // Leave the calculation for runtime so exception flags are set correctly.
+  return false;
+}
+
+/// Returns the rounding mode that should be used for constant evaluation of a
+/// new FP intrinsic (llvm.fadd etc.).  Mirrors getEvaluationRoundingMode().
+static RoundingMode getEvaluationRoundingModeForNewFP(const CallBase *CI) {
+  RoundingMode RM = CI->getRoundingMode();
+  if (RM == RoundingMode::Dynamic)
+    // Evaluate in nearest-even; if no exception is raised the result is exact
+    // and independent of the actual runtime rounding mode.
+    return RoundingMode::NearestTiesToEven;
+  return RM;
+}
+
 /// Returns the rounding mode that should be used for constant evaluation.
 static RoundingMode
-getEvaluationRoundingMode(const ConstrainedFPIntrinsic *CI) {
-  std::optional<RoundingMode> ORM = CI->getRoundingMode();
-  if (!ORM || *ORM == RoundingMode::Dynamic)
+getEvaluationRoundingMode(const CallBase *CI) {
+  RoundingMode ORM = CI->getRoundingMode();
+  if (ORM == RoundingMode::Dynamic)
     // Even if the rounding mode is unknown, try evaluating the operation.
     // If it does not raise inexact exception, rounding was not applied,
     // so the result is exact and does not depend on rounding mode. Whether
     // other FP exceptions are raised, it does not depend on rounding mode.
     return RoundingMode::NearestTiesToEven;
-  return *ORM;
+  return ORM;
 }
 
 /// Try to constant fold llvm.canonicalize for the given caller and value.
@@ -2564,40 +2596,39 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
     switch (IntrinsicID) {
     default:
       break;
-    case Intrinsic::experimental_constrained_nearbyint:
-    case Intrinsic::experimental_constrained_rint: {
-      auto CI = cast<ConstrainedFPIntrinsic>(Call);
-      RM = CI->getRoundingMode();
-      if (!RM || *RM == RoundingMode::Dynamic)
+    case Intrinsic::nearbyint:
+    case Intrinsic::rint: {
+      RoundingMode CallRM = Call->getRoundingMode();
+      if (CallRM == RoundingMode::Dynamic)
         return nullptr;
+      RM = CallRM;
       break;
     }
-    case Intrinsic::experimental_constrained_round:
+    case Intrinsic::round:
       RM = APFloat::rmNearestTiesToAway;
       break;
-    case Intrinsic::experimental_constrained_ceil:
+    case Intrinsic::ceil:
       RM = APFloat::rmTowardPositive;
       break;
-    case Intrinsic::experimental_constrained_floor:
+    case Intrinsic::floor:
       RM = APFloat::rmTowardNegative;
       break;
-    case Intrinsic::experimental_constrained_trunc:
+    case Intrinsic::trunc:
       RM = APFloat::rmTowardZero;
       break;
     }
     if (RM) {
-      auto CI = cast<ConstrainedFPIntrinsic>(Call);
       if (U.isFinite()) {
         APFloat::opStatus St = U.roundToIntegral(*RM);
-        if (IntrinsicID == Intrinsic::experimental_constrained_rint &&
+        if (IntrinsicID == Intrinsic::rint &&
             St == APFloat::opInexact) {
-          std::optional<fp::ExceptionBehavior> EB = CI->getExceptionBehavior();
+          fp::ExceptionBehavior EB = Call->getExceptionBehavior();
           if (EB == fp::ebStrict)
             return nullptr;
         }
       } else if (U.isSignaling()) {
-        std::optional<fp::ExceptionBehavior> EB = CI->getExceptionBehavior();
-        if (EB && *EB != fp::ebIgnore)
+        fp::ExceptionBehavior EB = Call->getExceptionBehavior();
+        if (EB != fp::ebIgnore)
           return nullptr;
         U = APFloat::getQNaN(U.getSemantics());
       }
@@ -3155,11 +3186,29 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 }
 
 static Constant *evaluateCompare(const APFloat &Op1, const APFloat &Op2,
-                                 const ConstrainedFPIntrinsic *Call) {
+                                 const IntrinsicInst *Call) {
   APFloat::opStatus St = APFloat::opOK;
-  auto *FCmp = cast<ConstrainedFPCmpIntrinsic>(Call);
-  FCmpInst::Predicate Cond = FCmp->getPredicate();
-  if (FCmp->isSignaling()) {
+  Metadata *MD =
+      cast<MetadataAsValue>(Call->getArgOperand(2))->getMetadata();
+  FCmpInst::Predicate Cond =
+      StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
+          .Case("oeq", FCmpInst::FCMP_OEQ)
+          .Case("ogt", FCmpInst::FCMP_OGT)
+          .Case("oge", FCmpInst::FCMP_OGE)
+          .Case("olt", FCmpInst::FCMP_OLT)
+          .Case("ole", FCmpInst::FCMP_OLE)
+          .Case("one", FCmpInst::FCMP_ONE)
+          .Case("ord", FCmpInst::FCMP_ORD)
+          .Case("uno", FCmpInst::FCMP_UNO)
+          .Case("ueq", FCmpInst::FCMP_UEQ)
+          .Case("ugt", FCmpInst::FCMP_UGT)
+          .Case("uge", FCmpInst::FCMP_UGE)
+          .Case("ult", FCmpInst::FCMP_ULT)
+          .Case("ule", FCmpInst::FCMP_ULE)
+          .Case("une", FCmpInst::FCMP_UNE)
+          .Default(FCmpInst::BAD_FCMP_PREDICATE);
+  bool IsSignaling = (Call->getIntrinsicID() == Intrinsic::fcmps);
+  if (IsSignaling) {
     if (Op1.isNaN() || Op2.isNaN())
       St = APFloat::opInvalidOp;
   } else {
@@ -3167,7 +3216,11 @@ static Constant *evaluateCompare(const APFloat &Op1, const APFloat &Op2,
       St = APFloat::opInvalidOp;
   }
   bool Result = FCmpInst::compare(Op1, Op2, Cond);
-  if (mayFoldConstrained(const_cast<ConstrainedFPCmpIntrinsic *>(FCmp), St))
+
+  // Compares are independent of rounding mode, so only exception behavior
+  // matters: fold unless an exception was raised *and* EB is strict.
+  fp::ExceptionBehavior EB = Call->getExceptionBehavior();
+  if (St == APFloat::opOK || EB != fp::ebStrict)
     return ConstantInt::get(Call->getType()->getScalarType(), Result);
   return nullptr;
 }
@@ -3300,45 +3353,94 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
   }
 
   if (const auto *Op1 = dyn_cast<ConstantFP>(Operands[0])) {
-    const APFloat &Op1V = Op1->getValueAPF();
+    ConstantFP *Op1F =
+        flushDenormalConstantFP(const_cast<ConstantFP *>(Op1), Call, false);
+    if (!Op1F)
+      return nullptr;
+    const APFloat &Op1V = Op1F->getValueAPF();
 
     if (const auto *Op2 = dyn_cast<ConstantFP>(Operands[1])) {
       if (Op2->getType() != Op1->getType())
         return nullptr;
-      const APFloat &Op2V = Op2->getValueAPF();
-
-      if (const auto *ConstrIntr =
-              dyn_cast_if_present<ConstrainedFPIntrinsic>(Call)) {
+      ConstantFP *Op2F =
+          flushDenormalConstantFP(const_cast<ConstantFP *>(Op2), Call, false);
+      if (!Op2F)
+        return nullptr;
+      const APFloat &Op2V = Op2F->getValueAPF();
+
+      const auto *ConstrIntr =
+          (Call && isa<IntrinsicInst>(Call) &&
+           Intrinsic::isConstrainedFPIntrinsic(
+               cast<IntrinsicInst>(Call)->getIntrinsicID()))
+              ? cast<IntrinsicInst>(Call)
+              : nullptr;
+      if (ConstrIntr) {
         RoundingMode RM = getEvaluationRoundingMode(ConstrIntr);
         APFloat Res = Op1V;
         APFloat::opStatus St;
         switch (IntrinsicID) {
         default:
           return nullptr;
-        case Intrinsic::experimental_constrained_fadd:
+        case Intrinsic::fadd:
           St = Res.add(Op2V, RM);
           break;
-        case Intrinsic::experimental_constrained_fsub:
+        case Intrinsic::fsub:
           St = Res.subtract(Op2V, RM);
           break;
-        case Intrinsic::experimental_constrained_fmul:
+        case Intrinsic::fmul:
           St = Res.multiply(Op2V, RM);
           break;
-        case Intrinsic::experimental_constrained_fdiv:
+        case Intrinsic::fdiv:
           St = Res.divide(Op2V, RM);
           break;
-        case Intrinsic::experimental_constrained_frem:
+        case Intrinsic::frem:
           St = Res.mod(Op2V);
           break;
-        case Intrinsic::experimental_constrained_fcmp:
-        case Intrinsic::experimental_constrained_fcmps:
+        case Intrinsic::fcmp:
+        case Intrinsic::fcmps:
           return evaluateCompare(Op1V, Op2V, ConstrIntr);
         }
-        if (mayFoldConstrained(const_cast<ConstrainedFPIntrinsic *>(ConstrIntr),
-                               St))
-          return ConstantFP::get(Ty, Res);
+        if (mayFoldConstrained(ConstrIntr, St)) {
+          DenormalMode::DenormalModeKind Mode = *Call->getOutputDenormMode();
+          return flushDenormalConstant(Op2->getType(), Res, Mode);
+        }
+        return nullptr;
+      }
+
+      // Handle new FP intrinsics (llvm.fadd, llvm.fsub, etc.) with optional
+      // fp.control / fp.except operand bundles.
+      switch (IntrinsicID) {
+      case Intrinsic::fcmp:
+      case Intrinsic::fcmps: {
+        const auto *FcmpII = cast<IntrinsicInst>(Call);
+        return evaluateCompare(Op1V, Op2V, FcmpII);
+      }
+      case Intrinsic::fadd:
+      case Intrinsic::fsub:
+      case Intrinsic::fmul:
+      case Intrinsic::fdiv:
+      case Intrinsic::frem: {
+        RoundingMode RM = getEvaluationRoundingModeForNewFP(Call);
+        APFloat Res = Op1V;
+        APFloat::opStatus St;
+        switch (IntrinsicID) {
+        default:
+          llvm_unreachable("unexpected intrinsic");
+        case Intrinsic::fadd: St = Res.add(Op2V, RM); break;
+        case Intrinsic::fsub: St = Res.subtract(Op2V, RM); break;
+        case Intrinsic::fmul: St = Res.multiply(Op2V, RM); break;
+        case Intrinsic::fdiv: St = Res.divide(Op2V, RM); break;
+        case Intrinsic::frem: St = Res.mod(Op2V); break;
+        }
+        if (mayFoldNewFPIntrinsic(Call, St)) {
+          DenormalMode::DenormalModeKind Mode = *Call->getOutputDenormMode();
+          return flushDenormalConstant(Op2->getType(), Res, Mode);
+        }
         return nullptr;
       }
+      default:
+        break;
+      }
 
       switch (IntrinsicID) {
       default:
@@ -3915,20 +4017,25 @@ static Constant *ConstantFoldScalarCall3(StringRef Name,
         const APFloat &C2 = Op2->getValueAPF();
         const APFloat &C3 = Op3->getValueAPF();
 
-        if (const auto *ConstrIntr = dyn_cast<ConstrainedFPIntrinsic>(Call)) {
+        const auto *ConstrIntr =
+            (Call && isa<IntrinsicInst>(Call) &&
+             Intrinsic::isConstrainedFPIntrinsic(
+                 cast<IntrinsicInst>(Call)->getIntrinsicID()))
+                ? cast<IntrinsicInst>(Call)
+                : nullptr;
+        if (ConstrIntr) {
           RoundingMode RM = getEvaluationRoundingMode(ConstrIntr);
           APFloat Res = C1;
           APFloat::opStatus St;
           switch (IntrinsicID) {
           default:
             return nullptr;
-          case Intrinsic::experimental_constrained_fma:
-          case Intrinsic::experimental_constrained_fmuladd:
+          case Intrinsic::fma:
+          case Intrinsic::fmuladd:
             St = Res.fusedMultiplyAdd(C2, C3, RM);
             break;
           }
-          if (mayFoldConstrained(
-                  const_cast<ConstrainedFPIntrinsic *>(ConstrIntr), St))
+          if (mayFoldConstrained(ConstrIntr, St))
             return ConstantFP::get(Ty, Res);
           return nullptr;
         }
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 4b7e23d3025aa..194960b18cbe6 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -7301,13 +7301,6 @@ static Value *simplifyIntrinsic(CallBase *Call, Value *Callee,
 
     return nullptr;
   }
-  case Intrinsic::experimental_constrained_fma: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    if (Value *V = simplifyFPOp(Args, {}, Q, *FPI->getExceptionBehavior(),
-                                *FPI->getRoundingMode()))
-      return V;
-    return nullptr;
-  }
   case Intrinsic::fma:
   case Intrinsic::fmuladd: {
     if (Value *V = simplifyFPOp(Args, {}, Q, fp::ebIgnore,
@@ -7387,37 +7380,27 @@ static Value *simplifyIntrinsic(CallBase *Call, Value *Callee,
 
     return nullptr;
   }
-  case Intrinsic::experimental_constrained_fadd: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    return simplifyFAddInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
-                            *FPI->getExceptionBehavior(),
-                            *FPI->getRoundingMode());
-  }
-  case Intrinsic::experimental_constrained_fsub: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    return simplifyFSubInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
-                            *FPI->getExceptionBehavior(),
-                            *FPI->getRoundingMode());
-  }
-  case Intrinsic::experimental_constrained_fmul: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    return simplifyFMulInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
-                            *FPI->getExceptionBehavior(),
-                            *FPI->getRoundingMode());
-  }
-  case Intrinsic::experimental_constrained_fdiv: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    return simplifyFDivInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
-                            *FPI->getExceptionBehavior(),
-                            *FPI->getRoundingMode());
-  }
-  case Intrinsic::experimental_constrained_frem: {
-    auto *FPI = cast<ConstrainedFPIntrinsic>(Call);
-    return simplifyFRemInst(Args[0], Args[1], FPI->getFastMathFlags(), Q,
-                            *FPI->getExceptionBehavior(),
-                            *FPI->getRoundingMode());
-  }
-  case Intrinsic::experimental_constrained_ldexp:
+  case Intrinsic::fadd:
+    return simplifyFAddInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
+                            Call->getExceptionBehavior(),
+                            Call->getRoundingMode());
+  case Intrinsic::fsub:
+    return simplifyFSubInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
+                            Call->getExceptionBehavior(),
+                            Call->getRoundingMode());
+  case Intrinsic::fmul:
+    return simplifyFMulInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
+                            Call->getExceptionBehavior(),
+                            Call->getRoundingMode());
+  case Intrinsic::fdiv:
+    return simplifyFDivInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
+                            Call->getExceptionBehavior(),
+                            Call->getRoundingMode());
+  case Intrinsic::frem:
+    return simplifyFRemInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
+                            Call->getExceptionBehavior(),
+                            Call->getRoundingMode());
+  case Intrinsic::ldexp:
     return simplifyLdexp(Args[0], Args[1], Q, true);
   case Intrinsic::experimental_gc_relocate: {
     GCRelocateInst &GCR = *cast<GCRelocateInst>(Call);
@@ -7509,7 +7492,7 @@ Value *llvm::simplifyCall(CallBase *Call, Value *Callee, ArrayRef<Value *> Args,
 }
 
 Value *llvm::simplifyConstrainedFPCall(CallBase *Call, const SimplifyQuery &Q) {
-  assert(isa<ConstrainedFPIntrinsic>(Call));
+  assert(isa<IntrinsicInst>(Call) && Intrinsic::isConstrainedFPIntrinsic(cast<IntrinsicInst>(Call)->getIntrinsicID()));
   SmallVector<Value *, 4> Args(Call->args());
   if (Value *V = tryConstantFoldCall(Call, Call->getCalledOperand(), Args, Q))
     return V;
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 5b52efb9e1dc6..087845332696d 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -5231,8 +5231,7 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
       Known = KnownFPClass::fma(KnownSrc[0], KnownSrc[1], KnownSrc[2], Mode);
       break;
     }
-    case Intrinsic::sqrt:
-    case Intrinsic::experimental_constrained_sqrt: {
+    case Intrinsic::sqrt: {
       KnownFPClass KnownSrc;
       FPClassTest InterestedSrcs = InterestedClasses;
       if (InterestedClasses & fcNan)
@@ -5429,9 +5428,6 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
     case Intrinsic::log:
     case Intrinsic::log10:
     case Intrinsic::log2:
-    case Intrinsic::experimental_constrained_log:
-    case Intrinsic::experimental_constrained_log10:
-    case Intrinsic::experimental_constrained_log2:
     case Intrinsic::amdgcn_log: {
       Type *EltTy = II->getType()->getScalarType();
 
@@ -5506,8 +5502,8 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
                           Known, Q, Depth + 1);
       break;
     }
-    case Intrinsic::experimental_constrained_sitofp:
-    case Intrinsic::experimental_constrained_uitofp:
+    case Intrinsic::sitofp:
+    case Intrinsic::uitofp:
       // Cannot produce nan
       Known.knownNot(fcNan);
 
@@ -5517,7 +5513,7 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts,
       // Integers cannot be subnormal
       Known.knownNot(fcSubnormal);
 
-      if (IID == Intrinsic::experimental_constrained_uitofp)
+      if (IID == Intrinsic::uitofp)
         Known.signBitMustBeZero();
 
       // TODO: Copy inf handling from instructions
diff --git a/llvm/lib/CodeGen/ExpandVectorPredication.cpp b/llvm/lib/CodeGen/ExpandVectorPredication.cpp
index 892348d871e32..725623d81051b 100644
--- a/llvm/lib/CodeGen/ExpandVectorPredication.cpp
+++ b/llvm/lib/CodeGen/ExpandVectorPredication.cpp
@@ -310,19 +310,13 @@ bool CachingVPExpander::expandPredicationToFPCall(
     return true;
   }
   case Intrinsic::fma:
-  case Intrinsic::fmuladd:
-  case Intrinsic::experimental_constrained_fma:
-  case Intrinsic::experimental_constrained_fmuladd: {
+  case Intrinsic::fmuladd: {
     Value *Op0 = VPI.getOperand(0);
     Value *Op1 = VPI.getOperand(1);
     Value *Op2 = VPI.getOperand(2);
     Function *Fn = Intrinsic::getOrInsertDeclaration(
         VPI.getModule(), UnpredicatedIntrinsicID, {VPI.getType()});
-    Value *NewOp;
-    if (Intrinsic::isConstrainedFPIntrinsic(UnpredicatedIntrinsicID))
-      NewOp = Builder.CreateConstrainedFPCall(Fn, {Op0, Op1, Op2});
-    else
-      NewOp = Builder.CreateCall(Fn, {Op0, Op1, Op2});
+    Value *NewOp = Builder.CreateCall(Fn, {Op0, Op1, Op2});
     replaceOperation(*NewOp, VPI);
     return true;
   }
@@ -646,10 +640,6 @@ bool CachingVPExpander::expandPredication(VPIntrinsic &VPI) {
     return expandPredicationInMemoryIntrinsic(Builder, VPI);
   }
 
-  if (auto CID = VPI.getConstrainedIntrinsicID())
-    if (expandPredicationToFPCall(Builder, VPI, *CID))
-      return true;
-
   return false;
 }
 
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index 89daec05f7135..a01e7875912e0 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -306,9 +306,23 @@ void IRTranslator::addMachineCFGPred(CFGEdge Edge, MachineBasicBlock *NewPred) {
   MachinePreds[Edge].push_back(NewPred);
 }
 
+static bool targetSupportsBF16Type(const MachineFunction *MF) {
+  return MF->getTarget().getTargetTriple().isSPIRV();
+}
+
+static bool containsBF16Type(const User &U) {
+  // BF16 cannot currently be represented by LLT. To avoid miscompiles, we
+  // reject any instruction that uses it. FIXME: This can be removed once LLT
+  // supports bfloat.
+  return U.getType()->getScalarType()->isBFloatTy() ||
+         any_of(U.operands(), [](Value *V) {
+           return V->getType()->getScalarType()->isBFloatTy();
+         });
+}
+
 bool IRTranslator::translateBinaryOp(unsigned Opcode, const User &U,
                                      MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   // Get or create a virtual register for each value.
@@ -330,7 +344,7 @@ bool IRTranslator::translateBinaryOp(unsigned Opcode, const User &U,
 
 bool IRTranslator::translateUnaryOp(unsigned Opcode, const User &U,
                                     MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   Register Op0 = getOrCreateVReg(*U.getOperand(0));
@@ -350,7 +364,7 @@ bool IRTranslator::translateFNeg(const User &U, MachineIRBuilder &MIRBuilder) {
 
 bool IRTranslator::translateCompare(const User &U,
                                     MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   auto *CI = cast<CmpInst>(&U);
@@ -367,7 +381,12 @@ bool IRTranslator::translateCompare(const User &U,
   else if (Pred == CmpInst::FCMP_TRUE)
     MIRBuilder.buildCopy(
         Res, getOrCreateVReg(*Constant::getAllOnesValue(U.getType())));
-  else
+  else if (MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
+    // In a strictfp function, plain fcmp instructions still need strict
+    // semantics (e.g. when auto-upgraded from experimental_constrained_fcmp).
+    MIRBuilder.buildInstr(TargetOpcode::G_STRICT_FCMP, {Res},
+                          {Pred, Op0, Op1}, Flags);
+  } else
     MIRBuilder.buildFCmp(Pred, Res, Op0, Op1, Flags);
 
   return true;
@@ -904,7 +923,7 @@ bool IRTranslator::emitJumpTableHeader(SwitchCG::JumpTable &JT,
   auto Cst = getOrCreateVReg(
       *ConstantInt::get(SValue.getType(), JTH.Last - JTH.First));
   Cst = MIB.buildZExtOrTrunc(PtrScalarTy, Cst).getReg(0);
-  auto Cmp = MIB.buildICmp(CmpInst::ICMP_UGT, LLT::integer(1), Sub, Cst);
+  auto Cmp = MIB.buildICmp(CmpInst::ICMP_UGT, LLT::scalar(1), Sub, Cst);
 
   auto BrCond = MIB.buildBrCond(Cmp.getReg(0), *JT.Default);
 
@@ -935,7 +954,7 @@ void IRTranslator::emitSwitchCase(SwitchCG::CaseBlock &CB,
     return;
   }
 
-  const LLT i1Ty = LLT::integer(1);
+  const LLT i1Ty = LLT::scalar(1);
   // Build the compare.
   if (!CB.CmpMHS) {
     const auto *CI = dyn_cast<ConstantInt>(CB.CmpRHS);
@@ -1147,7 +1166,7 @@ void IRTranslator::emitBitTestHeader(SwitchCG::BitTestBlock &B,
   if (!B.FallthroughUnreachable) {
     // Conditional branch to the default block.
     auto RangeCst = MIB.buildConstant(SwitchOpTy, B.Range);
-    auto RangeCmp = MIB.buildICmp(CmpInst::Predicate::ICMP_UGT, LLT::integer(1),
+    auto RangeCmp = MIB.buildICmp(CmpInst::Predicate::ICMP_UGT, LLT::scalar(1),
                                   RangeSub, RangeCst);
     MIB.buildBrCond(RangeCmp, *B.Default);
   }
@@ -1173,16 +1192,15 @@ void IRTranslator::emitBitTestCase(SwitchCG::BitTestBlock &BB,
     // would need to be to shift a 1 bit in that position.
     auto MaskTrailingZeros =
         MIB.buildConstant(SwitchTy, llvm::countr_zero(B.Mask));
-    Cmp = MIB.buildICmp(ICmpInst::ICMP_EQ, LLT::integer(1), Reg,
-                        MaskTrailingZeros)
-              .getReg(0);
+    Cmp =
+        MIB.buildICmp(ICmpInst::ICMP_EQ, LLT::scalar(1), Reg, MaskTrailingZeros)
+            .getReg(0);
   } else if (PopCount == BB.Range) {
     // There is only one zero bit in the range, test for it directly.
     auto MaskTrailingOnes =
         MIB.buildConstant(SwitchTy, llvm::countr_one(B.Mask));
-    Cmp =
-        MIB.buildICmp(CmpInst::ICMP_NE, LLT::integer(1), Reg, MaskTrailingOnes)
-            .getReg(0);
+    Cmp = MIB.buildICmp(CmpInst::ICMP_NE, LLT::scalar(1), Reg, MaskTrailingOnes)
+              .getReg(0);
   } else {
     // Make desired shift.
     auto CstOne = MIB.buildConstant(SwitchTy, 1);
@@ -1192,7 +1210,7 @@ void IRTranslator::emitBitTestCase(SwitchCG::BitTestBlock &BB,
     auto CstMask = MIB.buildConstant(SwitchTy, B.Mask);
     auto AndOp = MIB.buildAnd(SwitchTy, SwitchVal, CstMask);
     auto CstZero = MIB.buildConstant(SwitchTy, 0);
-    Cmp = MIB.buildICmp(CmpInst::ICMP_NE, LLT::integer(1), AndOp, CstZero)
+    Cmp = MIB.buildICmp(CmpInst::ICMP_NE, LLT::scalar(1), AndOp, CstZero)
               .getReg(0);
   }
 
@@ -1577,7 +1595,7 @@ bool IRTranslator::translateBitCast(const User &U,
 
 bool IRTranslator::translateCast(unsigned Opcode, const User &U,
                                  MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   uint32_t Flags = 0;
@@ -1739,7 +1757,7 @@ bool IRTranslator::translateMemFunc(const CallInst &CI,
     SrcRegs.push_back(SrcReg);
   }
 
-  LLT SizeTy = LLT::integer(MinPtrSize);
+  LLT SizeTy = LLT::scalar(MinPtrSize);
 
   // The size operand should be the minimum of the pointer sizes.
   Register &SizeOpReg = SrcRegs[SrcRegs.size() - 1];
@@ -2074,66 +2092,23 @@ bool IRTranslator::translateSimpleIntrinsic(const CallInst &CI,
   return true;
 }
 
-// TODO: Include ConstainedOps.def when all strict instructions are defined.
-static unsigned getConstrainedOpcode(Intrinsic::ID ID) {
+/// Map an FP intrinsic (llvm.fadd, llvm.fma, etc.) to the corresponding
+/// G_STRICT_* opcode, or return 0 if no strict form is available.
+static unsigned getBundledFPStrictGOpcode(Intrinsic::ID ID) {
   switch (ID) {
-  case Intrinsic::experimental_constrained_fadd:
-    return TargetOpcode::G_STRICT_FADD;
-  case Intrinsic::experimental_constrained_fsub:
-    return TargetOpcode::G_STRICT_FSUB;
-  case Intrinsic::experimental_constrained_fmul:
-    return TargetOpcode::G_STRICT_FMUL;
-  case Intrinsic::experimental_constrained_fdiv:
-    return TargetOpcode::G_STRICT_FDIV;
-  case Intrinsic::experimental_constrained_frem:
-    return TargetOpcode::G_STRICT_FREM;
-  case Intrinsic::experimental_constrained_fma:
-    return TargetOpcode::G_STRICT_FMA;
-  case Intrinsic::experimental_constrained_sqrt:
-    return TargetOpcode::G_STRICT_FSQRT;
-  case Intrinsic::experimental_constrained_ldexp:
-    return TargetOpcode::G_STRICT_FLDEXP;
-  case Intrinsic::experimental_constrained_fcmp:
-    return TargetOpcode::G_STRICT_FCMP;
-  case Intrinsic::experimental_constrained_fcmps:
-    return TargetOpcode::G_STRICT_FCMPS;
-  default:
-    return 0;
+  case Intrinsic::fadd: return TargetOpcode::G_STRICT_FADD;
+  case Intrinsic::fsub: return TargetOpcode::G_STRICT_FSUB;
+  case Intrinsic::fmul: return TargetOpcode::G_STRICT_FMUL;
+  case Intrinsic::fdiv: return TargetOpcode::G_STRICT_FDIV;
+  case Intrinsic::frem: return TargetOpcode::G_STRICT_FREM;
+  case Intrinsic::fma:  return TargetOpcode::G_STRICT_FMA;
+  case Intrinsic::sqrt: return TargetOpcode::G_STRICT_FSQRT;
+  case Intrinsic::ldexp: return TargetOpcode::G_STRICT_FLDEXP;
+  default: return 0;
   }
 }
 
-bool IRTranslator::translateConstrainedFPIntrinsic(
-  const ConstrainedFPIntrinsic &FPI, MachineIRBuilder &MIRBuilder) {
-  fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();
-
-  unsigned Opcode = getConstrainedOpcode(FPI.getIntrinsicID());
-  if (!Opcode)
-    return false;
-
-  uint32_t Flags = MachineInstr::copyFlagsFromInstruction(FPI);
-  if (EB == fp::ExceptionBehavior::ebIgnore)
-    Flags |= MachineInstr::NoFPExcept;
-
-  if (Opcode == TargetOpcode::G_STRICT_FCMP ||
-      Opcode == TargetOpcode::G_STRICT_FCMPS) {
-    auto *FPCmp = cast<ConstrainedFPCmpIntrinsic>(&FPI);
-    Register Operand0 = getOrCreateVReg(*FPCmp->getArgOperand(0));
-    Register Operand1 = getOrCreateVReg(*FPCmp->getArgOperand(1));
-    Register Result = getOrCreateVReg(FPI);
-    MIRBuilder.buildInstr(Opcode, {Result}, {}, Flags)
-        .addPredicate(FPCmp->getPredicate())
-        .addUse(Operand0)
-        .addUse(Operand1);
-    return true;
-  }
-
-  SmallVector<llvm::SrcOp, 4> VRegs;
-  for (unsigned I = 0, E = FPI.getNonMetadataArgCount(); I != E; ++I)
-    VRegs.push_back(getOrCreateVReg(*FPI.getArgOperand(I)));
-
-  MIRBuilder.buildInstr(Opcode, {getOrCreateVReg(FPI)}, VRegs, Flags);
-  return true;
-}
+// TODO: Include ConstrainedOps.def when all strict instructions are defined.
 
 std::optional<MCRegister> IRTranslator::getArgPhysReg(Argument &Arg) {
   auto VRegs = getOrCreateVRegs(Arg);
@@ -2225,6 +2200,98 @@ bool IRTranslator::translateKnownIntrinsic(const CallInst &CI, Intrinsic::ID ID,
   if (translateSimpleIntrinsic(CI, ID, MIRBuilder))
     return true;
 
+  // Redirect new-form FP intrinsics with non-default bundles to G_STRICT_*.
+  {
+    fp::ExceptionBehavior EB = CI.getExceptionBehavior();
+    RoundingMode RM = CI.getRoundingMode();
+    if (EB != fp::ebStrict || RM != RoundingMode::Dynamic) {
+      if (unsigned StrictOp = getBundledFPStrictGOpcode(ID)) {
+        uint32_t Flags = MachineInstr::copyFlagsFromInstruction(CI);
+        if (EB == fp::ebIgnore)
+          Flags |= MachineInstr::NoFPExcept;
+        SmallVector<SrcOp, 4> VRegs;
+        for (const auto &Arg : CI.args()) {
+          // Skip metadata args (e.g., predicates passed as MetadataAsValue).
+          if (!isa<MetadataAsValue>(Arg.get()))
+            VRegs.push_back(getOrCreateVReg(*Arg.get()));
+        }
+        MIRBuilder.buildInstr(StrictOp, {getOrCreateVReg(CI)}, VRegs, Flags);
+        return true;
+      }
+    }
+  }
+
+  // llvm.fcmps is a signaling FP comparison — inherently strict regardless of
+  // explicit bundles.  llvm.fcmp with an explicit fp.except bundle also needs
+  // the strict path.
+  if (ID == Intrinsic::fcmps ||
+      (ID == Intrinsic::fcmp &&
+       CI.getOperandBundle(LLVMContext::OB_fp_except).has_value())) {
+    bool IsSignaling = (ID == Intrinsic::fcmps);
+    auto *MD = cast<MetadataAsValue>(CI.getArgOperand(2))->getMetadata();
+    FCmpInst::Predicate Pred =
+        StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
+            .Case("oeq", FCmpInst::FCMP_OEQ)
+            .Case("ogt", FCmpInst::FCMP_OGT)
+            .Case("oge", FCmpInst::FCMP_OGE)
+            .Case("olt", FCmpInst::FCMP_OLT)
+            .Case("ole", FCmpInst::FCMP_OLE)
+            .Case("one", FCmpInst::FCMP_ONE)
+            .Case("ord", FCmpInst::FCMP_ORD)
+            .Case("uno", FCmpInst::FCMP_UNO)
+            .Case("ueq", FCmpInst::FCMP_UEQ)
+            .Case("ugt", FCmpInst::FCMP_UGT)
+            .Case("uge", FCmpInst::FCMP_UGE)
+            .Case("ult", FCmpInst::FCMP_ULT)
+            .Case("ule", FCmpInst::FCMP_ULE)
+            .Case("une", FCmpInst::FCMP_UNE)
+            .Default(FCmpInst::BAD_FCMP_PREDICATE);
+    assert(Pred != FCmpInst::BAD_FCMP_PREDICATE &&
+           "invalid predicate in llvm.fcmp/fcmps");
+    uint32_t Flags = MachineInstr::copyFlagsFromInstruction(CI);
+    fp::ExceptionBehavior EB = CI.getExceptionBehavior();
+    if (EB == fp::ebIgnore)
+      Flags |= MachineInstr::NoFPExcept;
+    Register Op0 = getOrCreateVReg(*CI.getArgOperand(0));
+    Register Op1 = getOrCreateVReg(*CI.getArgOperand(1));
+    Register Res = getOrCreateVReg(CI);
+    unsigned Opcode = IsSignaling ? TargetOpcode::G_STRICT_FCMPS
+                                  : TargetOpcode::G_STRICT_FCMP;
+    MIRBuilder.buildInstr(Opcode, {Res}, {Pred, Op0, Op1}, Flags);
+    return true;
+  }
+
+  // llvm.fcmp without an fp.except bundle is a non-strict FP comparison.
+  // Translate it the same way as a plain `fcmp` instruction.
+  if (ID == Intrinsic::fcmp) {
+    auto *MD = cast<MetadataAsValue>(CI.getArgOperand(2))->getMetadata();
+    FCmpInst::Predicate Pred =
+        StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
+            .Case("oeq", FCmpInst::FCMP_OEQ)
+            .Case("ogt", FCmpInst::FCMP_OGT)
+            .Case("oge", FCmpInst::FCMP_OGE)
+            .Case("olt", FCmpInst::FCMP_OLT)
+            .Case("ole", FCmpInst::FCMP_OLE)
+            .Case("one", FCmpInst::FCMP_ONE)
+            .Case("ord", FCmpInst::FCMP_ORD)
+            .Case("uno", FCmpInst::FCMP_UNO)
+            .Case("ueq", FCmpInst::FCMP_UEQ)
+            .Case("ugt", FCmpInst::FCMP_UGT)
+            .Case("uge", FCmpInst::FCMP_UGE)
+            .Case("ult", FCmpInst::FCMP_ULT)
+            .Case("ule", FCmpInst::FCMP_ULE)
+            .Case("une", FCmpInst::FCMP_UNE)
+            .Default(FCmpInst::BAD_FCMP_PREDICATE);
+    assert(Pred != FCmpInst::BAD_FCMP_PREDICATE &&
+           "invalid predicate in llvm.fcmp");
+    uint32_t Flags = MachineInstr::copyFlagsFromInstruction(CI);
+    Register Op0 = getOrCreateVReg(*CI.getArgOperand(0));
+    Register Op1 = getOrCreateVReg(*CI.getArgOperand(1));
+    Register Res = getOrCreateVReg(CI);
+    MIRBuilder.buildFCmp(Pred, Res, Op0, Op1, Flags);
+    return true;
+  }
+
   switch (ID) {
   default:
     break;
@@ -2687,11 +2754,8 @@ bool IRTranslator::translateKnownIntrinsic(const CallInst &CI, Intrinsic::ID ID,
     return translateVectorDeinterleave2Intrinsic(CI, MIRBuilder);
   }
 
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)  \
-  case Intrinsic::INTRINSIC:
-#include "llvm/IR/ConstrainedOps.def"
-    return translateConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(CI),
-                                           MIRBuilder);
+    // experimental_constrained_* intrinsics are removed; auto-upgrade replaces
+    // them before IRTranslator runs.
   case Intrinsic::experimental_convergence_anchor:
   case Intrinsic::experimental_convergence_entry:
   case Intrinsic::experimental_convergence_loop:
@@ -2709,7 +2773,7 @@ bool IRTranslator::translateKnownIntrinsic(const CallInst &CI, Intrinsic::ID ID,
 
 bool IRTranslator::translateInlineAsm(const CallBase &CB,
                                       MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(CB))
+  if (containsBF16Type(CB) && !targetSupportsBF16Type(MF))
     return false;
 
   const InlineAsmLowering *ALI = MF->getSubtarget().getInlineAsmLowering();
@@ -2800,7 +2864,7 @@ bool IRTranslator::translateCallBase(const CallBase &CB,
 }
 
 bool IRTranslator::translateCall(const User &U, MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   const CallInst &CI = cast<CallInst>(U);
@@ -3071,7 +3135,7 @@ bool IRTranslator::translateInvoke(const User &U,
 /// intrinsics such as amdgcn.kill.
 bool IRTranslator::translateCallBr(const User &U,
                                    MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U))
     return false; // see translateCall
 
   const CallBrInst &I = cast<CallBrInst>(U);
@@ -3288,8 +3352,7 @@ bool IRTranslator::translateInsertElement(const User &U,
   if (!Idx)
     Idx = getOrCreateVReg(*U.getOperand(2));
   if (MRI->getType(Idx).getSizeInBits() != PreferredVecIdxWidth) {
-    const LLT VecIdxTy =
-        MRI->getType(Idx).changeElementSize(PreferredVecIdxWidth);
+    const LLT VecIdxTy = LLT::scalar(PreferredVecIdxWidth);
     Idx = MIRBuilder.buildZExtOrTrunc(VecIdxTy, Idx).getReg(0);
   }
   MIRBuilder.buildInsertVectorElement(Res, Val, Elt, Idx);
@@ -3370,8 +3433,7 @@ bool IRTranslator::translateExtractElement(const User &U,
   if (!Idx)
     Idx = getOrCreateVReg(*U.getOperand(1));
   if (MRI->getType(Idx).getSizeInBits() != PreferredVecIdxWidth) {
-    const LLT VecIdxTy =
-        MRI->getType(Idx).changeElementSize(PreferredVecIdxWidth);
+    const LLT VecIdxTy = LLT::scalar(PreferredVecIdxWidth);
     Idx = MIRBuilder.buildZExtOrTrunc(VecIdxTy, Idx).getReg(0);
   }
   MIRBuilder.buildExtractVectorElement(Res, Val, Idx);
@@ -3539,7 +3601,7 @@ bool IRTranslator::translateAtomicCmpXchg(const User &U,
 
 bool IRTranslator::translateAtomicRMW(const User &U,
                                       MachineIRBuilder &MIRBuilder) {
-  if (!mayTranslateUserTypes(U))
+  if (containsBF16Type(U) && !targetSupportsBF16Type(MF))
     return false;
 
   const AtomicRMWInst &I = cast<AtomicRMWInst>(U);
@@ -3886,22 +3948,6 @@ bool IRTranslator::translate(const Constant &C, Register Reg) {
   return true;
 }
 
-bool IRTranslator::mayTranslateUserTypes(const User &U) const {
-  const TargetMachine &TM = TLI->getTargetMachine();
-  if (LLT::getUseExtended())
-    return true;
-
-  // BF16 cannot currently be represented by default LLT. To avoid miscompiles
-  // we prevent any instructions using them by default in all targets that do
-  // not explicitly enable it via LLT::setUseExtended(true).
-  // SPIRV target is exception.
-  return TM.getTargetTriple().isSPIRV() ||
-         (!U.getType()->getScalarType()->isBFloatTy() &&
-          !any_of(U.operands(), [](Value *V) {
-            return V->getType()->getScalarType()->isBFloatTy();
-          }));
-}
-
 bool IRTranslator::finalizeBasicBlock(const BasicBlock &BB,
                                       MachineBasicBlock &MBB) {
   for (auto &BTB : SL->BitTestCases) {
@@ -4102,7 +4148,7 @@ bool IRTranslator::emitSPDescriptorParent(StackProtectorDescriptor &SPD,
 
   // Perform the comparison.
   auto Cmp =
-      CurBuilder->buildICmp(CmpInst::ICMP_NE, LLT::integer(1), Guard, GuardVal);
+      CurBuilder->buildICmp(CmpInst::ICMP_NE, LLT::scalar(1), Guard, GuardVal);
   // If the guard/stackslot do not equal, branch to failure MBB.
   CurBuilder->buildBrCond(Cmp, *SPD.getFailureMBB());
   // Otherwise branch to success MBB.
@@ -4299,7 +4345,7 @@ bool IRTranslator::runOnMachineFunction(MachineFunction &CurMF) {
     ArrayRef<Register> VRegs = getOrCreateVRegs(Arg);
     VRegArgs.push_back(VRegs);
 
-    if (CLI->supportSwiftError() && Arg.hasSwiftErrorAttr()) {
+    if (Arg.hasSwiftErrorAttr()) {
       assert(VRegs.size() == 1 && "Too many vregs for Swift error");
       SwiftError.setCurrentVReg(EntryBB, SwiftError.getFunctionArg(), VRegs[0]);
     }
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index 25f4f75eaedea..2ddeaec579b9d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -2482,11 +2482,18 @@ SDValue DAGTypeLegalizer::ExpandFloatOp_FP_TO_XINT(SDNode *N) {
   TargetLowering::MakeLibCallOptions CallOptions;
   std::pair<SDValue, SDValue> Tmp =
       TLI.makeLibCall(DAG, LC, NVT, Op, CallOptions, dl, Chain);
+
+  // Truncate the result if the libcall returns a larger type (e.g. ppc_fp128
+  // to i1 uses a __gcc_qtou libcall that returns i32).
+  SDValue Res = Tmp.first;
+  if (NVT != RVT)
+    Res = DAG.getNode(ISD::TRUNCATE, dl, RVT, Res);
+
   if (!IsStrict)
-    return Tmp.first;
+    return Res;
 
   ReplaceValueWith(SDValue(N, 1), Tmp.second);
-  ReplaceValueWith(SDValue(N, 0), Tmp.first);
+  ReplaceValueWith(SDValue(N, 0), Res);
   return SDValue();
 }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index a96c77bc6a4e9..d45270f6ea77d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -227,9 +227,54 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
     R = ScalarizeVecRes_TernaryOp(N);
     break;
 
-#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
-  case ISD::STRICT_##DAGN:
-#include "llvm/IR/ConstrainedOps.def"
+  case ISD::STRICT_FADD:
+  case ISD::STRICT_FSUB:
+  case ISD::STRICT_FMUL:
+  case ISD::STRICT_FDIV:
+  case ISD::STRICT_FREM:
+  case ISD::STRICT_FP_EXTEND:
+  case ISD::STRICT_SINT_TO_FP:
+  case ISD::STRICT_UINT_TO_FP:
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FSETCC:
+  case ISD::STRICT_FSETCCS:
+  case ISD::STRICT_FACOS:
+  case ISD::STRICT_FASIN:
+  case ISD::STRICT_FATAN:
+  case ISD::STRICT_FATAN2:
+  case ISD::STRICT_FCEIL:
+  case ISD::STRICT_FCOS:
+  case ISD::STRICT_FCOSH:
+  case ISD::STRICT_FEXP:
+  case ISD::STRICT_FEXP2:
+  case ISD::STRICT_FFLOOR:
+  case ISD::STRICT_FMA:
+  case ISD::STRICT_FLOG:
+  case ISD::STRICT_FLOG10:
+  case ISD::STRICT_FLOG2:
+  case ISD::STRICT_LRINT:
+  case ISD::STRICT_LLRINT:
+  case ISD::STRICT_LROUND:
+  case ISD::STRICT_LLROUND:
+  case ISD::STRICT_FMAXNUM:
+  case ISD::STRICT_FMINNUM:
+  case ISD::STRICT_FMAXIMUM:
+  case ISD::STRICT_FMINIMUM:
+  case ISD::STRICT_FNEARBYINT:
+  case ISD::STRICT_FPOW:
+  case ISD::STRICT_FPOWI:
+  case ISD::STRICT_FLDEXP:
+  case ISD::STRICT_FRINT:
+  case ISD::STRICT_FROUND:
+  case ISD::STRICT_FROUNDEVEN:
+  case ISD::STRICT_FSIN:
+  case ISD::STRICT_FSINH:
+  case ISD::STRICT_FSQRT:
+  case ISD::STRICT_FTAN:
+  case ISD::STRICT_FTANH:
+  case ISD::STRICT_FTRUNC:
     R = ScalarizeVecRes_StrictFPOp(N);
     break;
 
@@ -1571,9 +1616,54 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
     SplitVecRes_CMP(N, Lo, Hi);
     break;
 
-#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
-  case ISD::STRICT_##DAGN:
-#include "llvm/IR/ConstrainedOps.def"
+  case ISD::STRICT_FADD:
+  case ISD::STRICT_FSUB:
+  case ISD::STRICT_FMUL:
+  case ISD::STRICT_FDIV:
+  case ISD::STRICT_FREM:
+  case ISD::STRICT_FP_EXTEND:
+  case ISD::STRICT_SINT_TO_FP:
+  case ISD::STRICT_UINT_TO_FP:
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FSETCC:
+  case ISD::STRICT_FSETCCS:
+  case ISD::STRICT_FACOS:
+  case ISD::STRICT_FASIN:
+  case ISD::STRICT_FATAN:
+  case ISD::STRICT_FATAN2:
+  case ISD::STRICT_FCEIL:
+  case ISD::STRICT_FCOS:
+  case ISD::STRICT_FCOSH:
+  case ISD::STRICT_FEXP:
+  case ISD::STRICT_FEXP2:
+  case ISD::STRICT_FFLOOR:
+  case ISD::STRICT_FMA:
+  case ISD::STRICT_FLOG:
+  case ISD::STRICT_FLOG10:
+  case ISD::STRICT_FLOG2:
+  case ISD::STRICT_LRINT:
+  case ISD::STRICT_LLRINT:
+  case ISD::STRICT_LROUND:
+  case ISD::STRICT_LLROUND:
+  case ISD::STRICT_FMAXNUM:
+  case ISD::STRICT_FMINNUM:
+  case ISD::STRICT_FMAXIMUM:
+  case ISD::STRICT_FMINIMUM:
+  case ISD::STRICT_FNEARBYINT:
+  case ISD::STRICT_FPOW:
+  case ISD::STRICT_FPOWI:
+  case ISD::STRICT_FLDEXP:
+  case ISD::STRICT_FRINT:
+  case ISD::STRICT_FROUND:
+  case ISD::STRICT_FROUNDEVEN:
+  case ISD::STRICT_FSIN:
+  case ISD::STRICT_FSINH:
+  case ISD::STRICT_FSQRT:
+  case ISD::STRICT_FTAN:
+  case ISD::STRICT_FTANH:
+  case ISD::STRICT_FTRUNC:
     SplitVecRes_StrictFPOp(N, Lo, Hi);
     break;
 
@@ -5238,9 +5328,54 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
     Res = WidenVecRes_BinaryWithExtraScalarOp(N);
     break;
 
-#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
-  case ISD::STRICT_##DAGN:
-#include "llvm/IR/ConstrainedOps.def"
+  case ISD::STRICT_FADD:
+  case ISD::STRICT_FSUB:
+  case ISD::STRICT_FMUL:
+  case ISD::STRICT_FDIV:
+  case ISD::STRICT_FREM:
+  case ISD::STRICT_FP_EXTEND:
+  case ISD::STRICT_SINT_TO_FP:
+  case ISD::STRICT_UINT_TO_FP:
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FSETCC:
+  case ISD::STRICT_FSETCCS:
+  case ISD::STRICT_FACOS:
+  case ISD::STRICT_FASIN:
+  case ISD::STRICT_FATAN:
+  case ISD::STRICT_FATAN2:
+  case ISD::STRICT_FCEIL:
+  case ISD::STRICT_FCOS:
+  case ISD::STRICT_FCOSH:
+  case ISD::STRICT_FEXP:
+  case ISD::STRICT_FEXP2:
+  case ISD::STRICT_FFLOOR:
+  case ISD::STRICT_FMA:
+  case ISD::STRICT_FLOG:
+  case ISD::STRICT_FLOG10:
+  case ISD::STRICT_FLOG2:
+  case ISD::STRICT_LRINT:
+  case ISD::STRICT_LLRINT:
+  case ISD::STRICT_LROUND:
+  case ISD::STRICT_LLROUND:
+  case ISD::STRICT_FMAXNUM:
+  case ISD::STRICT_FMINNUM:
+  case ISD::STRICT_FMAXIMUM:
+  case ISD::STRICT_FMINIMUM:
+  case ISD::STRICT_FNEARBYINT:
+  case ISD::STRICT_FPOW:
+  case ISD::STRICT_FPOWI:
+  case ISD::STRICT_FLDEXP:
+  case ISD::STRICT_FRINT:
+  case ISD::STRICT_FROUND:
+  case ISD::STRICT_FROUNDEVEN:
+  case ISD::STRICT_FSIN:
+  case ISD::STRICT_FSINH:
+  case ISD::STRICT_FSQRT:
+  case ISD::STRICT_FTAN:
+  case ISD::STRICT_FTANH:
+  case ISD::STRICT_FTRUNC:
     Res = WidenVecRes_StrictFP(N);
     break;
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 1f3b099c9c577..4f304e095c3f6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -19,6 +19,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BranchProbabilityInfo.h"
@@ -3676,6 +3677,35 @@ void SelectionDAGBuilder::visitUnreachable(const UnreachableInst &I) {
   DAG.setRoot(DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, DAG.getRoot()));
 }
 
+/// Return true if \p CB must be lowered through the STRICT_* path because of
+/// its fp.control/fp.except operand bundles. That is the case when an
+/// fp.control bundle requests a rounding mode other than Dynamic, or when
+/// any fp.except bundle is present (including "ignore").
+/// A call with neither bundle uses the default FP environment, so this
+/// returns false for it.
+static bool hasNonDefaultFPBundles(const CallBase &CB) {
+  // If no fp.control or fp.except bundles are present at all, the call uses
+  // the default FP environment — do not treat it as non-default.
+  bool HasControl =
+      CB.getOperandBundle(LLVMContext::OB_fp_control).has_value();
+  bool HasExcept =
+      CB.getOperandBundle(LLVMContext::OB_fp_except).has_value();
+  if (!HasControl && !HasExcept)
+    return false;
+  // Bundles are present; check whether they request non-default behavior.
+  // STRICT_* nodes are needed when:
+  //   1. The rounding mode is not the default (Dynamic), OR
+  //   2. Any fp.except bundle is explicitly present.
+  // Even fp.except { "ignore" } must go through STRICT_ (with NoFPExcept=true)
+  // so that the nofpexcept flag is correctly propagated to machine instructions
+  // via the existing chain-based mechanism.
+  if (HasControl && CB.getRoundingMode() != RoundingMode::Dynamic)
+    return true;
+  if (HasExcept)
+    return true;
+  return false;
+}
+
 void SelectionDAGBuilder::visitUnary(const User &I, unsigned Opcode) {
   SDNodeFlags Flags;
   if (auto *FPOp = dyn_cast<FPMathOperator>(&I))
@@ -6635,6 +6665,27 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   if (auto *FPOp = dyn_cast<FPMathOperator>(&I))
     Flags.copyFMF(*FPOp);
 
+  // Route new-form FP intrinsics with non-default bundles to STRICT_* SDNodes.
+  if (hasNonDefaultFPBundles(I)) {
+    switch (Intrinsic) {
+    case Intrinsic::fadd: case Intrinsic::fsub:  case Intrinsic::fmul:
+    case Intrinsic::fdiv: case Intrinsic::frem:  case Intrinsic::fma:
+    case Intrinsic::sqrt: case Intrinsic::fptoui: case Intrinsic::fptosi:
+    case Intrinsic::uitofp: case Intrinsic::sitofp:
+    case Intrinsic::fptrunc: case Intrinsic::fpext: case Intrinsic::fcmp:
+      visitBundledFPIntrinsicAsStrict(cast<IntrinsicInst>(I));
+      return;
+    default:
+      break;
+    }
+  }
+  // llvm.fcmps (signaling compare) always uses the strict lowering path since
+  // it inherently raises FP exceptions for any NaN operand.
+  if (Intrinsic == Intrinsic::fcmps) {
+    visitBundledFPIntrinsicAsStrict(cast<IntrinsicInst>(I));
+    return;
+  }
+
   switch (Intrinsic) {
   default:
     // By default, turn this into a target intrinsic node.
@@ -7090,17 +7141,114 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
                              getValue(I.getArgOperand(0)), Flags));
     return;
   }
+  case Intrinsic::fadd:
+    setValue(&I, DAG.getNode(ISD::FADD, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1)), Flags));
+    return;
+  case Intrinsic::fsub:
+    setValue(&I, DAG.getNode(ISD::FSUB, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1)), Flags));
+    return;
+  case Intrinsic::fmul:
+    setValue(&I, DAG.getNode(ISD::FMUL, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1)), Flags));
+    return;
+  case Intrinsic::fneg:
+    setValue(&I, DAG.getNode(ISD::FNEG, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)), Flags));
+    return;
+  case Intrinsic::fdiv:
+    setValue(&I, DAG.getNode(ISD::FDIV, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1)), Flags));
+    return;
+  case Intrinsic::frem:
+    setValue(&I, DAG.getNode(ISD::FREM, sdl,
+                             getValue(I.getArgOperand(0)).getValueType(),
+                             getValue(I.getArgOperand(0)),
+                             getValue(I.getArgOperand(1)), Flags));
+    return;
+  case Intrinsic::fptrunc: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::FP_ROUND, sdl, DestVT,
+                             getValue(I.getArgOperand(0)),
+                             DAG.getTargetConstant(0, sdl, MVT::i32), Flags));
+    return;
+  }
+  case Intrinsic::fpext: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::FP_EXTEND, sdl, DestVT,
+                             getValue(I.getArgOperand(0)), Flags));
+    return;
+  }
+  case Intrinsic::uitofp: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::UINT_TO_FP, sdl, DestVT,
+                             getValue(I.getArgOperand(0)), Flags));
+    return;
+  }
+  case Intrinsic::sitofp: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::SINT_TO_FP, sdl, DestVT,
+                             getValue(I.getArgOperand(0)), Flags));
+    return;
+  }
+  case Intrinsic::fptosi: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::FP_TO_SINT, sdl, DestVT,
+                             getValue(I.getArgOperand(0))));
+    return;
+  }
+  case Intrinsic::fptoui: {
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getNode(ISD::FP_TO_UINT, sdl, DestVT,
+                             getValue(I.getArgOperand(0))));
+    return;
+  }
+  case Intrinsic::fcmp: {
+    Metadata *MD =
+        cast<MetadataAsValue>(I.getArgOperand(2))->getMetadata();
+    FCmpInst::Predicate Pred =
+        StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
+            .Case("oeq", FCmpInst::FCMP_OEQ)
+            .Case("ogt", FCmpInst::FCMP_OGT)
+            .Case("oge", FCmpInst::FCMP_OGE)
+            .Case("olt", FCmpInst::FCMP_OLT)
+            .Case("ole", FCmpInst::FCMP_OLE)
+            .Case("one", FCmpInst::FCMP_ONE)
+            .Case("ord", FCmpInst::FCMP_ORD)
+            .Case("uno", FCmpInst::FCMP_UNO)
+            .Case("ueq", FCmpInst::FCMP_UEQ)
+            .Case("ugt", FCmpInst::FCMP_UGT)
+            .Case("uge", FCmpInst::FCMP_UGE)
+            .Case("ult", FCmpInst::FCMP_ULT)
+            .Case("ule", FCmpInst::FCMP_ULE)
+            .Case("une", FCmpInst::FCMP_UNE)
+            .Default(FCmpInst::BAD_FCMP_PREDICATE);
+    assert(Pred != FCmpInst::BAD_FCMP_PREDICATE &&
+           "invalid predicate in llvm.fcmp");
+    ISD::CondCode Condition = getFCmpCondCode(Pred);
+    EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    setValue(&I, DAG.getSetCC(sdl, DestVT,
+                              getValue(I.getArgOperand(0)),
+                              getValue(I.getArgOperand(1)),
+                              Condition));
+    return;
+  }
   case Intrinsic::fma:
     setValue(&I, DAG.getNode(
                      ISD::FMA, sdl, getValue(I.getArgOperand(0)).getValueType(),
                      getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)),
                      getValue(I.getArgOperand(2)), Flags));
     return;
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
-  case Intrinsic::INTRINSIC:
-#include "llvm/IR/ConstrainedOps.def"
-    visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));
-    return;
 #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) case Intrinsic::VPID:
 #include "llvm/IR/VPIntrinsics.def"
     visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));
@@ -7113,17 +7261,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
         convertStrToRoundingMode(cast<MDString>(MD)->getString());
 
     EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
-
-    // Propagate fast-math-flags from IR to node(s).
-    SDNodeFlags Flags;
-    Flags.copyFMF(*cast<FPMathOperator>(&I));
-    SelectionDAG::FlagInserter FlagsInserter(DAG, Flags);
-
-    SDValue Result;
-    Result = DAG.getNode(
-        ISD::FPTRUNC_ROUND, sdl, VT, getValue(I.getArgOperand(0)),
-        DAG.getTargetConstant((int)*RoundMode, sdl, MVT::i32));
-    setValue(&I, Result);
+    SDNodeFlags TruncFlags;
+    TruncFlags.copyFMF(*cast<FPMathOperator>(&I));
+    SelectionDAG::FlagInserter FlagsInserter(DAG, TruncFlags);
+    setValue(&I, DAG.getNode(ISD::FPTRUNC_ROUND, sdl, VT,
+                             getValue(I.getArgOperand(0)),
+                             DAG.getTargetConstant((int)*RoundMode, sdl,
+                                                   MVT::i32)));
 
     return;
   }
@@ -8538,69 +8682,88 @@ void SelectionDAGBuilder::pushFPOpOutChain(SDValue Result,
   }
 }
 
-void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
-    const ConstrainedFPIntrinsic &FPI) {
+void SelectionDAGBuilder::visitBundledFPIntrinsicAsStrict(
+    const IntrinsicInst &I) {
+  // Lower a new-style FP intrinsic (llvm.fadd etc.) with non-default
+  // fp.control/fp.except bundles to a STRICT_* SDNode.
+  // Mirrors the removed visitConstrainedFPIntrinsic, but reads operand bundles.
   SDLoc sdl = getCurSDLoc();
 
-  // We do not need to serialize constrained FP intrinsics against
-  // each other or against (nonvolatile) loads, so they can be
-  // chained like loads.
-  fp::ExceptionBehavior EB = *FPI.getExceptionBehavior();
+  fp::ExceptionBehavior EB = I.getExceptionBehavior();
   SDValue Chain = getFPOperationRoot(EB);
+
+  Intrinsic::ID IID = I.getIntrinsicID();
+
+  // Collect value operands. For llvm.fcmp/fcmps the third arg is metadata.
+  unsigned NumValArgs =
+      (IID == Intrinsic::fcmp || IID == Intrinsic::fcmps) ? 2 : I.arg_size();
   SmallVector<SDValue, 4> Opers;
   Opers.push_back(Chain);
-  for (unsigned I = 0, E = FPI.getNonMetadataArgCount(); I != E; ++I)
-    Opers.push_back(getValue(FPI.getArgOperand(I)));
+  for (unsigned Idx = 0; Idx < NumValArgs; ++Idx)
+    Opers.push_back(getValue(I.getArgOperand(Idx)));
 
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
-  EVT VT = TLI.getValueType(DAG.getDataLayout(), FPI.getType());
+  EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
   SDVTList VTs = DAG.getVTList(VT, MVT::Other);
 
   SDNodeFlags Flags;
-  if (EB == fp::ExceptionBehavior::ebIgnore)
+  if (EB == fp::ebIgnore)
     Flags.setNoFPExcept(true);
-
-  if (auto *FPOp = dyn_cast<FPMathOperator>(&FPI))
+  if (auto *FPOp = dyn_cast<FPMathOperator>(&I))
     Flags.copyFMF(*FPOp);
 
   unsigned Opcode;
-  switch (FPI.getIntrinsicID()) {
-  default: llvm_unreachable("Impossible intrinsic");  // Can't reach here.
-#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
-  case Intrinsic::INTRINSIC:                                                   \
-    Opcode = ISD::STRICT_##DAGN;                                               \
-    break;
-#include "llvm/IR/ConstrainedOps.def"
-  case Intrinsic::experimental_constrained_fmuladd: {
-    Opcode = ISD::STRICT_FMA;
-    // Break fmuladd into fmul and fadd.
-    if (TM.Options.AllowFPOpFusion == FPOpFusion::Strict ||
-        !TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
-      Opers.pop_back();
-      SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers, Flags);
-      pushFPOpOutChain(Mul, EB);
-      Opcode = ISD::STRICT_FADD;
-      Opers.clear();
-      Opers.push_back(Mul.getValue(1));
-      Opers.push_back(Mul.getValue(0));
-      Opers.push_back(getValue(FPI.getArgOperand(2)));
-    }
-    break;
-  }
+  switch (IID) {
+  case Intrinsic::fadd:    Opcode = ISD::STRICT_FADD;       break;
+  case Intrinsic::fsub:    Opcode = ISD::STRICT_FSUB;       break;
+  case Intrinsic::fmul:    Opcode = ISD::STRICT_FMUL;       break;
+  case Intrinsic::fdiv:    Opcode = ISD::STRICT_FDIV;       break;
+  case Intrinsic::frem:    Opcode = ISD::STRICT_FREM;       break;
+  case Intrinsic::fma:     Opcode = ISD::STRICT_FMA;        break;
+  case Intrinsic::sqrt:    Opcode = ISD::STRICT_FSQRT;      break;
+  case Intrinsic::fptoui:  Opcode = ISD::STRICT_FP_TO_UINT; break;
+  case Intrinsic::fptosi:  Opcode = ISD::STRICT_FP_TO_SINT; break;
+  case Intrinsic::uitofp:  Opcode = ISD::STRICT_UINT_TO_FP; break;
+  case Intrinsic::sitofp:  Opcode = ISD::STRICT_SINT_TO_FP; break;
+  case Intrinsic::fptrunc: Opcode = ISD::STRICT_FP_ROUND;   break;
+  case Intrinsic::fpext:   Opcode = ISD::STRICT_FP_EXTEND;  break;
+  case Intrinsic::fcmp:    Opcode = ISD::STRICT_FSETCC;     break;
+  case Intrinsic::fcmps:   Opcode = ISD::STRICT_FSETCCS;    break;
+  default:
+    llvm_unreachable("Unhandled FP intrinsic in visitBundledFPIntrinsicAsStrict");
   }
 
-  // A few strict DAG nodes carry additional operands that are not
-  // set up by the default code above.
+  // Additional operands for specific opcodes.
   switch (Opcode) {
-  default: break;
+  default:
+    break;
   case ISD::STRICT_FP_ROUND:
     Opers.push_back(
         DAG.getTargetConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));
     break;
   case ISD::STRICT_FSETCC:
   case ISD::STRICT_FSETCCS: {
-    auto *FPCmp = dyn_cast<ConstrainedFPCmpIntrinsic>(&FPI);
-    ISD::CondCode Condition = getFCmpCondCode(FPCmp->getPredicate());
+    auto *MD = cast<MetadataAsValue>(I.getArgOperand(2))->getMetadata();
+    FCmpInst::Predicate Pred =
+        StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
+            .Case("oeq", FCmpInst::FCMP_OEQ)
+            .Case("ogt", FCmpInst::FCMP_OGT)
+            .Case("oge", FCmpInst::FCMP_OGE)
+            .Case("olt", FCmpInst::FCMP_OLT)
+            .Case("ole", FCmpInst::FCMP_OLE)
+            .Case("one", FCmpInst::FCMP_ONE)
+            .Case("ord", FCmpInst::FCMP_ORD)
+            .Case("uno", FCmpInst::FCMP_UNO)
+            .Case("ueq", FCmpInst::FCMP_UEQ)
+            .Case("ugt", FCmpInst::FCMP_UGT)
+            .Case("uge", FCmpInst::FCMP_UGE)
+            .Case("ult", FCmpInst::FCMP_ULT)
+            .Case("ule", FCmpInst::FCMP_ULE)
+            .Case("une", FCmpInst::FCMP_UNE)
+            .Default(FCmpInst::BAD_FCMP_PREDICATE);
+    assert(Pred != FCmpInst::BAD_FCMP_PREDICATE &&
+           "invalid predicate in llvm.fcmp/fcmps bundle-strict lowering");
+    ISD::CondCode Condition = getFCmpCondCode(Pred);
     if (DAG.isKnownNeverNaN(Opers[1]) && DAG.isKnownNeverNaN(Opers[2]))
       Condition = getFCmpCodeWithoutNaN(Condition);
     Opers.push_back(DAG.getCondCode(Condition));
@@ -8610,11 +8773,10 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
 
   SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers, Flags);
   pushFPOpOutChain(Result, EB);
-
-  SDValue FPResult = Result.getValue(0);
-  setValue(&FPI, FPResult);
+  setValue(&I, Result.getValue(0));
 }
 
+
 static unsigned getISDForVPIntrinsic(const VPIntrinsic &VPIntrin) {
   std::optional<unsigned> ResOPC;
   switch (VPIntrin.getIntrinsicID()) {
@@ -8882,7 +9044,6 @@ void SelectionDAGBuilder::visitVPCmp(const VPCmpIntrinsic &VPIntrin) {
 
   ISD::CondCode Condition;
   CmpInst::Predicate CondCode = VPIntrin.getPredicate();
-
   Value *Op1 = VPIntrin.getOperand(0);
   Value *Op2 = VPIntrin.getOperand(1);
   // #2 is the condition code
@@ -9616,7 +9777,6 @@ bool SelectionDAGBuilder::visitUnaryFloatCall(const CallInst &I,
 
   SDNodeFlags Flags;
   Flags.copyFMF(cast<FPMathOperator>(I));
-
   SDValue Tmp = getValue(I.getArgOperand(0));
   setValue(&I,
            DAG.getNode(Opcode, getCurSDLoc(), Tmp.getValueType(), Tmp, Flags));
@@ -9638,7 +9798,6 @@ bool SelectionDAGBuilder::visitBinaryFloatCall(const CallInst &I,
 
   SDNodeFlags Flags;
   Flags.copyFMF(cast<FPMathOperator>(I));
-
   SDValue Tmp0 = getValue(I.getArgOperand(0));
   SDValue Tmp1 = getValue(I.getArgOperand(1));
   EVT VT = Tmp0.getValueType();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index bab0509dd138f..8217ad8a6cf45 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -55,7 +55,6 @@ class CondBrInst;
 class CleanupPadInst;
 class CleanupReturnInst;
 class Constant;
-class ConstrainedFPIntrinsic;
 class DataLayout;
 class DIExpression;
 class DILocalVariable;
@@ -652,7 +651,7 @@ class SelectionDAGBuilder {
                                DebugLoc DbgLoc);
   void visitIntrinsicCall(const CallInst &I, unsigned Intrinsic);
   void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
-  void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
+  void visitBundledFPIntrinsicAsStrict(const IntrinsicInst &I);
   void visitConvergenceControl(const CallInst &I, unsigned Intrinsic);
   void visitVectorHistogram(const CallInst &I, unsigned IntrinsicID);
   void visitVectorExtractLastActive(const CallInst &I, unsigned Intrinsic);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 2f1e3f2f3ff7a..191511a9f315e 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1206,10 +1206,57 @@ void TargetLoweringBase::initActions() {
                           ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
                          VT, Expand);
 
-      // Constrained floating-point operations default to expand.
-#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
-    setOperationAction(ISD::STRICT_##DAGN, VT, Expand);
-#include "llvm/IR/ConstrainedOps.def"
+      // Constrained floating-point operations default to expand.
+      setOperationAction({
+          ISD::STRICT_FADD,
+          ISD::STRICT_FSUB,
+          ISD::STRICT_FMUL,
+          ISD::STRICT_FDIV,
+          ISD::STRICT_FREM,
+          ISD::STRICT_FP_EXTEND,
+          ISD::STRICT_SINT_TO_FP,
+          ISD::STRICT_UINT_TO_FP,
+          ISD::STRICT_FP_TO_SINT,
+          ISD::STRICT_FP_TO_UINT,
+          ISD::STRICT_FP_ROUND,
+          ISD::STRICT_FSETCC,
+          ISD::STRICT_FSETCCS,
+          ISD::STRICT_FACOS,
+          ISD::STRICT_FASIN,
+          ISD::STRICT_FATAN,
+          ISD::STRICT_FATAN2,
+          ISD::STRICT_FCEIL,
+          ISD::STRICT_FCOS,
+          ISD::STRICT_FCOSH,
+          ISD::STRICT_FEXP,
+          ISD::STRICT_FEXP2,
+          ISD::STRICT_FFLOOR,
+          ISD::STRICT_FMA,
+          ISD::STRICT_FLOG,
+          ISD::STRICT_FLOG10,
+          ISD::STRICT_FLOG2,
+          ISD::STRICT_LRINT,
+          ISD::STRICT_LLRINT,
+          ISD::STRICT_LROUND,
+          ISD::STRICT_LLROUND,
+          ISD::STRICT_FMAXNUM,
+          ISD::STRICT_FMINNUM,
+          ISD::STRICT_FMAXIMUM,
+          ISD::STRICT_FMINIMUM,
+          ISD::STRICT_FNEARBYINT,
+          ISD::STRICT_FPOW,
+          ISD::STRICT_FPOWI,
+          ISD::STRICT_FLDEXP,
+          ISD::STRICT_FRINT,
+          ISD::STRICT_FROUND,
+          ISD::STRICT_FROUNDEVEN,
+          ISD::STRICT_FSIN,
+          ISD::STRICT_FSINH,
+          ISD::STRICT_FSQRT,
+          ISD::STRICT_FTAN,
+          ISD::STRICT_FTANH,
+          ISD::STRICT_FTRUNC
+      }, VT, Expand);
 
     // For most targets @llvm.get.dynamic.area.offset just returns 0.
     setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, VT, Expand);
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 2728897372009..18f44fc0890a8 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1488,6 +1488,13 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
         return true;
       break; // No other 'experimental.vector.*'.
     }
+    if (Name.consume_front("experimental.constrained.")) {
+      // All experimental_constrained_* intrinsics are superseded by new FP
+      // intrinsics with fp.control/fp.except operand bundles. Signal
+      // UpgradeIntrinsicCall to handle the replacement (NewFn=nullptr pattern).
+      NewFn = nullptr;
+      return true;
+    }
     if (Name.consume_front("experimental.stepvector.")) {
       Intrinsic::ID ID = Intrinsic::stepvector;
       rename(F);
@@ -4956,6 +4963,250 @@ static Value *upgradeVectorSplice(CallBase *CI, IRBuilder<> &Builder) {
                                   Builder.getInt32(std::abs(OffsetVal))});
 }
 
+/// Map an experimental_constrained_* operation name to the corresponding
+/// new FP intrinsic ID (llvm.fadd, llvm.sqrt, etc.).
+static Intrinsic::ID getNewFPIntrinsicForConstrainedName(StringRef OpName) {
+  return StringSwitch<Intrinsic::ID>(OpName)
+      .Case("fadd",    Intrinsic::fadd)
+      .Case("fsub",    Intrinsic::fsub)
+      .Case("fmul",    Intrinsic::fmul)
+      .Case("fdiv",    Intrinsic::fdiv)
+      .Case("frem",    Intrinsic::frem)
+      .Case("fma",     Intrinsic::fma)
+      .Case("fmuladd", Intrinsic::fmuladd)
+      .Case("fcmp",    Intrinsic::fcmp)
+      .Case("fcmps",   Intrinsic::fcmps)
+      .Case("fptoui",  Intrinsic::fptoui)
+      .Case("fptosi",  Intrinsic::fptosi)
+      .Case("uitofp",  Intrinsic::uitofp)
+      .Case("sitofp",  Intrinsic::sitofp)
+      .Case("fptrunc", Intrinsic::fptrunc)
+      .Case("fpext",   Intrinsic::fpext)
+      .Case("sqrt",    Intrinsic::sqrt)
+      .Case("powi",    Intrinsic::powi)
+      .Case("ldexp",   Intrinsic::ldexp)
+      .Case("sin",     Intrinsic::sin)
+      .Case("asin",    Intrinsic::asin)
+      .Case("cos",     Intrinsic::cos)
+      .Case("acos",    Intrinsic::acos)
+      .Case("tan",     Intrinsic::tan)
+      .Case("atan",    Intrinsic::atan)
+      .Case("atan2",   Intrinsic::atan2)
+      .Case("sinh",    Intrinsic::sinh)
+      .Case("cosh",    Intrinsic::cosh)
+      .Case("tanh",    Intrinsic::tanh)
+      .Case("pow",     Intrinsic::pow)
+      .Case("log",     Intrinsic::log)
+      .Case("log10",   Intrinsic::log10)
+      .Case("log2",    Intrinsic::log2)
+      .Case("exp",     Intrinsic::exp)
+      .Case("exp2",    Intrinsic::exp2)
+      .Case("rint",    Intrinsic::rint)
+      .Case("nearbyint", Intrinsic::nearbyint)
+      .Case("lrint",   Intrinsic::lrint)
+      .Case("llrint",  Intrinsic::llrint)
+      .Case("ceil",    Intrinsic::ceil)
+      .Case("floor",   Intrinsic::floor)
+      .Case("round",   Intrinsic::round)
+      .Case("roundeven", Intrinsic::roundeven)
+      .Case("trunc",   Intrinsic::trunc)
+      .Case("lround",  Intrinsic::lround)
+      .Case("llround", Intrinsic::llround)
+      .Case("minnum",  Intrinsic::minnum)
+      .Case("maxnum",  Intrinsic::maxnum)
+      .Case("minimum", Intrinsic::minimum)
+      .Case("maximum", Intrinsic::maximum)
+      .Default(Intrinsic::not_intrinsic);
+}
+
+// Read a rounding-mode or exception-behavior string from a MetadataAsValue arg.
+static StringRef getConstrainedFPMetaStr(Value *V) {
+  if (auto *MAV = dyn_cast<MetadataAsValue>(V))
+    if (auto *MDS = dyn_cast<MDString>(MAV->getMetadata()))
+      return MDS->getString();
+  return {};
+}
+
+// Parse the FCmp predicate from the metadata arg at position 2 of a constrained
+// fcmp intrinsic.  Mirrors getFPPredicateFromMD in IntrinsicInst.cpp.
+static FCmpInst::Predicate getConstrainedFCmpPred(CallBase *CI) {
+  return StringSwitch<FCmpInst::Predicate>(
+             getConstrainedFPMetaStr(CI->getArgOperand(2)))
+      .Case("oeq", FCmpInst::FCMP_OEQ)
+      .Case("ogt", FCmpInst::FCMP_OGT)
+      .Case("oge", FCmpInst::FCMP_OGE)
+      .Case("olt", FCmpInst::FCMP_OLT)
+      .Case("ole", FCmpInst::FCMP_OLE)
+      .Case("one", FCmpInst::FCMP_ONE)
+      .Case("ord", FCmpInst::FCMP_ORD)
+      .Case("uno", FCmpInst::FCMP_UNO)
+      .Case("ueq", FCmpInst::FCMP_UEQ)
+      .Case("ugt", FCmpInst::FCMP_UGT)
+      .Case("uge", FCmpInst::FCMP_UGE)
+      .Case("ult", FCmpInst::FCMP_ULT)
+      .Case("ule", FCmpInst::FCMP_ULE)
+      .Case("une", FCmpInst::FCMP_UNE)
+      .Default(FCmpInst::BAD_FCMP_PREDICATE);
+}
+
+/// Upgrade a call to an experimental_constrained_* intrinsic to either:
+///   - A plain IR instruction (fadd, fptoui, etc.) when rounding and
+///     exception behavior are at their defaults (Dynamic + ebStrict), or
+///   - The corresponding new FP intrinsic (llvm.fadd, llvm.fcmp, etc.) with
+///     fp.control and/or fp.except operand bundles when non-default.
+static Value *upgradeConstrainedFPIntrinsicCall(CallBase *CI, StringRef Name,
+                                                IRBuilder<> &Builder) {
+  LLVMContext &Ctx = CI->getContext();
+  Type *RetTy = CI->getType();
+
+  // Extract the operation name (part before first '.' in Name).
+  // e.g., "fadd.f32" -> "fadd", "sqrt.f64" -> "sqrt"
+  StringRef OpName = Name.split('.').first;
+
+  // Determine whether this intrinsic carries a rounding-mode metadata arg.
+  // Operations WITHOUT rounding mode:
+  // fpext, fptosi, fptoui, fcmp, fcmps, ceil, floor, trunc, round, roundeven,
+  // lround, llround, maxnum, minnum, maximum, minimum
+  bool HasRM = !StringSwitch<bool>(OpName)
+      .Cases({"fpext", "fptosi", "fptoui"}, true)
+      .Cases({"fcmp", "fcmps"}, true)
+      .Cases({"ceil", "floor", "trunc"}, true)
+      .Cases({"round", "roundeven"}, true)
+      .Cases({"lround", "llround"}, true)
+      .Cases({"maxnum", "minnum", "maximum", "minimum"}, true)
+      .Default(false);
+
+  // Parse rounding mode (second-to-last arg when present).
+  std::optional<RoundingMode> RM;
+  if (HasRM) {
+    StringRef RMS = getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 2));
+    if (RMS.empty())
+      return nullptr; // malformed
+    RM = convertStrToRoundingMode(RMS);
+    if (!RM)
+      return nullptr;
+  }
+
+  // Parse exception behavior (always the last metadata arg).
+  StringRef EBS = getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 1));
+  if (EBS.empty())
+    return nullptr; // malformed
+  std::optional<fp::ExceptionBehavior> EB = convertStrToExceptionBehavior(EBS);
+  if (!EB)
+    return nullptr;
+
+  bool DefaultRM = !RM || *RM == RoundingMode::Dynamic;
+  bool DefaultEB = *EB == fp::ebStrict;
+
+  // Collect the non-metadata value args.
+  // Layout: [value args...] [predicate?] [rounding mode?] [exception behavior]
+  bool IsFCmp  = (OpName == "fcmp");
+  bool IsFCmps = (OpName == "fcmps");
+  bool IsCompare = IsFCmp || IsFCmps;
+  unsigned NArgs = CI->arg_size() - 1; // always has EB
+  if (HasRM)
+    NArgs -= 1;
+  if (IsCompare)
+    NArgs -= 1; // predicate at arg[2] is metadata
+
+  SmallVector<Value *, 4> Args;
+  for (unsigned I = 0; I < NArgs; I++)
+    Args.push_back(CI->getArgOperand(I));
+
+  if (DefaultRM && DefaultEB) {
+    // Emit a plain IR instruction for operations that have one.
+    if (OpName == "fadd")
+      return Builder.CreateFAdd(Args[0], Args[1], CI->getName());
+    if (OpName == "fsub")
+      return Builder.CreateFSub(Args[0], Args[1], CI->getName());
+    if (OpName == "fmul")
+      return Builder.CreateFMul(Args[0], Args[1], CI->getName());
+    if (OpName == "fdiv")
+      return Builder.CreateFDiv(Args[0], Args[1], CI->getName());
+    if (OpName == "frem")
+      return Builder.CreateFRem(Args[0], Args[1], CI->getName());
+    // fcmp/fcmps with default EB (strict) still emits llvm.fcmp/llvm.fcmps with
+    // an explicit fp.except.strict bundle.  A plain `fcmp` would be treated as
+    // side-effect-free by the optimizer and could be DCE'd even in a strictfp
+    // function, losing the FP exception semantics.  The explicit bundle keeps
+    // the call alive and tells the backend to use the strict lowering path.
+    // Fall through for both.
+    if (OpName == "fptoui")
+      return Builder.CreateFPToUI(Args[0], RetTy, CI->getName());
+    if (OpName == "fptosi")
+      return Builder.CreateFPToSI(Args[0], RetTy, CI->getName());
+    if (OpName == "uitofp")
+      return Builder.CreateUIToFP(Args[0], RetTy, CI->getName());
+    if (OpName == "sitofp")
+      return Builder.CreateSIToFP(Args[0], RetTy, CI->getName());
+    if (OpName == "fptrunc")
+      return Builder.CreateFPTrunc(Args[0], RetTy, CI->getName());
+    if (OpName == "fpext")
+      return Builder.CreateFPExt(Args[0], RetTy, CI->getName());
+    // All other ops (math intrinsics): fall through to emit as plain intrinsic.
+  }
+
+  Intrinsic::ID NewID = getNewFPIntrinsicForConstrainedName(OpName);
+  if (NewID == Intrinsic::not_intrinsic)
+    return nullptr;
+
+  SmallVector<OperandBundleDef, 2> Bundles;
+  if (!DefaultRM)
+    addFPRoundingBundle(Ctx, Bundles, *RM);
+  // Always add an explicit fp.except bundle for fcmp/fcmps so that the call
+  // cannot be silently removed as dead code (a plain `fcmp` has no side
+  // effects in LLVM's model even inside a strictfp function).
+  if (!DefaultEB || IsCompare)
+    addFPExceptionBundle(Ctx, Bundles, *EB);
+
+  // For fcmp/fcmps: append the predicate as a trailing metadata arg (matches
+  // int_fcmp / int_fcmps signatures which take (float, float, metadata_pred)).
+  if (IsCompare) {
+    FCmpInst::Predicate Pred = getConstrainedFCmpPred(CI);
+    auto *PredMD = MDString::get(Ctx, CmpInst::getPredicateName(Pred));
+    Args.push_back(MetadataAsValue::get(Ctx, PredMD));
+  }
+
+  // Build the overload type list.  Most FP intrinsics are overloaded on the
+  // float return/operand type; a few additionally overload on a second type.
+  SmallVector<Type *, 2> Tys;
+  Intrinsic::ID TmpID = NewID;
+  switch (TmpID) {
+  // Comparisons: overloaded on input float type (return is i1).
+  case Intrinsic::fcmp:
+  case Intrinsic::fcmps:
+    Tys = {Args[0]->getType()};
+    break;
+  // Conversions: overloaded on both the return type and the operand type.
+  case Intrinsic::fptoui:
+  case Intrinsic::fptosi:
+  case Intrinsic::uitofp:
+  case Intrinsic::sitofp:
+  case Intrinsic::fptrunc:
+  case Intrinsic::fpext:
+  // int_l{l}rint / int_l{l}round: [anyint], [anyfloat] — two overloaded types.
+  case Intrinsic::lrint:
+  case Intrinsic::llrint:
+  case Intrinsic::lround:
+  case Intrinsic::llround:
+    Tys = {RetTy, Args[0]->getType()};
+    break;
+  // Integer-exponent ops: float type + distinct int type.
+  case Intrinsic::powi:
+  case Intrinsic::ldexp:
+    Tys = {RetTy, Args[1]->getType()};
+    break;
+  default:
+    // All other FP ops: overloaded solely on the float type.
+    Tys = {RetTy};
+    break;
+  }
+
+  // Bundle-aware overload: (ID, Types, Args, Bundles, FMFSource, Name).
+  return Builder.CreateIntrinsic(NewID, Tys, Args, Bundles, nullptr,
+                                 CI->getName());
+}
+
 static Value *upgradeConvertIntrinsicCall(StringRef Name, CallBase *CI,
                                           Function *F, IRBuilder<> &Builder) {
   if (Name.starts_with("to.fp16")) {
@@ -5001,6 +5252,7 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
     bool IsARM = Name.consume_front("arm.");
     bool IsAMDGCN = Name.consume_front("amdgcn.");
     bool IsDbg = Name.consume_front("dbg.");
+    bool IsConstrainedFP = Name.consume_front("experimental.constrained.");
     bool IsOldSplice =
         (Name.consume_front("experimental.vector.splice") ||
          Name.consume_front("vector.splice")) &&
@@ -5021,6 +5273,12 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
       Rep = upgradeAMDGCNIntrinsicCall(Name, CI, F, Builder);
     } else if (IsDbg) {
       upgradeDbgIntrinsicToDbgRecord(Name, CI);
+    } else if (IsConstrainedFP) {
+      Rep = upgradeConstrainedFPIntrinsicCall(CI, Name, Builder);
+      // If upgrade returned nullptr the intrinsic has malformed metadata;
+      // leave it in place so the verifier can diagnose it properly.
+      if (!Rep)
+        return;
     } else if (IsOldSplice) {
       Rep = upgradeVectorSplice(CI, Builder);
     } else if (Name.consume_front("convert.")) {
@@ -5759,7 +6017,9 @@ void llvm::UpgradeCallsToIntrinsic(Function *F) {
         UpgradeIntrinsicCall(CB, NewFn);
 
     // Remove old function, no longer used, from the module.
-    if (F != NewFn)
+    // Guard with use_empty() in case any call site could not be upgraded (e.g.,
+    // malformed metadata) and was left in place for the verifier to diagnose.
+    if (F != NewFn && F->use_empty())
       F->eraseFromParent();
   }
 }
@@ -6355,7 +6615,8 @@ struct StrictFPUpgradeVisitor : public InstVisitor<StrictFPUpgradeVisitor> {
   void visitCallBase(CallBase &Call) {
     if (!Call.isStrictFP())
       return;
-    if (isa<ConstrainedFPIntrinsic>(&Call))
+    if (auto *II = dyn_cast<IntrinsicInst>(&Call);
+        II && Intrinsic::isConstrainedFPIntrinsic(II->getIntrinsicID()))
       return;
     // If we get here, the caller doesn't have the strictfp attribute
     // but this callsite does. Replace the strictfp attribute with nobuiltin.
diff --git a/llvm/lib/IR/FPEnv.cpp b/llvm/lib/IR/FPEnv.cpp
index c41d7b3181a37..433f5d7f38dc8 100644
--- a/llvm/lib/IR/FPEnv.cpp
+++ b/llvm/lib/IR/FPEnv.cpp
@@ -14,9 +14,6 @@
 
 #include "llvm/IR/FPEnv.h"
 #include "llvm/ADT/StringSwitch.h"
-#include "llvm/IR/Instruction.h"
-#include "llvm/IR/IntrinsicInst.h"
-#include "llvm/IR/Intrinsics.h"
 #include <optional>
 
 using namespace llvm;
@@ -35,6 +32,17 @@ llvm::convertStrToRoundingMode(StringRef RoundingArg) {
       .Default(std::nullopt);
 }
 
+std::optional<RoundingMode> llvm::convertBundleToRoundingMode(StringRef RoundingArg) {
+  return StringSwitch<std::optional<RoundingMode>>(RoundingArg)
+      .Case("dyn", RoundingMode::Dynamic)
+      .Case("rte", RoundingMode::NearestTiesToEven)
+      .Case("rmm", RoundingMode::NearestTiesToAway)
+      .Case("rtn", RoundingMode::TowardNegative)
+      .Case("rtp", RoundingMode::TowardPositive)
+      .Case("rtz", RoundingMode::TowardZero)
+      .Default(std::nullopt);
+}
+
 std::optional<StringRef>
 llvm::convertRoundingModeToStr(RoundingMode UseRounding) {
   std::optional<StringRef> RoundingStr;
@@ -63,6 +71,33 @@ llvm::convertRoundingModeToStr(RoundingMode UseRounding) {
   return RoundingStr;
 }
 
+std::optional<StringRef> llvm::convertRoundingModeToBundle(RoundingMode UseRounding) {
+  std::optional<StringRef> RoundingStr;
+  switch (UseRounding) {
+  case RoundingMode::Dynamic:
+    RoundingStr = "dyn";
+    break;
+  case RoundingMode::NearestTiesToEven:
+    RoundingStr = "rte";
+    break;
+  case RoundingMode::NearestTiesToAway:
+    RoundingStr = "rmm";
+    break;
+  case RoundingMode::TowardNegative:
+    RoundingStr = "rtn";
+    break;
+  case RoundingMode::TowardPositive:
+    RoundingStr = "rtp";
+    break;
+  case RoundingMode::TowardZero:
+    RoundingStr = "rtz";
+    break;
+  default:
+    break;
+  }
+  return RoundingStr;
+}
+
 std::optional<fp::ExceptionBehavior>
 llvm::convertStrToExceptionBehavior(StringRef ExceptionArg) {
   return StringSwitch<std::optional<fp::ExceptionBehavior>>(ExceptionArg)
@@ -72,6 +107,15 @@ llvm::convertStrToExceptionBehavior(StringRef ExceptionArg) {
       .Default(std::nullopt);
 }
 
+std::optional<fp::ExceptionBehavior>
+llvm::convertBundleToExceptionBehavior(StringRef ExceptionArg) {
+  return StringSwitch<std::optional<fp::ExceptionBehavior>>(ExceptionArg)
+      .Case("ignore", fp::ebIgnore)
+      .Case("maytrap", fp::ebMayTrap)
+      .Case("strict", fp::ebStrict)
+      .Default(std::nullopt);
+}
+
 std::optional<StringRef>
 llvm::convertExceptionBehaviorToStr(fp::ExceptionBehavior UseExcept) {
   std::optional<StringRef> ExceptStr;
@@ -89,43 +133,20 @@ llvm::convertExceptionBehaviorToStr(fp::ExceptionBehavior UseExcept) {
   return ExceptStr;
 }
 
-Intrinsic::ID llvm::getConstrainedIntrinsicID(const Instruction &Instr) {
-  Intrinsic::ID IID = Intrinsic::not_intrinsic;
-  switch (Instr.getOpcode()) {
-  case Instruction::FCmp:
-    // Unlike other instructions FCmp can be mapped to one of two intrinsic
-    // functions. We choose the non-signaling variant.
-    IID = Intrinsic::experimental_constrained_fcmp;
+std::optional<StringRef>
+llvm::convertExceptionBehaviorToBundle(fp::ExceptionBehavior UseExcept) {
+  std::optional<StringRef> ExceptStr;
+  switch (UseExcept) {
+  case fp::ebStrict:
+    ExceptStr = "strict";
     break;
-
-    // Instructions
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
-  case Instruction::NAME:                                                      \
-    IID = Intrinsic::INTRINSIC;                                                \
-    break;
-#define FUNCTION(NAME, NARG, ROUND_MODE, INTRINSIC)
-#define CMP_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)
-#include "llvm/IR/ConstrainedOps.def"
-
-  // Intrinsic calls.
-  case Instruction::Call:
-    if (auto *IntrinCall = dyn_cast<IntrinsicInst>(&Instr)) {
-      switch (IntrinCall->getIntrinsicID()) {
-#define FUNCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                            \
-  case Intrinsic::NAME:                                                        \
-    IID = Intrinsic::INTRINSIC;                                                \
-    break;
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)
-#define CMP_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)
-#include "llvm/IR/ConstrainedOps.def"
-      default:
-        break;
-      }
-    }
+  case fp::ebIgnore:
+    ExceptStr = "ignore";
     break;
-  default:
+  case fp::ebMayTrap:
+    ExceptStr = "maytrap";
     break;
   }
-
-  return IID;
+  return ExceptStr;
 }
+
diff --git a/llvm/lib/IR/IRBuilder.cpp b/llvm/lib/IR/IRBuilder.cpp
index 8b8f8e68ee2b9..6aa92a439a88b 100644
--- a/llvm/lib/IR/IRBuilder.cpp
+++ b/llvm/lib/IR/IRBuilder.cpp
@@ -186,6 +186,66 @@ IRBuilderBase::createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
   return CI;
 }
 
+CallInst *IRBuilderBase::CreateCall(FunctionType *FTy, Value *Callee,
+                                    ArrayRef<Value *> Args,
+                                    ArrayRef<OperandBundleDef> OpBundles,
+                                    const Twine &Name, MDNode *FPMathTag) {
+  assert(std::count_if(OpBundles.begin(), OpBundles.end(),
+                       [](const OperandBundleDef &Item) {
+                         return Item.getTag() == "fp.control";
+                       }) <= 1);
+  assert(std::count_if(OpBundles.begin(), OpBundles.end(),
+                       [](const OperandBundleDef &Item) {
+                         return Item.getTag() == "fp.except";
+                       }) <= 1);
+
+  ArrayRef<OperandBundleDef> ActualBundlesRef = OpBundles;
+  SmallVector<OperandBundleDef, 2> ActualBundles;
+
+  // If the builder is in strictfp mode and has non-default options (like
+  // non-dynamic rounding), add the corresponding operand bundles. If such a
+  // bundle is already present, assume it overrides the defaults.
+  bool NeedUpdateMemoryEffects = false;
+  if (IsFPConstrained) {
+    if (const auto *Func = dyn_cast<Function>(Callee)) {
+      if (Intrinsic::ID ID = Func->getIntrinsicID()) {
+        if (IntrinsicInst::isFloatingPointOperation(ID)) {
+          MemoryEffects FME = Func->getMemoryEffects();
+          NeedUpdateMemoryEffects = !FME.doesAccessInaccessibleMem();
+          bool NeedRound = DefaultConstrainedRounding != RoundingMode::Dynamic;
+          bool NeedExcept = DefaultConstrainedExcept != fp::ebStrict;
+          for (const auto &Item : OpBundles) {
+            if (NeedRound && Item.getTag() == "fp.control")
+              NeedRound = false;
+            else if (NeedExcept && Item.getTag() == "fp.except")
+              NeedExcept = false;
+            ActualBundles.push_back(Item);
+          }
+          if (NeedRound)
+            createFPRoundingBundle(ActualBundles, DefaultConstrainedRounding);
+          if (NeedExcept)
+            createFPExceptionBundle(ActualBundles, DefaultConstrainedExcept);
+          ActualBundlesRef = ActualBundles;
+        }
+      }
+    }
+  }
+
+  // Create the call; if it accesses the FP environment, update memory effects.
+  CallInst *CI = CallInst::Create(FTy, Callee, Args, ActualBundlesRef);
+  if (NeedUpdateMemoryEffects) {
+    MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
+    auto A = Attribute::getWithMemoryEffects(getContext(), ME);
+    CI->addFnAttr(A);
+  }
+
+  if (IsFPConstrained)
+    setConstrainedFPCallAttr(CI);
+  if (isa<FPMathOperator>(CI))
+    setFPAttrs(CI, FPMathTag, FMF);
+  return Insert(CI, Name);
+}
+
 static Value *CreateVScaleMultiple(IRBuilderBase &B, Type *Ty, uint64_t Scale) {
   Value *VScale = B.CreateVScale(Ty);
   if (Scale == 1)
@@ -951,55 +1011,15 @@ CallInst *IRBuilderBase::CreateIntrinsic(Type *RetTy, Intrinsic::ID ID,
   return createCallHelper(Fn, Args, Name, FMFSource);
 }
 
-CallInst *IRBuilderBase::CreateConstrainedFPBinOp(
-    Intrinsic::ID ID, Value *L, Value *R, FMFSource FMFSource,
-    const Twine &Name, MDNode *FPMathTag, std::optional<RoundingMode> Rounding,
-    std::optional<fp::ExceptionBehavior> Except) {
-  Value *RoundingV = getConstrainedFPRounding(Rounding);
-  Value *ExceptV = getConstrainedFPExcept(Except);
-
-  FastMathFlags UseFMF = FMFSource.get(FMF);
-
-  CallInst *C = CreateIntrinsic(ID, {L->getType()},
-                                {L, R, RoundingV, ExceptV}, nullptr, Name);
-  setConstrainedFPCallAttr(C);
-  setFPAttrs(C, FPMathTag, UseFMF);
-  return C;
-}
-
-CallInst *IRBuilderBase::CreateConstrainedFPIntrinsic(
-    Intrinsic::ID ID, ArrayRef<Type *> Types, ArrayRef<Value *> Args,
-    FMFSource FMFSource, const Twine &Name, MDNode *FPMathTag,
-    std::optional<RoundingMode> Rounding,
-    std::optional<fp::ExceptionBehavior> Except) {
-  Value *RoundingV = getConstrainedFPRounding(Rounding);
-  Value *ExceptV = getConstrainedFPExcept(Except);
-
-  FastMathFlags UseFMF = FMFSource.get(FMF);
-
-  llvm::SmallVector<Value *, 5> ExtArgs(Args);
-  ExtArgs.push_back(RoundingV);
-  ExtArgs.push_back(ExceptV);
-
-  CallInst *C = CreateIntrinsic(ID, Types, ExtArgs, nullptr, Name);
-  setConstrainedFPCallAttr(C);
-  setFPAttrs(C, FPMathTag, UseFMF);
-  return C;
-}
-
-CallInst *IRBuilderBase::CreateConstrainedFPUnroundedBinOp(
-    Intrinsic::ID ID, Value *L, Value *R, FMFSource FMFSource,
-    const Twine &Name, MDNode *FPMathTag,
-    std::optional<fp::ExceptionBehavior> Except) {
-  Value *ExceptV = getConstrainedFPExcept(Except);
-
-  FastMathFlags UseFMF = FMFSource.get(FMF);
-
-  CallInst *C =
-      CreateIntrinsic(ID, {L->getType()}, {L, R, ExceptV}, nullptr, Name);
-  setConstrainedFPCallAttr(C);
-  setFPAttrs(C, FPMathTag, UseFMF);
-  return C;
+CallInst *IRBuilderBase::CreateIntrinsic(Intrinsic::ID ID,
+                                         ArrayRef<Type *> Types,
+                                         ArrayRef<Value *> Args,
+                                         ArrayRef<OperandBundleDef> OpBundles,
+                                         Instruction *FMFSource,
+                                         const Twine &Name) {
+  Module *M = BB->getModule();
+  Function *Fn = Intrinsic::getOrInsertDeclaration(M, ID, Types);
+  return createCallHelper(Fn, Args, Name, FMFSource, OpBundles);
 }
 
 Value *IRBuilderBase::CreateNAryOp(unsigned Opc, ArrayRef<Value *> Ops,
@@ -1017,38 +1037,19 @@ Value *IRBuilderBase::CreateNAryOp(unsigned Opc, ArrayRef<Value *> Ops,
   llvm_unreachable("Unexpected opcode!");
 }
 
-CallInst *IRBuilderBase::CreateConstrainedFPCast(
-    Intrinsic::ID ID, Value *V, Type *DestTy, FMFSource FMFSource,
-    const Twine &Name, MDNode *FPMathTag, std::optional<RoundingMode> Rounding,
-    std::optional<fp::ExceptionBehavior> Except) {
-  Value *ExceptV = getConstrainedFPExcept(Except);
-
-  FastMathFlags UseFMF = FMFSource.get(FMF);
-
-  CallInst *C;
-  if (Intrinsic::hasConstrainedFPRoundingModeOperand(ID)) {
-    Value *RoundingV = getConstrainedFPRounding(Rounding);
-    C = CreateIntrinsic(ID, {DestTy, V->getType()}, {V, RoundingV, ExceptV},
-                        nullptr, Name);
-  } else
-    C = CreateIntrinsic(ID, {DestTy, V->getType()}, {V, ExceptV}, nullptr,
-                        Name);
-
-  setConstrainedFPCallAttr(C);
-
-  if (isa<FPMathOperator>(C))
-    setFPAttrs(C, FPMathTag, UseFMF);
-  return C;
-}
-
 Value *IRBuilderBase::CreateFCmpHelper(CmpInst::Predicate P, Value *LHS,
                                        Value *RHS, const Twine &Name,
                                        MDNode *FPMathTag, FMFSource FMFSource,
                                        bool IsSignaling) {
-  if (IsFPConstrained) {
-    auto ID = IsSignaling ? Intrinsic::experimental_constrained_fcmps
-                          : Intrinsic::experimental_constrained_fcmp;
-    return CreateConstrainedFPCmp(ID, P, LHS, RHS, Name);
+  if (IsFPConstrained && hasNonDefaultFPConstraints()) {
+    // Emit llvm.fcmp with the predicate encoded as a metadata string argument.
+    // The auto-bundle block in CreateCall injects fp.control/fp.except bundles.
+    // (IsSignaling is not yet distinguished in the new intrinsic form; the
+    //  fp.except bundle controls whether exceptions are raised.)
+    Value *PredMD = MetadataAsValue::get(
+        Context, MDString::get(Context, CmpInst::getPredicateName(P)));
+    return CreateIntrinsic(Intrinsic::fcmp, {LHS->getType()},
+                           {LHS, RHS, PredMD}, FMFSource, Name);
   }
 
   if (auto *V = Folder.FoldCmp(P, LHS, RHS))
@@ -1058,33 +1059,6 @@ Value *IRBuilderBase::CreateFCmpHelper(CmpInst::Predicate P, Value *LHS,
       Name);
 }
 
-CallInst *IRBuilderBase::CreateConstrainedFPCmp(
-    Intrinsic::ID ID, CmpInst::Predicate P, Value *L, Value *R,
-    const Twine &Name, std::optional<fp::ExceptionBehavior> Except) {
-  Value *PredicateV = getConstrainedFPPredicate(P);
-  Value *ExceptV = getConstrainedFPExcept(Except);
-
-  CallInst *C = CreateIntrinsic(ID, {L->getType()},
-                                {L, R, PredicateV, ExceptV}, nullptr, Name);
-  setConstrainedFPCallAttr(C);
-  return C;
-}
-
-CallInst *IRBuilderBase::CreateConstrainedFPCall(
-    Function *Callee, ArrayRef<Value *> Args, const Twine &Name,
-    std::optional<RoundingMode> Rounding,
-    std::optional<fp::ExceptionBehavior> Except) {
-  llvm::SmallVector<Value *, 6> UseArgs(Args);
-
-  if (Intrinsic::hasConstrainedFPRoundingModeOperand(Callee->getIntrinsicID()))
-    UseArgs.push_back(getConstrainedFPRounding(Rounding));
-  UseArgs.push_back(getConstrainedFPExcept(Except));
-
-  CallInst *C = CreateCall(Callee, UseArgs, Name);
-  setConstrainedFPCallAttr(C);
-  return C;
-}
-
 Value *IRBuilderBase::CreateSelectWithUnknownProfile(Value *C, Value *True,
                                                      Value *False,
                                                      StringRef PassName,
@@ -1397,6 +1371,20 @@ CallInst *IRBuilderBase::CreateDereferenceableAssumption(Value *PtrValue,
                           {DereferenceableOpB});
 }
 
+void IRBuilderBase::createFPRoundingBundle(
+    SmallVectorImpl<OperandBundleDef> &Bundles,
+    std::optional<RoundingMode> Rounding) {
+  addFPRoundingBundle(Context, Bundles,
+                      Rounding.value_or(DefaultConstrainedRounding));
+}
+
+void IRBuilderBase::createFPExceptionBundle(
+    SmallVectorImpl<OperandBundleDef> &Bundles,
+    std::optional<fp::ExceptionBehavior> Except) {
+  addFPExceptionBundle(Context, Bundles,
+                       Except.value_or(DefaultConstrainedExcept));
+}
+
 IRBuilderDefaultInserter::~IRBuilderDefaultInserter() = default;
 IRBuilderCallbackInserter::~IRBuilderCallbackInserter() = default;
 IRBuilderFolder::~IRBuilderFolder() = default;
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 8a220c48acac8..7b04bed84e52d 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -615,6 +615,8 @@ bool CallBase::hasReadingOperandBundles() const {
   // ptrauth) forces a callsite to be at least readonly.
   return hasOperandBundlesOtherThan({LLVMContext::OB_ptrauth,
                                      LLVMContext::OB_kcfi,
+                                     LLVMContext::OB_fp_control,
+                                     LLVMContext::OB_fp_except,
                                      LLVMContext::OB_convergencectrl,
                                      LLVMContext::OB_deactivation_symbol}) &&
          getIntrinsicID() != Intrinsic::assume;
@@ -624,15 +626,167 @@ bool CallBase::hasClobberingOperandBundles() const {
   return hasOperandBundlesOtherThan(
              {LLVMContext::OB_deopt, LLVMContext::OB_funclet,
               LLVMContext::OB_ptrauth, LLVMContext::OB_kcfi,
+              LLVMContext::OB_fp_control, LLVMContext::OB_fp_except,
               LLVMContext::OB_convergencectrl,
               LLVMContext::OB_deactivation_symbol}) &&
          getIntrinsicID() != Intrinsic::assume;
 }
 
+RoundingMode CallBase::getRoundingMode() const {
+  // Try reading rounding mode from FP bundle.
+  std::optional<RoundingMode> RM;
+  if (auto ControlBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
+    for (auto &U : ControlBundle->Inputs) {
+      Value *V = U.get();
+      if (auto *MDV = dyn_cast<MetadataAsValue>(V)) {
+        Metadata *MD = MDV->getMetadata();
+        if (auto *MDS = dyn_cast<MDString>(MD))
+          if (auto RM = convertBundleToRoundingMode(MDS->getString()))
+            return *RM;
+      }
+    }
+  }
+
+  // No FP bundle, try to guess from the current mode.
+  if (getParent())
+    if (auto *F = getFunction(); F)
+      return F->getAttributes().hasFnAttr(Attribute::StrictFP)
+                 ? RoundingMode::Dynamic
+                 : RoundingMode::NearestTiesToEven;
+
+  // Isolated call. Assume default environment.
+  return RoundingMode::NearestTiesToEven;
+}
+
+fp::ExceptionBehavior CallBase::getExceptionBehavior() const {
+  // Try determining exception behavior from FP bundle.
+  std::optional<fp::ExceptionBehavior> EB;
+  if (auto ExceptionBundle = getOperandBundle(LLVMContext::OB_fp_except)) {
+    Value *V = ExceptionBundle->Inputs.front();
+    Metadata *MD = cast<MetadataAsValue>(V)->getMetadata();
+    EB = convertBundleToExceptionBehavior(cast<MDString>(MD)->getString());
+  }
+  if (EB)
+    return *EB;
+
+  // No FP bundle, try to guess from the current mode.
+  if (getParent())
+    if (auto *F = getFunction(); F)
+      return F->getAttributes().hasFnAttr(Attribute::StrictFP) ? fp::ebStrict
+                                                               : fp::ebIgnore;
+
+  // Isolated call. Assume default environment.
+  return fp::ebIgnore;
+}
+
+/// Returns the input denormal behavior specified by operand bundles.
+///
+/// Searches for the bundle operand that specifies denormal behavior for the
+/// given type, or for any type if the argument is \a nullptr.
+///
+/// \param FPSem - pointer to the FP semantics of a call argument type. If
+///                \a nullptr, searches for a common denormal mode
+///                specification, specified by the operand with prefix
+///                "denorm.in=".
+std::optional<DenormalMode::DenormalModeKind>
+CallBase::getInputDenormModeFromBundle(const fltSemantics *FPSem) const {
+  if (auto ControlBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
+    if (FPSem)
+      if (auto Result = getDenormModeBundle(*ControlBundle, true, FPSem))
+        return Result;
+    if (auto Result = getDenormModeBundle(*ControlBundle, true, nullptr))
+      return Result;
+  }
+  return std::nullopt;
+}
+
+/// Returns the output denormal behavior specified by operand bundles.
+///
+/// Searches for the bundle operand that specifies denormal behavior for the
+/// given type, or for any type if the argument is \a nullptr.
+///
+/// \param FPSem - pointer to the FP semantics of the call result type. If
+///                \a nullptr, searches for a common denormal mode
+///                specification, specified by the operand with prefix
+///                "denorm.out=".
+std::optional<DenormalMode::DenormalModeKind>
+CallBase::getOutputDenormModeFromBundle(const fltSemantics *FPSem) const {
+  if (auto ControlBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
+    if (FPSem)
+      if (auto Result = getDenormModeBundle(*ControlBundle, false, FPSem))
+        return Result;
+    if (auto Result = getDenormModeBundle(*ControlBundle, false, nullptr))
+      return Result;
+  }
+  return std::nullopt;
+}
+
+/// Returns the input denormal behavior to be used for the call evaluation.
+///
+/// Searches operand bundles first for matching denormal behavior
+/// specifications. If not found in bundles, falls back to checking function
+/// attributes.
+///
+/// \param FPSem - pointer to the FP semantics of the call argument type. If
+///                \a nullptr, searches for a common denormal mode
+///                specification, using the operand with prefix "denorm.in=".
+std::optional<DenormalMode::DenormalModeKind>
+CallBase::getInputDenormMode(const fltSemantics *FPSem) const {
+  if (auto Result = getInputDenormModeFromBundle(FPSem))
+    return Result;
+
+  if (!getParent())
+    return std::nullopt;
+  const Function *F = getFunction();
+  if (!F)
+    return std::nullopt;
+
+  if (FPSem)
+    return F->getDenormalMode(*FPSem).Input;
+  return F->getDenormalFPEnv().DefaultMode.Input;
+}
+
+/// Returns the output denormal behavior to be used for the call evaluation.
+///
+/// Searches operand bundles first for matching denormal behavior
+/// specifications. If not found in bundles, falls back to checking function
+/// attributes.
+///
+/// \param FPSem - pointer to the FP semantics of the call result type. If
+///                \a nullptr, searches for a common denormal mode
+///                specification, using the operand with prefix "denorm.out=".
+std::optional<DenormalMode::DenormalModeKind>
+CallBase::getOutputDenormMode(const fltSemantics *FPSem) const {
+  if (auto Result = getOutputDenormModeFromBundle(FPSem))
+    return Result;
+
+  if (!getParent())
+    return std::nullopt;
+  const Function *F = getFunction();
+  if (!F)
+    return std::nullopt;
+
+  if (FPSem)
+    return F->getDenormalMode(*FPSem).Output;
+  return F->getDenormalFPEnv().DefaultMode.Output;
+}
+
+MemoryEffects CallBase::getFloatingPointMemoryEffects() const {
+  if (Intrinsic::ID IntrID = getIntrinsicID())
+    if (const BasicBlock *BB = getParent())
+      if (const Function *F = BB->getParent())
+        if (F->hasFnAttribute(Attribute::StrictFP))
+          if (IntrinsicInst::isFloatingPointOperation(IntrID)) {
+            return MemoryEffects::inaccessibleMemOnly();
+          }
+  return MemoryEffects::none();
+}
+
 MemoryEffects CallBase::getMemoryEffects() const {
   MemoryEffects ME = getAttributes().getMemoryEffects();
   if (auto *Fn = dyn_cast<Function>(getCalledOperand())) {
     MemoryEffects FnME = Fn->getMemoryEffects();
+    FnME |= getFloatingPointMemoryEffects();
     if (hasOperandBundles()) {
       // TODO: Add a method to get memory effects for operand bundles instead.
       if (hasReadingOperandBundles())
@@ -741,6 +895,136 @@ bool CallBase::hasArgumentWithAdditionalReturnCaptureComponents() const {
   return false;
 }
 
+std::optional<StringRef> llvm::getBundleOperandByPrefix(OperandBundleUse Bundle,
+                                                      StringRef Prefix) {
+  for (const auto &Item : Bundle.Inputs) {
+    Metadata *MD = cast<MetadataAsValue>(Item.get())->getMetadata();
+    if (const auto *MDS = dyn_cast<MDString>(MD)) {
+      StringRef Str = MDS->getString();
+      if (Str.consume_front(Prefix))
+        return Str;
+    }
+  }
+  return std::nullopt;
+}
+
+void llvm::addOperandToBundleTag(LLVMContext &Ctx,
+                               SmallVectorImpl<OperandBundleDef> &Bundles,
+                               StringRef Tag, size_t PrefixSize,
+                               StringRef Val) {
+  assert(PrefixSize > 0 && "Unexpected prefix size");
+  assert(PrefixSize < Val.size() && "Invalid prefix size");
+  StringRef Prefix = Val.take_front(PrefixSize);
+
+  // Find a bundle with the specified tag.
+  OperandBundleDef *Bundle = nullptr;
+  for (OperandBundleDef &OB : Bundles) {
+    if (OB.getTag() == Tag) {
+      Bundle = &OB;
+      break;
+    }
+  }
+
+  // If no such bundle is found, create a new one with a single value.
+  if (!Bundle) {
+    auto *MStr = MDString::get(Ctx, Val);
+    auto *MD = MetadataAsValue::get(Ctx, MStr);
+    SmallVector<Value *, 1> BundleValues(1, MD);
+    Bundles.emplace_back(Tag.str(), BundleValues);
+    return;
+  }
+
+  // If the bundle is found, search its values for one that starts with the
+  // given prefix.
+  SmallVector<Value *, 4> Values(Bundle->inputs());
+  for (auto i = Values.begin(), e = Values.end(); i != e; ++i) {
+    if (auto *MV = dyn_cast<MetadataAsValue>(*i)) {
+      if (auto *MD = dyn_cast<MDString>(MV->getMetadata())) {
+        StringRef Str = MD->getString();
+        if (Str == Val)
+          // Already in the values.
+          return;
+        if (Str.starts_with(Prefix)) {
+          Values.erase(i);
+          break;
+        }
+      }
+    }
+  }
+
+  auto *ValMD = MDString::get(Ctx, Val);
+  auto *MD = MetadataAsValue::get(Ctx, ValMD);
+  Values.push_back(MD);
+  *Bundle = OperandBundleDef(Tag.str(), Values);
+}
+
+void llvm::addFPRoundingBundle(LLVMContext &Ctx,
+                               SmallVectorImpl<OperandBundleDef> &Bundles,
+                               RoundingMode Rounding) {
+  std::optional<StringRef> RndStr = convertRoundingModeToBundle(Rounding);
+  assert(RndStr && "Garbage rounding mode!");
+  auto *RoundingMDS = MDString::get(Ctx, *RndStr);
+  auto *RM = MetadataAsValue::get(Ctx, RoundingMDS);
+  Bundles.emplace_back("fp.control", RM);
+}
+
+void llvm::addFPExceptionBundle(LLVMContext &Ctx,
+                                SmallVectorImpl<OperandBundleDef> &Bundles,
+                                fp::ExceptionBehavior Except) {
+  std::optional<StringRef> ExcStr = convertExceptionBehaviorToBundle(Except);
+  assert(ExcStr && "Garbage exception behavior!");
+  auto *ExceptMDS = MDString::get(Ctx, *ExcStr);
+  auto *EB = MetadataAsValue::get(Ctx, ExceptMDS);
+  Bundles.emplace_back("fp.except", EB);
+}
+
+static StringRef getBundledDenormPrefix(const fltSemantics *FPSem, bool Input) {
+  // Only f32 is supported now.
+  unsigned TypeIndex = FPSem == &APFloat::IEEEsingle();
+  if (FPSem && !TypeIndex)
+    return StringRef();
+  static const StringRef Prefix[2][2] = {{"denorm.out=", "denorm.in="},
+                                         {"denorm.f32.out=", "denorm.f32.in="}};
+  return Prefix[TypeIndex][Input];
+}
+
+std::optional<DenormalMode::DenormalModeKind>
+llvm::getDenormModeBundle(const OperandBundleUse &Control, bool Input,
+                          const fltSemantics *FPSem) {
+  assert(Control.getTagID() == LLVMContext::OB_fp_control);
+  StringRef Prefix = getBundledDenormPrefix(FPSem, Input);
+  if (Prefix.empty())
+    return std::nullopt;
+  auto DenormOperand = getBundleOperandByPrefix(Control, Prefix);
+  if (DenormOperand)
+    return parseDenormalKindFromOperandBundle(*DenormOperand);
+  return std::nullopt;
+}
+
+void llvm::addInputDenormBundle(LLVMContext &Ctx,
+                                SmallVectorImpl<OperandBundleDef> &Bundles,
+                                DenormalMode::DenormalModeKind Mode) {
+  std::optional<StringRef> DenormValue = printDenormalForOperandBundle(Mode);
+  if (!DenormValue)
+    return;
+  std::string Prefix = "denorm.in=";
+  std::string DenormItem = Prefix + DenormValue->str();
+
+  addOperandToBundleTag(Ctx, Bundles, "fp.control", Prefix.size(), DenormItem);
+}
+
+void llvm::addOutputDenormBundle(LLVMContext &Ctx,
+                                 SmallVectorImpl<OperandBundleDef> &Bundles,
+                                 DenormalMode::DenormalModeKind Mode) {
+  std::optional<StringRef> DenormValue = printDenormalForOperandBundle(Mode);
+  if (!DenormValue)
+    return;
+  std::string Prefix = "denorm.out=";
+  std::string DenormItem = Prefix + DenormValue->str();
+
+  addOperandToBundleTag(Ctx, Bundles, "fp.control", Prefix.size(), DenormItem);
+}
+
 //===----------------------------------------------------------------------===//
 //                        CallInst Implementation
 //===----------------------------------------------------------------------===//
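Reviewer note on addOperandToBundleTag in the Instructions.cpp hunk above: the comments describe a replace-or-append update, in which there is one bundle per tag and at most one value per prefix inside it. Below is a minimal, self-contained sketch of that behavior using only the standard library. The names `Bundle` and `setPrefixedOperand` are hypothetical stand-ins and are not part of the patch, and the values "ieee" and "zero" are placeholders; the spelling the patch actually emits comes from printDenormalForOperandBundle, which is not shown in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an operand bundle: a tag plus string operands.
struct Bundle {
  std::string Tag;
  std::vector<std::string> Values;
};

// Adds Val (for example "denorm.in=ieee") to the bundle named Tag. The first
// PrefixSize characters of Val identify the setting. An existing value with
// the same prefix is dropped and Val is appended, so each setting appears at
// most once and no second bundle with the same tag is created.
void setPrefixedOperand(std::vector<Bundle> &Bundles, const std::string &Tag,
                        std::size_t PrefixSize, const std::string &Val) {
  assert(PrefixSize > 0 && PrefixSize < Val.size() && "Invalid prefix size");
  const std::string Prefix = Val.substr(0, PrefixSize);

  auto BundleIt = std::find_if(Bundles.begin(), Bundles.end(),
                               [&](const Bundle &B) { return B.Tag == Tag; });
  if (BundleIt == Bundles.end()) {
    // No bundle with this tag yet: create one holding the single value.
    Bundles.push_back(Bundle{Tag, {Val}});
    return;
  }

  std::vector<std::string> &Values = BundleIt->Values;
  auto Old = std::find_if(Values.begin(), Values.end(),
                          [&](const std::string &S) {
                            return S.compare(0, Prefix.size(), Prefix) == 0;
                          });
  if (Old != Values.end()) {
    if (*Old == Val)
      return; // Already present, nothing to do.
    Values.erase(Old);
  }
  Values.push_back(Val);
}
```

Calling the helper twice with the same prefix leaves a single bundle whose value list contains only the most recent setting for that prefix.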
diff --git a/llvm/lib/IR/IntrinsicInst.cpp b/llvm/lib/IR/IntrinsicInst.cpp
index 5b3e3cf45397f..446da39d61eb4 100644
--- a/llvm/lib/IR/IntrinsicInst.cpp
+++ b/llvm/lib/IR/IntrinsicInst.cpp
@@ -67,6 +67,17 @@ bool IntrinsicInst::mayLowerToFunctionCall(Intrinsic::ID IID) {
   }
 }
 
+bool IntrinsicInst::isFloatingPointOperation(Intrinsic::ID IID) {
+  switch (IID) {
+#define FP_OP(NAME, D) case Intrinsic::NAME:
+#define FP_INTRINSIC(NAME) case Intrinsic::NAME:
+#include "llvm/IR/FloatingPointOps.def"
+    return true;
+  default:
+    return false;
+  }
+}
+
 //===----------------------------------------------------------------------===//
 /// DbgVariableIntrinsic - This is the common base class for debug info
 /// intrinsics for variables.
@@ -274,44 +285,6 @@ void InstrProfCallsite::setCallee(Value *Callee) {
   setArgOperand(4, Callee);
 }
 
-std::optional<RoundingMode> ConstrainedFPIntrinsic::getRoundingMode() const {
-  unsigned NumOperands = arg_size();
-  Metadata *MD = nullptr;
-  auto *MAV = dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2));
-  if (MAV)
-    MD = MAV->getMetadata();
-  if (!MD || !isa<MDString>(MD))
-    return std::nullopt;
-  return convertStrToRoundingMode(cast<MDString>(MD)->getString());
-}
-
-std::optional<fp::ExceptionBehavior>
-ConstrainedFPIntrinsic::getExceptionBehavior() const {
-  unsigned NumOperands = arg_size();
-  Metadata *MD = nullptr;
-  auto *MAV = dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1));
-  if (MAV)
-    MD = MAV->getMetadata();
-  if (!MD || !isa<MDString>(MD))
-    return std::nullopt;
-  return convertStrToExceptionBehavior(cast<MDString>(MD)->getString());
-}
-
-bool ConstrainedFPIntrinsic::isDefaultFPEnvironment() const {
-  std::optional<fp::ExceptionBehavior> Except = getExceptionBehavior();
-  if (Except) {
-    if (*Except != fp::ebIgnore)
-      return false;
-  }
-
-  std::optional<RoundingMode> Rounding = getRoundingMode();
-  if (Rounding) {
-    if (*Rounding != RoundingMode::NearestTiesToEven)
-      return false;
-  }
-
-  return true;
-}
 
 static FCmpInst::Predicate getFPPredicateFromMD(const Value *Op) {
   Metadata *MD = cast<MetadataAsValue>(Op)->getMetadata();
@@ -335,28 +308,6 @@ static FCmpInst::Predicate getFPPredicateFromMD(const Value *Op) {
       .Default(FCmpInst::BAD_FCMP_PREDICATE);
 }
 
-FCmpInst::Predicate ConstrainedFPCmpIntrinsic::getPredicate() const {
-  return getFPPredicateFromMD(getArgOperand(2));
-}
-
-unsigned ConstrainedFPIntrinsic::getNonMetadataArgCount() const {
-  // All constrained fp intrinsics have "fpexcept" metadata.
-  unsigned NumArgs = arg_size() - 1;
-
-  // Some intrinsics have "round" metadata.
-  if (Intrinsic::hasConstrainedFPRoundingModeOperand(getIntrinsicID()))
-    NumArgs -= 1;
-
-  // Compare intrinsics take their predicate as metadata.
-  if (isa<ConstrainedFPCmpIntrinsic>(this))
-    NumArgs -= 1;
-
-  return NumArgs;
-}
-
-bool ConstrainedFPIntrinsic::classof(const IntrinsicInst *I) {
-  return Intrinsic::isConstrainedFPIntrinsic(I->getIntrinsicID());
-}
 
 ElementCount VPIntrinsic::getStaticVectorLength() const {
   auto GetVectorLengthOfType = [](const Type *T) -> ElementCount {
@@ -549,19 +500,6 @@ constexpr static bool doesVPHaveNoFunctionalEquivalent(Intrinsic::ID ID) {
                 getFunctionalIntrinsicIDForVP(Intrinsic::VPID));
 #include "llvm/IR/VPIntrinsics.def"
 
-// Equivalent non-predicated constrained intrinsic
-std::optional<Intrinsic::ID>
-VPIntrinsic::getConstrainedIntrinsicIDForVP(Intrinsic::ID ID) {
-  switch (ID) {
-  default:
-    break;
-#define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) case Intrinsic::VPID:
-#define VP_PROPERTY_CONSTRAINEDFP(CID) return Intrinsic::CID;
-#define END_REGISTER_VP_INTRINSIC(VPID) break;
-#include "llvm/IR/VPIntrinsics.def"
-  }
-  return std::nullopt;
-}
 
 Intrinsic::ID VPIntrinsic::getForOpcode(unsigned IROPC) {
   switch (IROPC) {
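Reviewer note on IntrinsicInst::isFloatingPointOperation above: the function uses the X-macro idiom, expanding every entry of FloatingPointOps.def into a `case` label that falls through to a single `return true`. Below is a small standalone sketch of the same idiom. The list macro `DEMO_FP_OPS` and the enum `OpID` are hypothetical and only stand in for the .def file and for Intrinsic::ID; the real contents of FloatingPointOps.def are not shown in this section.

```cpp
#include <cassert>

// Hypothetical op list standing in for an X-macro .def file.
#define DEMO_FP_OPS(X) X(fadd) X(fsub) X(fmul)

// Expand each list entry into an enumerator, then add two non-FP ops.
#define DEMO_ENUMERATOR(NAME) NAME,
enum class OpID { DEMO_FP_OPS(DEMO_ENUMERATOR) load, store };
#undef DEMO_ENUMERATOR

// Expand each list entry into a case label. All labels fall through to the
// same return statement, so membership in the list is the whole predicate.
#define DEMO_CASE(NAME) case OpID::NAME:
bool isFloatingPointOp(OpID ID) {
  switch (ID) {
    DEMO_FP_OPS(DEMO_CASE)
    return true;
  default:
    return false;
  }
}
#undef DEMO_CASE
```

Adding an operation to the list updates both the enum and the predicate, which is the main reason to keep the list in one place.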
diff --git a/llvm/lib/IR/Intrinsics.cpp b/llvm/lib/IR/Intrinsics.cpp
index 0220d74ca9f12..81dabd7742161 100644
--- a/llvm/lib/IR/Intrinsics.cpp
+++ b/llvm/lib/IR/Intrinsics.cpp
@@ -821,27 +821,17 @@ Function *Intrinsic::getDeclarationIfExists(Module *M, ID id,
 #include "llvm/IR/IntrinsicImpl.inc"
 
 bool Intrinsic::isConstrainedFPIntrinsic(ID QID) {
-  switch (QID) {
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
-  case Intrinsic::INTRINSIC:
-#include "llvm/IR/ConstrainedOps.def"
-#undef INSTRUCTION
-    return true;
-  default:
-    return false;
-  }
+  // llvm.fcmps is the new-form signaling FP compare intrinsic. It is marked
+  // IntrInaccessibleMemOnly (always strict by default) and uses the fp.except
+  // bundle for exception observability, so treat it as a constrained
+  // intrinsic for the purposes of dead-instruction analysis.
+  return QID == Intrinsic::fcmps;
 }
 
 bool Intrinsic::hasConstrainedFPRoundingModeOperand(Intrinsic::ID QID) {
-  switch (QID) {
-#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
-  case Intrinsic::INTRINSIC:                                                   \
-    return ROUND_MODE == 1;
-#include "llvm/IR/ConstrainedOps.def"
-#undef INSTRUCTION
-  default:
-    return false;
-  }
+  // All experimental_constrained_* intrinsics have been removed from
+  // Intrinsics.td. No constrained FP intrinsic IDs exist; always false.
+  return false;
 }
 
 using DeferredIntrinsicMatchPair =
diff --git a/llvm/lib/IR/LLVMContext.cpp b/llvm/lib/IR/LLVMContext.cpp
index 10aba759185a7..4246f33eb8d51 100644
--- a/llvm/lib/IR/LLVMContext.cpp
+++ b/llvm/lib/IR/LLVMContext.cpp
@@ -57,6 +57,10 @@ static StringRef knownBundleName(unsigned BundleTagID) {
     return "align";
   case LLVMContext::OB_deactivation_symbol:
     return "deactivation-symbol";
+  case LLVMContext::OB_fp_control:
+    return "fp.control";
+  case LLVMContext::OB_fp_except:
+    return "fp.except";
   default:
     llvm_unreachable("unknown bundle id");
   }
diff --git a/llvm/lib/IR/Type.cpp b/llvm/lib/IR/Type.cpp
index 498e78d4b0cc8..f5be53b80dcf7 100644
--- a/llvm/lib/IR/Type.cpp
+++ b/llvm/lib/IR/Type.cpp
@@ -107,17 +107,32 @@ bool Type::containsNonLocalTargetExtType() const {
   return containsNonLocalTargetExtType(Visited);
 }
 
-const fltSemantics &Type::getFltSemantics() const {
+const fltSemantics *Type::hasFltSemantics() const {
   switch (getTypeID()) {
-  case HalfTyID: return APFloat::IEEEhalf();
-  case BFloatTyID: return APFloat::BFloat();
-  case FloatTyID: return APFloat::IEEEsingle();
-  case DoubleTyID: return APFloat::IEEEdouble();
-  case X86_FP80TyID: return APFloat::x87DoubleExtended();
-  case FP128TyID: return APFloat::IEEEquad();
-  case PPC_FP128TyID: return APFloat::PPCDoubleDouble();
-  default: llvm_unreachable("Invalid floating type");
+  case HalfTyID:
+    return &APFloat::IEEEhalf();
+  case BFloatTyID:
+    return &APFloat::BFloat();
+  case FloatTyID:
+    return &APFloat::IEEEsingle();
+  case DoubleTyID:
+    return &APFloat::IEEEdouble();
+  case X86_FP80TyID:
+    return &APFloat::x87DoubleExtended();
+  case FP128TyID:
+    return &APFloat::IEEEquad();
+  case PPC_FP128TyID:
+    return &APFloat::PPCDoubleDouble();
+  default:
+    break;
   }
+  return nullptr;
+}
+
+const fltSemantics &Type::getFltSemantics() const {
+  if (auto *FltSem = hasFltSemantics())
+    return *FltSem;
+  llvm_unreachable("Invalid floating type");
 }
 
 bool Type::isScalableTargetExtTy() const {
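Reviewer note on the Type.cpp hunk above: it splits the lookup into a nullable query (hasFltSemantics returns a pointer, or nullptr for non-FP types) and a strict accessor (getFltSemantics) layered on top of it. The following standalone sketch shows the same split with hypothetical names (`TypeID`, `Semantics`, `findSemantics`, `getSemantics`); none of these are LLVM types, and the sketch uses `assert` where the patch uses llvm_unreachable.

```cpp
#include <cassert>
#include <string>

// Hypothetical type tags and semantics record, for illustration only.
enum class TypeID { Half, Float, Double, Integer };

struct Semantics {
  std::string Name;
  unsigned Bits;
};

// Nullable query: returns nullptr when the type has no FP semantics, so
// callers can probe a type without tripping an assertion.
const Semantics *findSemantics(TypeID ID) {
  static const Semantics Half{"half", 16};
  static const Semantics Single{"single", 32};
  static const Semantics Double{"double", 64};
  switch (ID) {
  case TypeID::Half:
    return &Half;
  case TypeID::Float:
    return &Single;
  case TypeID::Double:
    return &Double;
  default:
    return nullptr;
  }
}

// Strict accessor built on the query: the caller promises an FP type.
const Semantics &getSemantics(TypeID ID) {
  const Semantics *S = findSemantics(ID);
  assert(S && "Invalid floating type");
  return *S;
}
```

Code that may see non-FP types can call the pointer-returning query and check for nullptr, while existing callers of the reference-returning accessor keep their contract.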
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index d4ade9c7ce534..4bd528a02322f 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -49,6 +49,7 @@
 
 #include "llvm/IR/Verifier.h"
 #include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
@@ -58,6 +59,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/BinaryFormat/Dwarf.h"
 #include "llvm/IR/Argument.h"
@@ -76,6 +78,7 @@
 #include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/EHPersonalities.h"
@@ -611,7 +614,6 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
   void visitUserOp1(Instruction &I);
   void visitUserOp2(Instruction &I) { visitUserOp1(I); }
   void visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call);
-  void visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI);
   void visitVPIntrinsic(VPIntrinsic &VPI);
   void visitDbgLabelIntrinsic(StringRef Kind, DbgLabelInst &DLI);
   void visitAtomicCmpXchgInst(AtomicCmpXchgInst &CXI);
@@ -4087,7 +4089,8 @@ void Verifier::visitCallBase(CallBase &Call) {
        FoundGCTransitionBundle = false, FoundCFGuardTargetBundle = false,
        FoundPreallocatedBundle = false, FoundGCLiveBundle = false,
        FoundPtrauthBundle = false, FoundKCFIBundle = false,
-       FoundAttachedCallBundle = false;
+       FoundAttachedCallBundle = false, FoundFpeControlBundle = false,
+       FoundFpeExceptBundle = false;
   for (unsigned i = 0, e = Call.getNumOperandBundles(); i < e; ++i) {
     OperandBundleUse BU = Call.getOperandBundleAt(i);
     uint32_t Tag = BU.getTagID();
@@ -4150,6 +4153,76 @@ void Verifier::visitCallBase(CallBase &Call) {
             "Multiple \"clang.arc.attachedcall\" operand bundles", Call);
       FoundAttachedCallBundle = true;
       verifyAttachedCallBundle(Call, BU);
+    } else if (Tag == LLVMContext::OB_fp_control) {
+      Check(!FoundFpeControlBundle, "Multiple \"fp.control\" operand bundles",
+            Call);
+      bool FoundRoundingMode = false;
+      bool FoundInDenormalMode = false;
+      bool FoundOutDenormalMode = false;
+      bool FoundF32InDenormalMode = false;
+      bool FoundF32OutDenormalMode = false;
+      for (auto &U : BU.Inputs) {
+        Value *V = U.get();
+        Check(isa<MetadataAsValue>(V),
+              "Value of a \"fp.control\" bundle operand must be a metadata",
+              Call);
+        Metadata *MD = cast<MetadataAsValue>(V)->getMetadata();
+        Check(isa<MDString>(MD),
+              "Value of a \"fp.control\" bundle operand must be a string",
+              Call);
+        StringRef Item = cast<MDString>(MD)->getString();
+        if (convertBundleToRoundingMode(Item)) {
+          Check(!FoundRoundingMode, "Rounding mode is specified more than once",
+                Call);
+          FoundRoundingMode = true;
+        } else if (Item.consume_front("denorm.in=")) {
+          Check(!FoundInDenormalMode,
+                "Input denormal mode is specified more than once", Call);
+          FoundInDenormalMode = true;
+          Check(parseDenormalKindFromOperandBundle(Item),
+                "Invalid input denormal mode", Call);
+        } else if (Item.consume_front("denorm.out=")) {
+          Check(!FoundOutDenormalMode,
+                "Output denormal mode is specified more than once", Call);
+          FoundOutDenormalMode = true;
+          Check(parseDenormalKindFromOperandBundle(Item),
+                "Invalid output denormal mode", Call);
+        } else if (Item.consume_front("denorm.f32.in=")) {
+          Check(!FoundF32InDenormalMode,
+                "F32 input denormal mode is specified more than once", Call);
+          FoundF32InDenormalMode = true;
+          Check(parseDenormalKindFromOperandBundle(Item),
+                "Invalid F32 input denormal mode", Call);
+        } else if (Item.consume_front("denorm.f32.out=")) {
+          Check(!FoundF32OutDenormalMode,
+                "F32 output denormal mode is specified more than once", Call);
+          FoundF32OutDenormalMode = true;
+          Check(parseDenormalKindFromOperandBundle(Item),
+                "Invalid F32 output denormal mode", Call);
+        } else {
+          CheckFailed("Unrecognized value in \"fp.control\" bundle operand",
+                      Call);
+        }
+      }
+      FoundFpeControlBundle = true;
+    } else if (Tag == LLVMContext::OB_fp_except) {
+      Check(!FoundFpeExceptBundle, "Multiple \"fp.except\" operand bundles",
+            Call);
+      Check(BU.Inputs.size() == 1,
+            "Expected exactly one \"fp.except\" bundle operand", Call);
+      auto *V = dyn_cast<MetadataAsValue>(BU.Inputs.front());
+      Check(V, "Value of a \"fp.except\" bundle operand must be a metadata",
+            Call);
+      auto *MDS = dyn_cast<MDString>(V->getMetadata());
+      Check(MDS, "Value of a \"fp.except\" bundle operand must be a string",
+            Call);
+      auto EB = convertBundleToExceptionBehavior(MDS->getString());
+      Check(
+          EB.has_value(),
+          "Value of a \"fp.except\" bundle operand is not a correct exception "
+          "behavior",
+          Call);
+      FoundFpeExceptBundle = true;
     }
   }
 
@@ -6157,12 +6230,6 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
 #undef BEGIN_REGISTER_VP_INTRINSIC
     visitVPIntrinsic(cast<VPIntrinsic>(Call));
     break;
-#define INSTRUCTION(NAME, NARGS, ROUND_MODE, INTRINSIC)                        \
-  case Intrinsic::INTRINSIC:
-#include "llvm/IR/ConstrainedOps.def"
-#undef INSTRUCTION
-    visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(Call));
-    break;
   case Intrinsic::dbg_declare: // llvm.dbg.declare
   case Intrinsic::dbg_value:   // llvm.dbg.value
   case Intrinsic::dbg_assign:  // llvm.dbg.assign
@@ -7606,140 +7673,6 @@ void Verifier::visitVPIntrinsic(VPIntrinsic &VPI) {
   }
 }
 
-void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {
-  unsigned NumOperands = FPI.getNonMetadataArgCount();
-  bool HasRoundingMD =
-      Intrinsic::hasConstrainedFPRoundingModeOperand(FPI.getIntrinsicID());
-
-  // Add the expected number of metadata operands.
-  NumOperands += (1 + HasRoundingMD);
-
-  // Compare intrinsics carry an extra predicate metadata operand.
-  if (isa<ConstrainedFPCmpIntrinsic>(FPI))
-    NumOperands += 1;
-  Check((FPI.arg_size() == NumOperands),
-        "invalid arguments for constrained FP intrinsic", &FPI);
-
-  switch (FPI.getIntrinsicID()) {
-  case Intrinsic::experimental_constrained_lrint:
-  case Intrinsic::experimental_constrained_llrint: {
-    Type *ValTy = FPI.getArgOperand(0)->getType();
-    Type *ResultTy = FPI.getType();
-    Check(!ValTy->isVectorTy() && !ResultTy->isVectorTy(),
-          "Intrinsic does not support vectors", &FPI);
-    break;
-  }
-
-  case Intrinsic::experimental_constrained_lround:
-  case Intrinsic::experimental_constrained_llround: {
-    Type *ValTy = FPI.getArgOperand(0)->getType();
-    Type *ResultTy = FPI.getType();
-    Check(!ValTy->isVectorTy() && !ResultTy->isVectorTy(),
-          "Intrinsic does not support vectors", &FPI);
-    break;
-  }
-
-  case Intrinsic::experimental_constrained_fcmp:
-  case Intrinsic::experimental_constrained_fcmps: {
-    auto Pred = cast<ConstrainedFPCmpIntrinsic>(&FPI)->getPredicate();
-    Check(CmpInst::isFPPredicate(Pred),
-          "invalid predicate for constrained FP comparison intrinsic", &FPI);
-    break;
-  }
-
-  case Intrinsic::experimental_constrained_fptosi:
-  case Intrinsic::experimental_constrained_fptoui: {
-    Value *Operand = FPI.getArgOperand(0);
-    ElementCount SrcEC;
-    Check(Operand->getType()->isFPOrFPVectorTy(),
-          "Intrinsic first argument must be floating point", &FPI);
-    if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
-      SrcEC = cast<VectorType>(OperandT)->getElementCount();
-    }
-
-    Operand = &FPI;
-    Check(SrcEC.isNonZero() == Operand->getType()->isVectorTy(),
-          "Intrinsic first argument and result disagree on vector use", &FPI);
-    Check(Operand->getType()->isIntOrIntVectorTy(),
-          "Intrinsic result must be an integer", &FPI);
-    if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
-      Check(SrcEC == cast<VectorType>(OperandT)->getElementCount(),
-            "Intrinsic first argument and result vector lengths must be equal",
-            &FPI);
-    }
-    break;
-  }
-
-  case Intrinsic::experimental_constrained_sitofp:
-  case Intrinsic::experimental_constrained_uitofp: {
-    Value *Operand = FPI.getArgOperand(0);
-    ElementCount SrcEC;
-    Check(Operand->getType()->isIntOrIntVectorTy(),
-          "Intrinsic first argument must be integer", &FPI);
-    if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
-      SrcEC = cast<VectorType>(OperandT)->getElementCount();
-    }
-
-    Operand = &FPI;
-    Check(SrcEC.isNonZero() == Operand->getType()->isVectorTy(),
-          "Intrinsic first argument and result disagree on vector use", &FPI);
-    Check(Operand->getType()->isFPOrFPVectorTy(),
-          "Intrinsic result must be a floating point", &FPI);
-    if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
-      Check(SrcEC == cast<VectorType>(OperandT)->getElementCount(),
-            "Intrinsic first argument and result vector lengths must be equal",
-            &FPI);
-    }
-    break;
-  }
-
-  case Intrinsic::experimental_constrained_fptrunc:
-  case Intrinsic::experimental_constrained_fpext: {
-    Value *Operand = FPI.getArgOperand(0);
-    Type *OperandTy = Operand->getType();
-    Value *Result = &FPI;
-    Type *ResultTy = Result->getType();
-    Check(OperandTy->isFPOrFPVectorTy(),
-          "Intrinsic first argument must be FP or FP vector", &FPI);
-    Check(ResultTy->isFPOrFPVectorTy(),
-          "Intrinsic result must be FP or FP vector", &FPI);
-    Check(OperandTy->isVectorTy() == ResultTy->isVectorTy(),
-          "Intrinsic first argument and result disagree on vector use", &FPI);
-    if (OperandTy->isVectorTy()) {
-      Check(cast<VectorType>(OperandTy)->getElementCount() ==
-                cast<VectorType>(ResultTy)->getElementCount(),
-            "Intrinsic first argument and result vector lengths must be equal",
-            &FPI);
-    }
-    if (FPI.getIntrinsicID() == Intrinsic::experimental_constrained_fptrunc) {
-      Check(OperandTy->getScalarSizeInBits() > ResultTy->getScalarSizeInBits(),
-            "Intrinsic first argument's type must be larger than result type",
-            &FPI);
-    } else {
-      Check(OperandTy->getScalarSizeInBits() < ResultTy->getScalarSizeInBits(),
-            "Intrinsic first argument's type must be smaller than result type",
-            &FPI);
-    }
-    break;
-  }
-
-  default:
-    break;
-  }
-
-  // If a non-metadata argument is passed in a metadata slot then the
-  // error will be caught earlier when the incorrect argument doesn't
-  // match the specification in the intrinsic call table. Thus, no
-  // argument type check is needed here.
-
-  Check(FPI.getExceptionBehavior().has_value(),
-        "invalid exception behavior argument", &FPI);
-  if (HasRoundingMD) {
-    Check(FPI.getRoundingMode().has_value(), "invalid rounding mode argument",
-          &FPI);
-  }
-}
-
 void Verifier::verifyFragmentExpression(const DbgVariableRecord &DVR) {
   DILocalVariable *V = dyn_cast_or_null<DILocalVariable>(DVR.getRawVariable());
   DIExpression *E = dyn_cast_or_null<DIExpression>(DVR.getRawExpression());
diff --git a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
index 7600b3a695a4e..e3596406ec21d 100644
--- a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
@@ -304,7 +304,7 @@ class SPIRVEmitIntrinsics
   bool postprocessTypes(Module &M);
   bool processFunctionPointers(Module &M);
   void parseFunDeclarations(Module &M);
-  void useRoundingMode(ConstrainedFPIntrinsic *FPI, IRBuilder<> &B);
+  void useRoundingMode(IntrinsicInst *FPI, IRBuilder<> &B);
   bool processMaskedMemIntrinsic(IntrinsicInst &I);
   bool convertMaskedMemIntrinsics(Module &M);
   void preprocessBoolVectorBitcasts(Function &F);
@@ -1646,13 +1646,11 @@ Instruction *SPIRVEmitIntrinsics::visitCallInst(CallInst &Call) {
 }
 
 // Use a tip about rounding mode to create a decoration.
-void SPIRVEmitIntrinsics::useRoundingMode(ConstrainedFPIntrinsic *FPI,
+void SPIRVEmitIntrinsics::useRoundingMode(IntrinsicInst *FPI,
                                           IRBuilder<> &B) {
-  std::optional<RoundingMode> RM = FPI->getRoundingMode();
-  if (!RM.has_value())
-    return;
+  RoundingMode RM = FPI->getRoundingMode();
   unsigned RoundingModeDeco = std::numeric_limits<unsigned>::max();
-  switch (RM.value()) {
+  switch (RM) {
   default:
     // ignore unknown rounding modes
     break;
@@ -3203,7 +3201,8 @@ bool SPIRVEmitIntrinsics::runOnFunction(Function &Func) {
     if (Postpone && !GR->findAssignPtrTypeInstr(I))
       insertAssignPtrTypeIntrs(I, B, true);
 
-    if (auto *FPI = dyn_cast<ConstrainedFPIntrinsic>(I))
+    if (auto *FPI = dyn_cast<IntrinsicInst>(I);
+        FPI && FPI->getOperandBundle(LLVMContext::OB_fp_control).has_value())
       useRoundingMode(FPI, B);
   }
 
diff --git a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
index a3b44ad6d31d5..7d7e35a1f5296 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
@@ -23,9 +23,11 @@
 #include "SPIRVTargetMachine.h"
 #include "SPIRVUtils.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/CodeGen/IntrinsicLowering.h"
+#include "llvm/IR/FPEnv.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
@@ -347,21 +349,6 @@ static void lowerFunnelShifts(IntrinsicInst *FSHIntrinsic) {
   FSHIntrinsic->setCalledFunction(FSHFunc);
 }
 
-static void lowerConstrainedFPCmpIntrinsic(
-    ConstrainedFPCmpIntrinsic *ConstrainedCmpIntrinsic,
-    SmallVector<Instruction *> &EraseFromParent) {
-  if (!ConstrainedCmpIntrinsic)
-    return;
-  // Extract the floating-point values being compared
-  Value *LHS = ConstrainedCmpIntrinsic->getArgOperand(0);
-  Value *RHS = ConstrainedCmpIntrinsic->getArgOperand(1);
-  FCmpInst::Predicate Pred = ConstrainedCmpIntrinsic->getPredicate();
-  IRBuilder<> Builder(ConstrainedCmpIntrinsic);
-  Value *FCmp = Builder.CreateFCmp(Pred, LHS, RHS);
-  ConstrainedCmpIntrinsic->replaceAllUsesWith(FCmp);
-  EraseFromParent.push_back(dyn_cast<Instruction>(ConstrainedCmpIntrinsic));
-}
-
 static void lowerExpectAssume(IntrinsicInst *II) {
   // If we cannot use the SPV_KHR_expect_assume extension, then we need to
   // ignore the intrinsic and move on. It should be removed later on by LLVM.
@@ -405,24 +392,87 @@ static bool toSpvLifetimeIntrinsic(IntrinsicInst *II, Intrinsic::ID NewID) {
   return true;
 }
 
+// Attach a SPIRV FPRoundingMode decoration (via spv_assign_decoration) to
+// Inst for the rounding mode RM. Does nothing if RM is Dynamic or unknown.
+static void attachFPRoundingModeDecoration(Instruction *Inst, RoundingMode RM,
+                                           IRBuilder<> &Builder) {
+  unsigned SPIRVMode = std::numeric_limits<unsigned>::max();
+  switch (RM) {
+  case RoundingMode::NearestTiesToEven:
+    SPIRVMode = SPIRV::FPRoundingMode::FPRoundingMode::RTE;
+    break;
+  case RoundingMode::TowardNegative:
+    SPIRVMode = SPIRV::FPRoundingMode::FPRoundingMode::RTN;
+    break;
+  case RoundingMode::TowardPositive:
+    SPIRVMode = SPIRV::FPRoundingMode::FPRoundingMode::RTP;
+    break;
+  case RoundingMode::TowardZero:
+    SPIRVMode = SPIRV::FPRoundingMode::FPRoundingMode::RTZ;
+    break;
+  default:
+    return;
+  }
+  LLVMContext &Ctx = Inst->getContext();
+  Type *Int32Ty = Type::getInt32Ty(Ctx);
+  MDNode *RMNode = MDNode::get(
+      Ctx, {ConstantAsMetadata::get(
+                ConstantInt::get(Int32Ty, SPIRV::Decoration::FPRoundingMode)),
+            ConstantAsMetadata::get(ConstantInt::get(Int32Ty, SPIRVMode))});
+  Builder.SetInsertPoint(Inst->getNextNode());
+  Builder.CreateIntrinsic(
+      Intrinsic::spv_assign_decoration, {Inst->getType()},
+      {Inst, MetadataAsValue::get(Ctx, MDNode::get(Ctx, {RMNode}))});
+}
+
+// Lower a new-style FP binary intrinsic (e.g. @llvm.fmul.f32) with an optional
+// fp.control bundle to a plain FP instruction. If the bundle specifies a
+// non-dynamic rounding mode, also attach a SPIRV FPRoundingMode decoration to
+// the emitted instruction so the rounding mode is preserved.
 static void
-lowerConstrainedFmuladd(IntrinsicInst *II,
+lowerNewFPBinopForSPIRV(CallInst *CI, Instruction::BinaryOps PlainOpc,
                         SmallVector<Instruction *> &EraseFromParent) {
-  auto *FPI = cast<ConstrainedFPIntrinsic>(II);
-  Value *A = FPI->getArgOperand(0);
-  Value *Mul = FPI->getArgOperand(1);
-  Value *Add = FPI->getArgOperand(2);
-  IRBuilder<> Builder(II->getParent());
-  Builder.SetInsertPoint(II);
-  std::optional<RoundingMode> Rounding = FPI->getRoundingMode();
-  Value *Product = Builder.CreateFMul(A, Mul, II->getName() + ".mul");
-  Value *Result = Builder.CreateConstrainedFPBinOp(
-      Intrinsic::experimental_constrained_fadd, Product, Add, {},
-      II->getName() + ".add", nullptr, Rounding);
-  II->replaceAllUsesWith(Result);
-  EraseFromParent.push_back(II);
+  IRBuilder<> Builder(CI->getParent());
+  Builder.SetInsertPoint(CI);
+  Value *A = CI->getArgOperand(0);
+  Value *B = CI->getArgOperand(1);
+  Value *Result = Builder.CreateBinOp(PlainOpc, A, B, CI->getName());
+
+  // Keep a static rounding mode as a SPIRV decoration; skip folded constants.
+  RoundingMode RM = CI->getRoundingMode();
+  auto *NewI = dyn_cast<Instruction>(Result);
+  if (NewI && RM != RoundingMode::Dynamic &&
+      CI->getOperandBundle(LLVMContext::OB_fp_control).has_value())
+    attachFPRoundingModeDecoration(NewI, RM, Builder);
+
+  CI->replaceAllUsesWith(Result);
+  EraseFromParent.push_back(CI);
 }
 
+// Lower @llvm.fmuladd that carries an fp.control bundle with a non-dynamic
+// rounding mode (the caller checks this). We expand to fmul + fadd and attach
+// FPRoundingMode decorations to both instructions so the rounding mode is
+// preserved in the SPIRV output.
+static void
+lowerNewFmuladd(CallInst *CI,
+                SmallVector<Instruction *> &EraseFromParent) {
+  IRBuilder<> Builder(CI->getParent());
+  Builder.SetInsertPoint(CI);
+  Value *A = CI->getArgOperand(0);
+  Value *Mul = CI->getArgOperand(1);
+  Value *Add = CI->getArgOperand(2);
+  Value *Product = Builder.CreateFMul(A, Mul, CI->getName() + ".mul");
+  Value *Result = Builder.CreateFAdd(Product, Add, CI->getName() + ".add");
+
+  RoundingMode RM = CI->getRoundingMode();
+  for (Value *V : {Product, Result}) // Either may have been constant-folded.
+    if (auto *NewI = dyn_cast<Instruction>(V))
+      attachFPRoundingModeDecoration(NewI, RM, Builder);
+
+  CI->replaceAllUsesWith(Result);
+  EraseFromParent.push_back(CI);
+}
+
 // Substitutes calls to LLVM intrinsics with either calls to SPIR-V intrinsics
 // or calls to proper generated functions. Returns True if F was modified.
 bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
@@ -480,16 +530,71 @@ bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
         lowerPtrAnnotation(II);
         Changed = true;
         break;
-      case Intrinsic::experimental_constrained_fmuladd:
-        lowerConstrainedFmuladd(II, EraseFromParent);
+      // New-style FP intrinsics (produced by auto-upgrade of constrained ops):
+      // lower to a plain FP instruction. Non-dynamic rounding modes encoded in
+      // the fp.control bundle are preserved via a SPIRV FPRoundingMode
+      // decoration emitted by lowerNewFPBinopForSPIRV.
+      case Intrinsic::fadd:
+        lowerNewFPBinopForSPIRV(Call, Instruction::FAdd, EraseFromParent);
+        Changed = true;
+        break;
+      case Intrinsic::fsub:
+        lowerNewFPBinopForSPIRV(Call, Instruction::FSub, EraseFromParent);
+        Changed = true;
+        break;
+      case Intrinsic::fmul:
+        lowerNewFPBinopForSPIRV(Call, Instruction::FMul, EraseFromParent);
+        Changed = true;
+        break;
+      case Intrinsic::fdiv:
+        lowerNewFPBinopForSPIRV(Call, Instruction::FDiv, EraseFromParent);
         Changed = true;
         break;
-      case Intrinsic::experimental_constrained_fcmp:
-      case Intrinsic::experimental_constrained_fcmps:
-        lowerConstrainedFPCmpIntrinsic(dyn_cast<ConstrainedFPCmpIntrinsic>(II),
-                                       EraseFromParent);
+      case Intrinsic::frem:
+        lowerNewFPBinopForSPIRV(Call, Instruction::FRem, EraseFromParent);
         Changed = true;
         break;
+      case Intrinsic::fmuladd:
+        // Only lower fmuladd when it carries a non-default fp.control bundle
+        // (non-dynamic rounding mode).  Without bundles, let SPIRV handle it
+        // natively so it can emit OpExtInst Fma via GLSL.std.450.
+        if (Call->getOperandBundle(LLVMContext::OB_fp_control).has_value() &&
+            Call->getRoundingMode() != RoundingMode::Dynamic) {
+          lowerNewFmuladd(Call, EraseFromParent);
+          Changed = true;
+        }
+        break;
+      case Intrinsic::fcmps: {
+        // Signaling FP compare: SPIRV has no separate signaling compare
+        // instruction; lower to a plain fcmp.
+        Value *LHS = Call->getArgOperand(0);
+        Value *RHS = Call->getArgOperand(1);
+        auto *PredMD =
+            cast<MetadataAsValue>(Call->getArgOperand(2))->getMetadata();
+        FCmpInst::Predicate Pred = StringSwitch<FCmpInst::Predicate>(
+                                       cast<MDString>(PredMD)->getString())
+                                       .Case("oeq", FCmpInst::FCMP_OEQ)
+                                       .Case("ogt", FCmpInst::FCMP_OGT)
+                                       .Case("oge", FCmpInst::FCMP_OGE)
+                                       .Case("olt", FCmpInst::FCMP_OLT)
+                                       .Case("ole", FCmpInst::FCMP_OLE)
+                                       .Case("one", FCmpInst::FCMP_ONE)
+                                       .Case("ord", FCmpInst::FCMP_ORD)
+                                       .Case("uno", FCmpInst::FCMP_UNO)
+                                       .Case("ueq", FCmpInst::FCMP_UEQ)
+                                       .Case("ugt", FCmpInst::FCMP_UGT)
+                                       .Case("uge", FCmpInst::FCMP_UGE)
+                                       .Case("ult", FCmpInst::FCMP_ULT)
+                                       .Case("ule", FCmpInst::FCMP_ULE)
+                                       .Case("une", FCmpInst::FCMP_UNE)
+                                       .Default(FCmpInst::BAD_FCMP_PREDICATE);
+        IRBuilder<> Builder(Call);
+        Value *FCmp = Builder.CreateFCmp(Pred, LHS, RHS, Call->getName());
+        Call->replaceAllUsesWith(FCmp);
+        EraseFromParent.push_back(Call);
+        Changed = true;
+        break;
+      }
       default:
         if (TM.getTargetTriple().getVendor() == Triple::AMD ||
             any_of(SPVAllowUnknownIntrinsics, [II](auto &&Prefix) {
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index aa3f0dcc3c534..574485a5c3674 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -2011,7 +2011,7 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
   // prevents it from being removed. In some cases however the side effect is
   // actually absent. To detect this case, call SimplifyConstrainedFPCall. If it
   // returns a replacement, the call may be removed.
-  if (CI.use_empty() && isa<ConstrainedFPIntrinsic>(CI)) {
+  if (CI.use_empty() && Intrinsic::isConstrainedFPIntrinsic(CI.getIntrinsicID())) {
     if (simplifyConstrainedFPCall(&CI, SQ.getWithInstruction(&CI)))
       return eraseInstFromFunction(CI);
   }
@@ -3271,7 +3271,8 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
     ///
     // TODO: If we cared, should insert a canonicalize for x
     Value *SelectCond, *SelectLHS, *SelectRHS;
-    if (match(II->getArgOperand(1),
+    if (!II->use_empty() &&
+        match(II->getArgOperand(1),
               m_OneUse(m_Select(m_Value(SelectCond), m_Value(SelectLHS),
                                 m_Value(SelectRHS))))) {
       Value *NewLdexp = nullptr;
diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index ff3d55e5747e5..22404c17dcf56 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -109,28 +109,42 @@ struct SimpleValue {
     if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
       if (Function *F = CI->getCalledFunction()) {
         switch (F->getIntrinsicID()) {
-        case Intrinsic::experimental_constrained_fadd:
-        case Intrinsic::experimental_constrained_fsub:
-        case Intrinsic::experimental_constrained_fmul:
-        case Intrinsic::experimental_constrained_fdiv:
-        case Intrinsic::experimental_constrained_frem:
-        case Intrinsic::experimental_constrained_fptosi:
-        case Intrinsic::experimental_constrained_sitofp:
-        case Intrinsic::experimental_constrained_fptoui:
-        case Intrinsic::experimental_constrained_uitofp:
-        case Intrinsic::experimental_constrained_fcmp:
-        case Intrinsic::experimental_constrained_fcmps: {
-          auto *CFP = cast<ConstrainedFPIntrinsic>(CI);
-          if (CFP->getExceptionBehavior() &&
-              CFP->getExceptionBehavior() == fp::ebStrict)
+        // New-form FP intrinsics (llvm.fadd, llvm.fsub, etc.) with fp.control
+        // and/or fp.except operand bundles follow the same CSE rules as their
+        // constrained predecessors:
+        //   - ebStrict or absent exception behavior -> no CSE
+        //   - Dynamic or absent rounding mode -> no CSE (unknown mode)
+        //   - Fixed non-strict EB + known RM -> CSE allowed
+        case Intrinsic::fadd:
+        case Intrinsic::fsub:
+        case Intrinsic::fmul:
+        case Intrinsic::fdiv:
+        case Intrinsic::frem:
+        case Intrinsic::fptosi:
+        case Intrinsic::fptoui:
+        case Intrinsic::sitofp:
+        case Intrinsic::uitofp:
+        case Intrinsic::fcmp: {
+          // If there are no fp bundles at all, the call is memory(none) and
+          // the generic doesNotAccessMemory() path below handles it.
+          bool HasControl =
+              CI->getOperandBundle(LLVMContext::OB_fp_control).has_value();
+          bool HasExcept =
+              CI->getOperandBundle(LLVMContext::OB_fp_except).has_value();
+          if (!HasControl && !HasExcept)
+            break; // fall through to doesNotAccessMemory() check
+          // ebStrict means exceptions matter; don't CSE.
+          if (CI->getExceptionBehavior() == fp::ebStrict)
             return false;
-          // Since we CSE across function calls we must not allow
-          // the rounding mode to change.
-          if (CFP->getRoundingMode() &&
-              CFP->getRoundingMode() == RoundingMode::Dynamic)
+          // Dynamic rounding depends on the runtime mode; don't CSE.
+          if (CI->getRoundingMode() == RoundingMode::Dynamic)
             return false;
           return true;
         }
+        case Intrinsic::fcmps:
+          // Signaling compare; CSE is safe when exceptions are not strict.
+          // Without an fp.except bundle in non-strictfp code, ebIgnore applies.
+          return CI->getExceptionBehavior() != fp::ebStrict;
         }
       }
       return CI->doesNotAccessMemory() &&
@@ -1517,10 +1531,12 @@ bool EarlyCSE::processNode(DomTreeNode *Node) {
 
     // If this is a simple instruction that we can value number, process it.
     if (SimpleValue::canHandle(&Inst)) {
-      if ([[maybe_unused]] auto *CI = dyn_cast<ConstrainedFPIntrinsic>(&Inst)) {
+      if ([[maybe_unused]] auto *CI = dyn_cast<IntrinsicInst>(&Inst);
+          CI && Intrinsic::isConstrainedFPIntrinsic(CI->getIntrinsicID())) {
         assert(CI->getExceptionBehavior() != fp::ebStrict &&
                "Unexpected ebStrict from SimpleValue::canHandle()");
-        assert((!CI->getRoundingMode() ||
+        // fcmps has no rounding mode; only check RM for operations that do.
+        assert((CI->getIntrinsicID() == Intrinsic::fcmps ||
                 CI->getRoundingMode() != RoundingMode::Dynamic) &&
                "Unexpected dynamic rounding from SimpleValue::canHandle()");
       }
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 8bf941cf19cd9..131c15b0351ac 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -458,73 +458,11 @@ struct PruningFunctionCloner {
 
 Instruction *
 PruningFunctionCloner::cloneInstruction(BasicBlock::const_iterator II) {
-  const Instruction &OldInst = *II;
-  Instruction *NewInst = nullptr;
-  if (HostFuncIsStrictFP) {
-    Intrinsic::ID CIID = getConstrainedIntrinsicID(OldInst);
-    if (CIID != Intrinsic::not_intrinsic) {
-      // Instead of cloning the instruction, a call to constrained intrinsic
-      // should be created.
-      // Assume the first arguments of constrained intrinsics are the same as
-      // the operands of original instruction.
-
-      // Determine overloaded types of the intrinsic.
-      SmallVector<Type *, 2> TParams;
-      SmallVector<Intrinsic::IITDescriptor, 8> Descriptor;
-      getIntrinsicInfoTableEntries(CIID, Descriptor);
-      for (unsigned I = 0, E = Descriptor.size(); I != E; ++I) {
-        Intrinsic::IITDescriptor Operand = Descriptor[I];
-        switch (Operand.Kind) {
-        case Intrinsic::IITDescriptor::Overloaded:
-          if (Operand.getOverloadKind() !=
-              Intrinsic::IITDescriptor::AK_MatchType) {
-            if (I == 0)
-              TParams.push_back(OldInst.getType());
-            else
-              TParams.push_back(OldInst.getOperand(I - 1)->getType());
-          }
-          break;
-        case Intrinsic::IITDescriptor::SameVecWidth:
-          ++I;
-          break;
-        default:
-          break;
-        }
-      }
-
-      // Create intrinsic call.
-      LLVMContext &Ctx = NewFunc->getContext();
-      Function *IFn = Intrinsic::getOrInsertDeclaration(NewFunc->getParent(),
-                                                        CIID, TParams);
-      SmallVector<Value *, 4> Args;
-      unsigned NumOperands = OldInst.getNumOperands();
-      if (isa<CallInst>(OldInst))
-        --NumOperands;
-      for (unsigned I = 0; I < NumOperands; ++I) {
-        Value *Op = OldInst.getOperand(I);
-        Args.push_back(Op);
-      }
-      if (const auto *CmpI = dyn_cast<FCmpInst>(&OldInst)) {
-        FCmpInst::Predicate Pred = CmpI->getPredicate();
-        StringRef PredName = FCmpInst::getPredicateName(Pred);
-        Args.push_back(MetadataAsValue::get(Ctx, MDString::get(Ctx, PredName)));
-      }
-
-      // The last arguments of a constrained intrinsic are metadata that
-      // represent rounding mode (absents in some intrinsics) and exception
-      // behavior. The inlined function uses default settings.
-      if (Intrinsic::hasConstrainedFPRoundingModeOperand(CIID))
-        Args.push_back(
-            MetadataAsValue::get(Ctx, MDString::get(Ctx, "round.tonearest")));
-      Args.push_back(
-          MetadataAsValue::get(Ctx, MDString::get(Ctx, "fpexcept.ignore")));
-
-      NewInst = CallInst::Create(IFn, Args, OldInst.getName() + ".strict");
-    }
-  }
-  if (!NewInst)
-    NewInst = II->clone();
-  return NewInst;
+  // Plain FP instructions cloned into a strictfp function are semantically
+  // correct without conversion: the strictfp attribute on the function
+  // governs their behavior. The old path that replaced them with
+  // experimental_constrained_* intrinsics is no longer needed.
+  return II->clone();
 }
 
 /// The specified block is found to be reachable, clone it and
@@ -1003,12 +941,10 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
 /// constant arguments cause a significant amount of code in the callee to be
 /// dead.  Since this doesn't produce an exact copy of the input, it can't be
 /// used for things like CloneFunction or CloneModule.
-void llvm::CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
-                                     ValueToValueMapTy &VMap,
-                                     bool ModuleLevelChanges,
-                                     SmallVectorImpl<ReturnInst *> &Returns,
-                                     const char *NameSuffix,
-                                     ClonedCodeInfo &CodeInfo) {
+void llvm::CloneAndPruneFunctionInto(
+    Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap,
+    bool ModuleLevelChanges, SmallVectorImpl<ReturnInst *> &Returns,
+    const char *NameSuffix, ClonedCodeInfo &CodeInfo) {
   CloneAndPruneIntoFromInst(NewFunc, OldFunc, &OldFunc->front().front(), VMap,
                             ModuleLevelChanges, Returns, NameSuffix, CodeInfo);
 }
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 81decd7f9c33b..7a2eaf496d652 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -503,10 +503,10 @@ bool llvm::wouldInstructionBeTriviallyDead(const Instruction *I,
       return false;
     }
 
-    if (auto *FPI = dyn_cast<ConstrainedFPIntrinsic>(I)) {
-      std::optional<fp::ExceptionBehavior> ExBehavior =
-          FPI->getExceptionBehavior();
-      return *ExBehavior != fp::ebStrict;
+    if (auto *II = dyn_cast<IntrinsicInst>(I);
+        II && Intrinsic::isConstrainedFPIntrinsic(II->getIntrinsicID())) {
+      // Strict exception behavior (fp.except) means the call is not dead.
+      return II->getExceptionBehavior() != fp::ebStrict;
     }
   }
 
diff --git a/llvm/test/Analysis/CostModel/ARM/mve-intrinsic-cost-kinds.ll b/llvm/test/Analysis/CostModel/ARM/mve-intrinsic-cost-kinds.ll
index b3ad818df394e..933ff04a1174a 100644
--- a/llvm/test/Analysis/CostModel/ARM/mve-intrinsic-cost-kinds.ll
+++ b/llvm/test/Analysis/CostModel/ARM/mve-intrinsic-cost-kinds.ll
@@ -78,8 +78,8 @@ define void @log2(float %a, <16 x float> %va) {
 
 define void @constrained_fadd(float %a, <16 x float> %va) strictfp {
 ; CHECK-LABEL: 'constrained_fadd'
-; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-; CHECK-NEXT:  Cost Model: Found costs of 48 for: %t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; CHECK-NEXT:  Cost Model: Found costs of 1 for: %s1 = call float @llvm.fadd.f32(float %a, float %a) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:  Cost Model: Found costs of 48 for: %t2 = call <16 x float> @llvm.fadd.v16f32(<16 x float> %va, <16 x float> %va) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll b/llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
index 8ed8b2e78e87e..7a463a5297819 100644
--- a/llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
+++ b/llvm/test/Analysis/CostModel/X86/intrinsic-cost-kinds.ll
@@ -180,23 +180,23 @@ define void @log2(float %a, <16 x float> %va) {
 
 define void @constrained_fadd(float %a, <16 x float> %va) strictfp {
 ; THRU-LABEL: 'constrained_fadd'
-; THRU-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-; THRU-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; THRU-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s1 = call float @llvm.fadd.f32(float %a, float %a) [ "fp.except"(metadata !"ignore") ]
+; THRU-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t2 = call <16 x float> @llvm.fadd.v16f32(<16 x float> %va, <16 x float> %va) [ "fp.except"(metadata !"ignore") ]
 ; THRU-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; LATE-LABEL: 'constrained_fadd'
-; LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-; LATE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s1 = call float @llvm.fadd.f32(float %a, float %a) [ "fp.except"(metadata !"ignore") ]
+; LATE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t2 = call <16 x float> @llvm.fadd.v16f32(<16 x float> %va, <16 x float> %va) [ "fp.except"(metadata !"ignore") ]
 ; LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE-LABEL: 'constrained_fadd'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s1 = call float @llvm.fadd.f32(float %a, float %a) [ "fp.except"(metadata !"ignore") ]
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t2 = call <16 x float> @llvm.fadd.v16f32(<16 x float> %va, <16 x float> %va) [ "fp.except"(metadata !"ignore") ]
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 ; SIZE_LATE-LABEL: 'constrained_fadd'
-; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %va, <16 x float> %va, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %s1 = call float @llvm.fadd.f32(float %a, float %a) [ "fp.except"(metadata !"ignore") ]
+; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %t2 = call <16 x float> @llvm.fadd.v16f32(<16 x float> %va, <16 x float> %va) [ "fp.except"(metadata !"ignore") ]
 ; SIZE_LATE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %s = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Assembler/fp-intrinsics-attr.bc b/llvm/test/Assembler/fp-intrinsics-attr.bc
new file mode 100644
index 0000000000000000000000000000000000000000..12882b9879cec21b8b70b1d0e5bd046e9749d6dc
GIT binary patch
literal 3960
zcmZWr4^R_l7XR{xY?QbO8f{|!xDo2mIf_d_Neo;Qf;Kp61I0Ufwr7@*)Kmx|35d3J
z5|R>B^rEGfYi&m><Mih0jb5?pl)Fnvq^PMP6-$qd2Le`mR;k)vJLk;Z+aN*lo8;Sl
z?|a|-e((3*+ub64=;|^dLMaHLN*<Z9_NBicdGo)WFGzYbiW$ljq~IY$vJgsTDi9NF
z(&3MJ)A#Cn*(Dr-QoWDi=RKXm70cPXDm}uL%%Yv?{*0&m21Qk-LqP5qR&@yWn$8wA
z^U`MRQC0a3*Mez_4P_plSh1g3)yqxGP-|E16r5)$%luqrxv`u$d`i2@>E(%)&HVkp
zQg(>MCAhLGTHy>T(pLqa7RXk(dt3Gd6~)c1Y-JuoI|zj40j9-Pw&1TtO(kh%Fzv1D
zHX3HnTC%byrKoHPmsrRUG27FTuZ*pTwx@GB%+;lS@`#{_Tgq0PBcho;!AD$uEX`At
z_Fg*!p;8e-Qt&Bh8U7y(EQBvF?xyefp=Z!`^ngJy8qj^6DK(4RM9{z4k|QGt%VD9%
zjUJ$%Sz6|I=!P$}Kr3MD=$~+TarAKy^$8D&?X(9t4wb;UKsB1zWGW)oaGPf_7-*ql
z4Sv@1`wzd`82+UzzTngIci$CXUH$pZyPdCpD8IOYTJg&5%1;9oNQaPs_K_F@L`D~1
z7)DHOLZl}GnOaV8cuaOEsnVj5;NLjG=6WFJ$>BjB#g%9BW(g8FCUp#?&-z`9kU*R%
zD=38Kit$i}7C~`59Bjlpe$89rK@h{k!9}9VA6aiBhIUE=5XtE39ZMKYPf|q#v&>t|
zWTPYs8rbHn_t@M-#U(6hU<+PWr3w%`xswL=k4<w4#7(MbV1K>@z^1vNfjJHXFd}J1
z1A7Pi6fwx8iUzim55P=G6%A})(Yl$8ZC(k$9%pR99`Xsslr{kNIAeAFPcZhNc=b%i
z!aJigfz5sQ>`Y+mmtLF+Y**EXDKmgwiD`i`x}+utF!srD?}C}YTwgr_O#8(XsXTq*
z31G7qoqGb<b5Ann@0iKhZ|10HFg91h-q{c36s<()J3yPu^+=;5Qmc?r*rHeYBdtXq
zY-=OtYOmv6NBr*v_<awa%PW$1hh-NINv&i`!ptX$u?l<jg{`c@SYxZ(fxXVfwi*Hs
z2e!?@YG@2Nh=4ejhTjU}BP8EW@<%7|2_v2>h{~=iq=O1sHzl`{J2YKBO`l%f$tvjb
z75G`2*o3gbguT8A%ik1m!t4n+f&qsKYskbJeog_mTRee{+(4Zu;A{_UZTDu~p1^Mp
z<3EP*TO_Np_mFh#kTjN<<COHALK^kT&QY8zLA|<HuWnDwry{C0y{d(!X^Sj{LmxD0
zF0(YpBD^nb0Y^o^>F~KXVU4DMTNH2~42Unh$iI#Fca8X15Fgd@M+p2DL_uWA;$eBW
zPTsGO#=Y|Ruykxh_8miR6_u)v=~o=^X=28Vhk=*^TN<&(_R=-?gM69+`D^lXl=Ox|
zb~P;9&XCi96|xD6Q~s7u6^m$&=?j672i*l@dQA(6k=8S_A=1DWz{AD1Wn#{D@5`eC
z{8k7bH{$om^l-mUJ~r~y;E3!h05FI($9#pIK8Zh~YSpV_k%CiUb#DY>EQlF8>$n)8
zssjQm0^*?_{FaIT{V->O;E$91af1Jo5w{EQQ4$C7-Z1CpFg|9)e<stf-J+!3UfJNt
zQ@0ed>k3&^A%hIwP%N?%A85Mtg{N5RHlI3H(FNIuV7CEILtu*|AU^*seuw0b4C6Po
z{F?{iln8_6`WS;)Hnu~l&<wl#AeCOfPCXm}>IxxKF>Umq?A#&Quugt8EVX8sNqmIP
zFCUzYlX``(UWB;}fsF?PZ+Kwu@is<5YowqpqUzJHfO3ifV?2Xmei0=Hs29SrGOy6O
z=2a5EL-6kmbM6@V_i0!7_%P=>!5`^^U=M434V$p7+<>@j;i-rQ>aatviUEf#b<kH3
zV-*0IZ4ut8IXzIiP*I1Z+r9F;49H{8aSl`<_<>yH&Rs4FI6T<4kFl+pkg{;6PIZi>
z>58ZxFblgPE7}0trZ^${CqoYG^#Fn4Idt}^=DP_eh(8*lo8=zq^ZF8*qGN>mcV%MR
zpiln<&->@#nmXH=34MTVt8laf_cU$lzDUz`Y0sfKH*|EHLfn(}wIFZQgnw&<vgMD_
z#moQJh~Gcv@zMb4u4)XxIdn}`I1WTqoeWKfudpq`+zbsWjgLs<gmjFe=Nn#mHzn;m
zB<-Cn1S-tn-v{!5OAnsfGMSK+iz=Xh5SlCk<)DG;gJ(l8OOurewbKvYsBkep4)q1K
zVB(L!487aOxfkSrPvCb#c&gCl!SY2=IM}veV2de`vO^v6N!nPy23H|gA^l*z1aBMq
zQIEw6I((V~(8R35gT4Z2=wO6?sZp5-eFN8YZqo$)%9#cK#n1N!OK1C2BJgyswbWT1
zmPWJ1;m&rsoi?)@3}Cc~(EMWJu)r<y2P4fQDdILUx;e9)yc`u0uvz{&ZS2zX#F^I8
zo#cs@vWuGeTxq2d5nRp=X|bNiL+r42FYBgwO1DPOn+-Z*FYC1y9m7M2GMvUaIho+f
zmw|tx;H01?Q!o}@7Cnfx0^VG8s97X}Bg2a^@$yOCK2=|{NC-Ne5*rfum`Fz&#ypZr
z6Mg6|sXIQU3rVI9eIt1&DVoyx!&7!z-a~(jF*7-JF5#QR66)27c?o>1=kb`S#Y<z(
zaymFxG8&>GE>LTIxuU7lI~WUjIV2s;O8hc0C+T=|(y?jUu_Wm@Zki%vf|JCmN@9(A
z+k7f+vk1O2n!<IdE&@B{NX=3S${;^IMLfe>@2|^Yto>55PN7(P^!?om-KFx;CF@Nm
zcMtWL(cJ#ae>=ky_79a1gL`!^|KZKAd;DZ*z!*C^RPMd0+P<L#wHtp>?LWF<UN^B;
zpYtF4>NE92`S!e$AJ(J|a8J~fK7U-b&j0n1B15@pD7B2d%;ZJv&ovJl0=qZ#UlM5j
z@slayd419byZ+2I(NFeg*d*`7AKBis!m=MOcMb@n%LY!qBzv)V8S^iaeQAm>-9uN3
zirH^VR(nMIi2+WP!QdJ^DV`twG2Z#?{CD^K>$FbH^k%7Qa#Pk9)Ys(qj8+&vkPM`$
z%X5cM9To6S#IL*1yuQKDYJSTuAN^dd;5l-Hu6v)EQB(Rio#Vo$^nuUki+7m^8)^$V
zBxCn0l1s-`tJV&6t$%v2ei^HD<n(Dl3hM?Z*TAiPf$PiTor%*8q=R=p{lS(>U|hd9
zvz1lS_LePX@=A^|1YB0B2-}yL`jv_=ESb0drNMV>@he*wiTqwsD?{Ao6(3+M?C>u9
zL?;FZFzSfOZ)hbhSnNmNB)=A4fp3eCV0|vD?H5{Ut+$)2s~c*vEsYL~(^g}#yQ%7I
zbFCd3kh0k=RoPZ^jl-2~U7nXPv(`|L+}BrARbcUO%~j`gCk03C78@;O+j5nOoi3X_
zxoEDP_UNXj_0*$A)T36^V`d;Ilg?nzrlbISrUfvY764Gvs=D^oNAGjUN2O`cz)5le
z?xz==Hv4p>b_?aK-1?}dI<XFcCwEsrQ~@7TY}HVW_WGJ>(Pn>4q#A8Ck1szan=Q8L
z<X&raZLKpYRXc0z?Nv1I)2b=mBj-<AO`%ChmIljoF5FHi4cdb;H=E2RU$wc3{Cu;O
zqH>m1<tmjbD%VP>^ORM2>Rgq&(yYo~PUWlfmCGts7PrOaUaE7sEH#zY7Uxo{<5^$_
VIC0s`u4gH?+nHTm4Yfy){{sc-+=BoB

literal 0
HcmV?d00001

diff --git a/llvm/test/Assembler/fp-intrinsics-attr.ll b/llvm/test/Assembler/fp-intrinsics-attr.ll
index 5b9a44710763e..edacb42a3bc8a 100644
--- a/llvm/test/Assembler/fp-intrinsics-attr.ll
+++ b/llvm/test/Assembler/fp-intrinsics-attr.ll
@@ -1,11 +1,63 @@
 ; RUN: llvm-as < %s | llvm-dis | FileCheck %s
 
-; Test to verify that constrained intrinsics all have the strictfp attribute.
-; Ordering is from Intrinsics.td.
+; Test to verify that constrained intrinsics are auto-upgraded on bitcode load.
+; With default rounding mode (dynamic) and exception behavior (strict), arithmetic
+; ops become plain instructions and math intrinsics become non-constrained calls.
+; fcmps (signaling compare) is upgraded to llvm.fcmps with no fp.except bundle
+; (strict is the default in a strictfp function).
 
 define void @func(double %a, double %b, double %c, i32 %i) strictfp {
 ; CHECK-LABEL: define void @func
 ; CHECK-SAME: (double [[A:%.*]], double [[B:%.*]], double [[C:%.*]], i32 [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK:      [[ADD:%.*]] = fadd double [[A]], [[B]]
+; CHECK-NEXT: [[SUB:%.*]] = fsub double [[A]], [[B]]
+; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A]], [[B]]
+; CHECK-NEXT: [[DIV:%.*]] = fdiv double [[A]], [[B]]
+; CHECK-NEXT: [[REM:%.*]] = frem double [[A]], [[B]]
+; CHECK-NEXT: {{.*}} = call double @llvm.fma.f64(double [[A]], double [[B]], double [[C]])
+; CHECK-NEXT: {{.*}} = call double @llvm.fmuladd.f64(double [[A]], double [[B]], double [[C]])
+; CHECK-NEXT: {{.*}} = fptosi double [[A]] to i32
+; CHECK-NEXT: {{.*}} = fptoui double [[A]] to i32
+; CHECK-NEXT: {{.*}} = sitofp i32 [[I]] to double
+; CHECK-NEXT: {{.*}} = uitofp i32 [[I]] to double
+; CHECK-NEXT: [[FPTRUNC:%.*]] = fptrunc double [[A]] to float
+; CHECK-NEXT: {{.*}} = fpext float [[FPTRUNC]] to double
+; CHECK-NEXT: {{.*}} = call double @llvm.sqrt.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.powi.f64.i32(double [[A]], i32 [[I]])
+; CHECK-NEXT: {{.*}} = call double @llvm.sin.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.cos.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.tan.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.acos.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.asin.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.atan.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.atan2.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.cosh.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.sinh.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.tanh.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.pow.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.log.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.log10.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.log2.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.exp.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.exp2.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.rint.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.nearbyint.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call i32 @llvm.lrint.i32.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call i64 @llvm.llrint.i64.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.maxnum.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.minnum.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.maximum.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.minimum.f64(double [[A]], double [[B]])
+; CHECK-NEXT: {{.*}} = call double @llvm.ceil.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.floor.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call i32 @llvm.lround.i32.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call i64 @llvm.llround.i64.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.round.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.roundeven.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = call double @llvm.trunc.f64(double [[A]])
+; CHECK-NEXT: {{.*}} = fcmp oeq double [[A]], [[B]]
+; CHECK-NEXT: {{.*}} = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq")
+; CHECK-NEXT: ret void
 
   %add = call double @llvm.experimental.constrained.fadd.f64(
                                                double %a, double %b,
@@ -229,154 +281,98 @@ define void @func(double %a, double %b, double %c, i32 %i) strictfp {
                                                metadata !"oeq",
                                                metadata !"fpexcept.strict")
 
-; CHECK: ret void
   ret void
 }
 
-declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fadd.f64({{.*}}) #[[ATTR1:[0-9]+]]
+; fcmps is auto-upgraded to the new llvm.fcmps intrinsic (3 args: two FP operands and a metadata predicate).
+; Plain intrinsic declarations are emitted for the upgraded calls.
+; CHECK-DAG: declare i1 @llvm.fcmps.f64(double, double, metadata) #[[ATTR1:[0-9]+]]
+; CHECK-DAG: declare double @llvm.fma.f64(double, double, double)
+; CHECK-DAG: declare double @llvm.fmuladd.f64(double, double, double)
+; CHECK-DAG: declare double @llvm.sqrt.f64(double)
+; CHECK-DAG: declare double @llvm.powi.f64.i32(double, i32)
+; CHECK-DAG: declare double @llvm.sin.f64(double)
+; CHECK-DAG: declare double @llvm.cos.f64(double)
+; CHECK-DAG: declare double @llvm.tan.f64(double)
+; CHECK-DAG: declare double @llvm.asin.f64(double)
+; CHECK-DAG: declare double @llvm.acos.f64(double)
+; CHECK-DAG: declare double @llvm.atan.f64(double)
+; CHECK-DAG: declare double @llvm.atan2.f64(double, double)
+; CHECK-DAG: declare double @llvm.sinh.f64(double)
+; CHECK-DAG: declare double @llvm.cosh.f64(double)
+; CHECK-DAG: declare double @llvm.tanh.f64(double)
+; CHECK-DAG: declare double @llvm.pow.f64(double, double)
+; CHECK-DAG: declare double @llvm.log.f64(double)
+; CHECK-DAG: declare double @llvm.log10.f64(double)
+; CHECK-DAG: declare double @llvm.log2.f64(double)
+; CHECK-DAG: declare double @llvm.exp.f64(double)
+; CHECK-DAG: declare double @llvm.exp2.f64(double)
+; CHECK-DAG: declare double @llvm.rint.f64(double)
+; CHECK-DAG: declare double @llvm.nearbyint.f64(double)
+; CHECK-DAG: declare i32 @llvm.lrint.i32.f64(double)
+; CHECK-DAG: declare i64 @llvm.llrint.i64.f64(double)
+; CHECK-DAG: declare double @llvm.maxnum.f64(double, double)
+; CHECK-DAG: declare double @llvm.minnum.f64(double, double)
+; CHECK-DAG: declare double @llvm.maximum.f64(double, double)
+; CHECK-DAG: declare double @llvm.minimum.f64(double, double)
+; CHECK-DAG: declare double @llvm.ceil.f64(double)
+; CHECK-DAG: declare double @llvm.floor.f64(double)
+; CHECK-DAG: declare i32 @llvm.lround.i32.f64(double)
+; CHECK-DAG: declare i64 @llvm.llround.i64.f64(double)
+; CHECK-DAG: declare double @llvm.round.f64(double)
+; CHECK-DAG: declare double @llvm.roundeven.f64(double)
+; CHECK-DAG: declare double @llvm.trunc.f64(double)
 
+declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
 declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fsub.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fmul.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fdiv.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.frem.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.frem.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fma.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fmuladd.f64({{.*}}) #[[ATTR1]]
-
 declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.fptosi.i32.f64({{.*}}) #[[ATTR1]]
-
 declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.fptoui.i32.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.sitofp.f64.i32(i32, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.sitofp.f64.i32({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.uitofp.f64.i32(i32, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.uitofp.f64.i32({{.*}}) #[[ATTR1]]
-
 declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fptrunc.f32.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
-; CHECK: @llvm.experimental.constrained.fpext.f64.f32({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.sqrt.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.powi.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.sin.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.cos.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.tan.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.asin.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.asin.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.acos.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.acos.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.atan.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.atan.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.atan2.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.atan2.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.sinh.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.sinh.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.cosh.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.cosh.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.tanh.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.tanh.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.pow.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.log.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.log10.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.log2.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.exp.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.exp2.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.rint.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.nearbyint.f64({{.*}}) #[[ATTR1]]
-
 declare i32 @llvm.experimental.constrained.lrint.i32.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.lrint.i32.f64({{.*}}) #[[ATTR1]]
-
 declare i64 @llvm.experimental.constrained.llrint.i64.f64(double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.llrint.i64.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.maxnum.f64(double, double, metadata)
-; CHECK: @llvm.experimental.constrained.maxnum.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.minnum.f64(double, double, metadata)
-; CHECK: @llvm.experimental.constrained.minnum.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.maximum.f64(double, double, metadata)
-; CHECK: @llvm.experimental.constrained.maximum.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.minimum.f64(double, double, metadata)
-; CHECK: @llvm.experimental.constrained.minimum.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.ceil.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.ceil.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.floor.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.floor.f64({{.*}}) #[[ATTR1]]
-
 declare i32 @llvm.experimental.constrained.lround.i32.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.lround.i32.f64({{.*}}) #[[ATTR1]]
-
 declare i64 @llvm.experimental.constrained.llround.i64.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.llround.i64.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.round.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.round.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.roundeven.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.roundeven.f64({{.*}}) #[[ATTR1]]
-
 declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
-; CHECK: @llvm.experimental.constrained.trunc.f64({{.*}}) #[[ATTR1]]
-
 declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fcmp.f64({{.*}}) #[[ATTR1]]
-
 declare i1 @llvm.experimental.constrained.fcmps.f64(double, double, metadata, metadata)
-; CHECK: @llvm.experimental.constrained.fcmps.f64({{.*}}) #[[ATTR1]]
 
+; The function retains its strictfp attribute after upgrade.
 ; CHECK: attributes #[[ATTR0]] = {{{.*}} strictfp {{.*}}}
-; CHECK: attributes #[[ATTR1]] = { {{.*}} strictfp {{.*}} }
-
+; The fcmps declaration gets nocreateundeforpoison, nounwind, willreturn, and memory(inaccessiblemem: readwrite).
+; CHECK: attributes #[[ATTR1]] = { nocreateundeforpoison nounwind willreturn memory(inaccessiblemem: readwrite) }
diff --git a/llvm/test/Assembler/fp-intrinsics-nondefault.ll b/llvm/test/Assembler/fp-intrinsics-nondefault.ll
new file mode 100644
index 0000000000000..e6a53acf464fa
--- /dev/null
+++ b/llvm/test/Assembler/fp-intrinsics-nondefault.ll
@@ -0,0 +1,257 @@
+; RUN: llvm-as < %s | llvm-dis | FileCheck %s
+;
+; Test auto-upgrade of experimental.constrained.* intrinsics with non-default
+; rounding modes and/or exception behaviors.  These produce new FP intrinsics
+; with fp.control and/or fp.except operand bundles.
+;
+; Terminology:
+;   Default  RM = round.dynamic   -> no fp.control bundle
+;   Default  EB = fpexcept.strict -> no fp.except  bundle
+;   Non-default RM (rte/rtz/rtp/rtn) -> fp.control bundle
+;   Non-default EB (ignore/maytrap)  -> fp.except  bundle
+
+; ===========================================================================
+; 1.  Non-default rounding modes (all 4)
+; ===========================================================================
+
+; CHECK-LABEL: define float @rte(
+; CHECK:         call float @llvm.fadd.f32(float %a, float %b) [ "fp.control"(metadata !"rte") ]
+define float @rte(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fadd.f32(
+      float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; CHECK-LABEL: define float @rtz(
+; CHECK:         call float @llvm.fadd.f32(float %a, float %b) [ "fp.control"(metadata !"rtz") ]
+define float @rtz(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fadd.f32(
+      float %a, float %b, metadata !"round.towardzero", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; CHECK-LABEL: define float @rtp(
+; CHECK:         call float @llvm.fadd.f32(float %a, float %b) [ "fp.control"(metadata !"rtp") ]
+define float @rtp(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fadd.f32(
+      float %a, float %b, metadata !"round.upward", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; CHECK-LABEL: define float @rtn(
+; CHECK:         call float @llvm.fadd.f32(float %a, float %b) [ "fp.control"(metadata !"rtn") ]
+define float @rtn(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fadd.f32(
+      float %a, float %b, metadata !"round.downward", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; ===========================================================================
+; 2.  Non-default exception behaviors (ignore and maytrap)
+; ===========================================================================
+
+; CHECK-LABEL: define double @eb_ignore(
+; CHECK:         call double @llvm.sqrt.f64(double %a) [ "fp.except"(metadata !"ignore") ]
+define double @eb_ignore(double %a) strictfp {
+  %r = call double @llvm.experimental.constrained.sqrt.f64(
+      double %a, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+  ret double %r
+}
+
+; CHECK-LABEL: define double @eb_maytrap(
+; CHECK:         call double @llvm.sqrt.f64(double %a) [ "fp.except"(metadata !"maytrap") ]
+define double @eb_maytrap(double %a) strictfp {
+  %r = call double @llvm.experimental.constrained.sqrt.f64(
+      double %a, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+  ret double %r
+}
+
+; ===========================================================================
+; 3.  Both non-default RM and EB
+; ===========================================================================
+
+; CHECK-LABEL: define float @rte_maytrap(
+; CHECK:         call float @llvm.fmul.f32(float %a, float %b) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+define float @rte_maytrap(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fmul.f32(
+      float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+  ret float %r
+}
+
+; CHECK-LABEL: define float @rtz_ignore(
+; CHECK:         call float @llvm.fdiv.f32(float %a, float %b) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+define float @rtz_ignore(float %a, float %b) strictfp {
+  %r = call float @llvm.experimental.constrained.fdiv.f32(
+      float %a, float %b, metadata !"round.towardzero", metadata !"fpexcept.ignore")
+  ret float %r
+}
+
+; ===========================================================================
+; 4.  Ops without rounding mode (fptoui, fpext, ceil) with non-default EB
+; ===========================================================================
+
+; CHECK-LABEL: define i32 @fptoui_maytrap(
+; CHECK:         call i32 @llvm.fptoui.i32.f64(double %a) [ "fp.except"(metadata !"maytrap") ]
+define i32 @fptoui_maytrap(double %a) strictfp {
+  %r = call i32 @llvm.experimental.constrained.fptoui.i32.f64(
+      double %a, metadata !"fpexcept.maytrap")
+  ret i32 %r
+}
+
+; CHECK-LABEL: define double @fpext_ignore(
+; CHECK:         call double @llvm.fpext.f64.f32(float %a) [ "fp.except"(metadata !"ignore") ]
+define double @fpext_ignore(float %a) strictfp {
+  %r = call double @llvm.experimental.constrained.fpext.f64.f32(
+      float %a, metadata !"fpexcept.ignore")
+  ret double %r
+}
+
+; CHECK-LABEL: define double @ceil_maytrap(
+; CHECK:         call double @llvm.ceil.f64(double %a) [ "fp.except"(metadata !"maytrap") ]
+define double @ceil_maytrap(double %a) strictfp {
+  %r = call double @llvm.experimental.constrained.ceil.f64(
+      double %a, metadata !"fpexcept.maytrap")
+  ret double %r
+}
+
+; ===========================================================================
+; 5.  Three-operand ops (fma / fmuladd) with non-default RM
+; ===========================================================================
+
+; CHECK-LABEL: define float @fma_rtp(
+; CHECK:         call float @llvm.fma.f32(float %a, float %b, float %c) [ "fp.control"(metadata !"rtp") ]
+define float @fma_rtp(float %a, float %b, float %c) strictfp {
+  %r = call float @llvm.experimental.constrained.fma.f32(
+      float %a, float %b, float %c, metadata !"round.upward", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; CHECK-LABEL: define float @fmuladd_rtn(
+; CHECK:         call float @llvm.fmuladd.f32(float %a, float %b, float %c) [ "fp.control"(metadata !"rtn") ]
+define float @fmuladd_rtn(float %a, float %b, float %c) strictfp {
+  %r = call float @llvm.experimental.constrained.fmuladd.f32(
+      float %a, float %b, float %c, metadata !"round.downward", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; ===========================================================================
+; 6.  fcmp with non-default EB and several predicates (oeq, ogt, one, ult, uno)
+;     (no RM for fcmp; EB non-default -> fp.except bundle)
+; ===========================================================================
+
+; CHECK-LABEL: define i1 @fcmp_oeq_maytrap(
+; CHECK:         call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
+define i1 @fcmp_oeq_maytrap(float %a, float %b) strictfp {
+  %r = call i1 @llvm.experimental.constrained.fcmp.f32(
+      float %a, float %b, metadata !"oeq", metadata !"fpexcept.maytrap")
+  ret i1 %r
+}
+
+; CHECK-LABEL: define i1 @fcmp_ogt_ignore(
+; CHECK:         call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"ogt") [ "fp.except"(metadata !"ignore") ]
+define i1 @fcmp_ogt_ignore(float %a, float %b) strictfp {
+  %r = call i1 @llvm.experimental.constrained.fcmp.f32(
+      float %a, float %b, metadata !"ogt", metadata !"fpexcept.ignore")
+  ret i1 %r
+}
+
+; CHECK-LABEL: define i1 @fcmp_one_maytrap(
+; CHECK:         call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"one") [ "fp.except"(metadata !"maytrap") ]
+define i1 @fcmp_one_maytrap(float %a, float %b) strictfp {
+  %r = call i1 @llvm.experimental.constrained.fcmp.f32(
+      float %a, float %b, metadata !"one", metadata !"fpexcept.maytrap")
+  ret i1 %r
+}
+
+; CHECK-LABEL: define i1 @fcmp_ult_maytrap(
+; CHECK:         call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"ult") [ "fp.except"(metadata !"maytrap") ]
+define i1 @fcmp_ult_maytrap(float %a, float %b) strictfp {
+  %r = call i1 @llvm.experimental.constrained.fcmp.f32(
+      float %a, float %b, metadata !"ult", metadata !"fpexcept.maytrap")
+  ret i1 %r
+}
+
+; CHECK-LABEL: define i1 @fcmp_uno_maytrap(
+; CHECK:         call i1 @llvm.fcmp.f32(float %a, float %b, metadata !"uno") [ "fp.except"(metadata !"maytrap") ]
+define i1 @fcmp_uno_maytrap(float %a, float %b) strictfp {
+  %r = call i1 @llvm.experimental.constrained.fcmp.f32(
+      float %a, float %b, metadata !"uno", metadata !"fpexcept.maytrap")
+  ret i1 %r
+}
+
+; ===========================================================================
+; 7.  Conversion ops with non-default RM
+; ===========================================================================
+
+; CHECK-LABEL: define double @sitofp_rte(
+; CHECK:         call double @llvm.sitofp.f64.i32(i32 %i) [ "fp.control"(metadata !"rte") ]
+define double @sitofp_rte(i32 %i) strictfp {
+  %r = call double @llvm.experimental.constrained.sitofp.f64.i32(
+      i32 %i, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  ret double %r
+}
+
+; CHECK-LABEL: define float @fptrunc_rtz(
+; CHECK:         call float @llvm.fptrunc.f32.f64(double %a) [ "fp.control"(metadata !"rtz") ]
+define float @fptrunc_rtz(double %a) strictfp {
+  %r = call float @llvm.experimental.constrained.fptrunc.f32.f64(
+      double %a, metadata !"round.towardzero", metadata !"fpexcept.strict")
+  ret float %r
+}
+
+; CHECK-LABEL: define i64 @lrint_rtn(
+; CHECK:         call i64 @llvm.lrint.i64.f64(double %a) [ "fp.control"(metadata !"rtn") ]
+define i64 @lrint_rtn(double %a) strictfp {
+  %r = call i64 @llvm.experimental.constrained.lrint.i64.f64(
+      double %a, metadata !"round.downward", metadata !"fpexcept.strict")
+  ret i64 %r
+}
+
+; CHECK-LABEL: define i64 @llrint_rtp(
+; CHECK:         call i64 @llvm.llrint.i64.f64(double %a) [ "fp.control"(metadata !"rtp") ]
+define i64 @llrint_rtp(double %a) strictfp {
+  %r = call i64 @llvm.experimental.constrained.llrint.i64.f64(
+      double %a, metadata !"round.upward", metadata !"fpexcept.strict")
+  ret i64 %r
+}
+
+; ===========================================================================
+; 8.  ldexp (two-type overload) and powi (int exponent) with non-default RM
+; ===========================================================================
+
+; CHECK-LABEL: define double @ldexp_rte(
+; CHECK:         call double @llvm.ldexp.f64.i32(double %a, i32 %n) [ "fp.control"(metadata !"rte") ]
+define double @ldexp_rte(double %a, i32 %n) strictfp {
+  %r = call double @llvm.experimental.constrained.ldexp.f64(
+      double %a, i32 %n, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  ret double %r
+}
+
+; CHECK-LABEL: define double @powi_rtz(
+; CHECK:         call double @llvm.powi.f64.i32(double %a, i32 %n) [ "fp.control"(metadata !"rtz") ]
+define double @powi_rtz(double %a, i32 %n) strictfp {
+  %r = call double @llvm.experimental.constrained.powi.f64(
+      double %a, i32 %n, metadata !"round.towardzero", metadata !"fpexcept.strict")
+  ret double %r
+}
+
+; ===========================================================================
+; Declarations required for the old-form intrinsics used above
+; ===========================================================================
+
+declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
+declare float @llvm.experimental.constrained.fmul.f32(float, float, metadata, metadata)
+declare float @llvm.experimental.constrained.fdiv.f32(float, float, metadata, metadata)
+declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
+declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)
+declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
+declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
+declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
+declare double @llvm.experimental.constrained.ceil.f64(double, metadata)
+declare double @llvm.experimental.constrained.sitofp.f64.i32(i32, metadata, metadata)
+declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
+declare i64 @llvm.experimental.constrained.lrint.i64.f64(double, metadata, metadata)
+declare i64 @llvm.experimental.constrained.llrint.i64.f64(double, metadata, metadata)
+declare double @llvm.experimental.constrained.ldexp.f64(double, i32, metadata, metadata)
+declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)
+declare i1 @llvm.experimental.constrained.fcmp.f32(float, float, metadata, metadata)
diff --git a/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll b/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
index 01e5b3f6673ae..6c7e8f7c7d0ed 100644
--- a/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
+++ b/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
@@ -1,19 +1,38 @@
 ; RUN: llvm-as < %s | llvm-bcanalyzer -dump -disable-histogram | FileCheck %s
 
+; COM: Check that all built-in and user-defined bundle tags are serialized.
 ; CHECK:  <OPERAND_BUNDLE_TAGS_BLOCK
+; COM: "deopt"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "funclet"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "gc-transition"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "cfguardtarget"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "preallocated"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "gc-live"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "clang.arc.attachedcall"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "ptrauth"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "kcfi"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "convergencectrl"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "align"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "deactivation-symbol"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "fp.control"
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "fp.except"
+; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "foo" (user-defined, from call below)
+; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; COM: "bar" (user-defined, from call below)
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
 ; CHECK-NEXT:  </OPERAND_BUNDLE_TAGS_BLOCK
 
diff --git a/llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi-strictfp.ll b/llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi-strictfp.ll
index 69df7372f1639..c04135096f9de 100644
--- a/llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi-strictfp.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi-strictfp.ll
@@ -5,19 +5,6 @@
 ; RUN: llc < %s -mtriple aarch64-unknown-unknown -mattr=+sve,+neon,+fullfp16,+fprcvt -force-streaming-compatible | FileCheck %s --check-prefixes=CHECK-SVE
 ; RUN: llc < %s -mtriple aarch64-unknown-unknown -global-isel -global-isel-abort=2 -mattr=+fprcvt,+fullfp16 2>&1  | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
-; CHECK-GI: warning: Instruction selection used fallback path for fptosi_i32_f16_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptosi_i64_f16_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptosi_i64_f32_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptosi_i32_f64_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptosi_i64_f64_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptosi_i32_f32_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i32_f16_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i64_f16_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i64_f32_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i32_f64_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i64_f64_simd
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fptoui_i32_f32_simd
-
 ;
 ; FPTOI strictfp
 ;
diff --git a/llvm/test/CodeGen/AArch64/arm64-vmul.ll b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
index 70c02a2a20353..7b37b88a60055 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vmul.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
@@ -2,31 +2,25 @@
 ; RUN: llc -mtriple=aarch64-none-elf -mattr=+aes < %s | FileCheck %s --check-prefixes=CHECK,CHECK-SD
 ; RUN: llc -mtriple=aarch64-none-elf -mattr=+aes -global-isel -global-isel-abort=2 2>&1 < %s | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
-; CHECK-GI:      warning: Instruction selection used fallback path for sqdmulh_1s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_2s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_4s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_2d
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_commuted_neg_2s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_commuted_neg_4s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_commuted_neg_2d
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_2s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_4s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_2d
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_2s_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_4s_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_indexed_2d_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmla_indexed_scalar_2s_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmla_indexed_scalar_4s_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmla_indexed_scalar_2d_strict
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for sqdmulh_lane_1s
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for scalar_fmls_from_extract_v4f32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for scalar_fmls_from_extract_v2f32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for scalar_fmls_from_extract_v2f64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f32_1
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v4f32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v4f32_1
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f64
+; CHECK-GI:       warning: Instruction selection used fallback path for sqdmulh_1s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_2s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_4s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_2d
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_commuted_neg_2s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_commuted_neg_4s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_commuted_neg_2d
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_indexed_2s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_indexed_4s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_indexed_2d
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqdmulh_lane_1s
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for scalar_fmls_from_extract_v4f32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for scalar_fmls_from_extract_v2f32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for scalar_fmls_from_extract_v2f64
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f32_1
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v4f32
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v4f32_1
+; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fmls_with_fneg_before_extract_v2f64
 
 define <8 x i16> @smull8h(ptr %A, ptr %B) nounwind {
 ; CHECK-LABEL: smull8h:
@@ -1811,8 +1805,8 @@ entry:
   ret i64 %vqdmulls_s32.i
 }
 
-define i64 @sqdmlal_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
-; CHECK-SD-LABEL: sqdmlal_lane_1d_v2i32:
+define i64 @sqdmlal_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
+; CHECK-SD-LABEL: sqdmlal_lane_1d:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    fmov s1, w1
 ; CHECK-SD-NEXT:    fmov d2, x0
@@ -1821,7 +1815,7 @@ define i64 @sqdmlal_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 ; CHECK-SD-NEXT:    fmov x0, d2
 ; CHECK-SD-NEXT:    ret
 ;
-; CHECK-GI-LABEL: sqdmlal_lane_1d_v2i32:
+; CHECK-GI-LABEL: sqdmlal_lane_1d:
 ; CHECK-GI:       // %bb.0:
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
 ; CHECK-GI-NEXT:    fmov s1, w1
@@ -1835,9 +1829,11 @@ define i64 @sqdmlal_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
   %res = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %A, i64 %prod)
   ret i64 %res
 }
+declare i64 @llvm.aarch64.neon.sqdmulls.scalar(i32, i32)
+declare i64 @llvm.aarch64.neon.sqadd.i64(i64, i64)
 
-define i64 @sqdmlsl_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
-; CHECK-SD-LABEL: sqdmlsl_lane_1d_v2i32:
+define i64 @sqdmlsl_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
+; CHECK-SD-LABEL: sqdmlsl_lane_1d:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    fmov s1, w1
 ; CHECK-SD-NEXT:    fmov d2, x0
@@ -1846,7 +1842,7 @@ define i64 @sqdmlsl_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 ; CHECK-SD-NEXT:    fmov x0, d2
 ; CHECK-SD-NEXT:    ret
 ;
-; CHECK-GI-LABEL: sqdmlsl_lane_1d_v2i32:
+; CHECK-GI-LABEL: sqdmlsl_lane_1d:
 ; CHECK-GI:       // %bb.0:
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
 ; CHECK-GI-NEXT:    fmov s1, w1
@@ -1862,33 +1858,6 @@ define i64 @sqdmlsl_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 }
 declare i64 @llvm.aarch64.neon.sqsub.i64(i64, i64)
 
-define i64 @sqdmlal_lane_1d_v4i32(i64 %A, i32 %B, <4 x i32> %C) nounwind {
-; CHECK-LABEL: sqdmlal_lane_1d_v4i32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fmov s1, w1
-; CHECK-NEXT:    fmov d2, x0
-; CHECK-NEXT:    sqdmlal d2, s1, v0.s[1]
-; CHECK-NEXT:    fmov x0, d2
-; CHECK-NEXT:    ret
-  %rhs = extractelement <4 x i32> %C, i32 1
-  %prod = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %B, i32 %rhs)
-  %res = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %A, i64 %prod)
-  ret i64 %res
-}
-
-define i64 @sqdmlsl_lane_1d_v4i32(i64 %A, i32 %B, <4 x i32> %C) nounwind {
-; CHECK-LABEL: sqdmlsl_lane_1d_v4i32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fmov s1, w1
-; CHECK-NEXT:    fmov d2, x0
-; CHECK-NEXT:    sqdmlsl d2, s1, v0.s[1]
-; CHECK-NEXT:    fmov x0, d2
-; CHECK-NEXT:    ret
-  %rhs = extractelement <4 x i32> %C, i32 1
-  %prod = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %B, i32 %rhs)
-  %res = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %A, i64 %prod)
-  ret i64 %res
-}
 
 define <4 x i32> @umlal_lane_4s(<4 x i16> %A, <4 x i16> %B, <4 x i32> %C) nounwind {
 ; CHECK-LABEL: umlal_lane_4s:
diff --git a/llvm/test/CodeGen/AArch64/floatdp_1source.ll b/llvm/test/CodeGen/AArch64/floatdp_1source.ll
index c3e8362ea2690..b5443a1e04b55 100644
--- a/llvm/test/CodeGen/AArch64/floatdp_1source.ll
+++ b/llvm/test/CodeGen/AArch64/floatdp_1source.ll
@@ -3,6 +3,8 @@
 
 declare float @llvm.fabs.f32(float) readonly
 declare double @llvm.fabs.f64(double) readonly
+declare float @llvm.fneg.f32(float)
+declare double @llvm.fneg.f64(double)
 
 declare float @llvm.sqrt.f32(float %Val)
 declare double @llvm.sqrt.f64(double %Val)
@@ -204,3 +206,24 @@ define float @conv_d_f(double %v) {
   %r = fptrunc double %v to float
   ret float %r
 }
+
+; llvm.fneg intrinsic lowers to the same fneg instruction as plain 'fneg' IR.
+; The fp.control operand bundle (when used) only affects backends that support
+; per-instruction FTZ (e.g. NVPTX), not AArch64.
+define float @fneg_f_intrinsic(float %v) {
+; CHECK-LABEL: fneg_f_intrinsic:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    fneg s0, s0
+; CHECK-NEXT:    ret
+  %r = call float @llvm.fneg.f32(float %v)
+  ret float %r
+}
+
+define double @fneg_d_intrinsic(double %v) {
+; CHECK-LABEL: fneg_d_intrinsic:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    fneg d0, d0
+; CHECK-NEXT:    ret
+  %r = call double @llvm.fneg.f64(double %v)
+  ret double %r
+}
diff --git a/llvm/test/CodeGen/AArch64/floatdp_2source.ll b/llvm/test/CodeGen/AArch64/floatdp_2source.ll
index c2f977ce53ed7..3f38d709c3051 100644
--- a/llvm/test/CodeGen/AArch64/floatdp_2source.ll
+++ b/llvm/test/CodeGen/AArch64/floatdp_2source.ll
@@ -3,6 +3,75 @@
 @varfloat = global float 0.0
 @vardouble = global double 0.0
 
+; llvm.fadd/fsub/fmul intrinsics lower to the same instructions as plain
+; fadd/fsub/fmul. The fp.control operand bundle only affects backends that
+; support per-instruction FTZ (e.g. NVPTX), not AArch64.
+
+define float @fadd_f_intrinsic(float %a, float %b) {
+; CHECK-LABEL: fadd_f_intrinsic:
+; CHECK: fadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
+  %r = call float @llvm.fadd.f32(float %a, float %b)
+  ret float %r
+}
+
+define float @fsub_f_intrinsic(float %a, float %b) {
+; CHECK-LABEL: fsub_f_intrinsic:
+; CHECK: fsub {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
+  %r = call float @llvm.fsub.f32(float %a, float %b)
+  ret float %r
+}
+
+define float @fmul_f_intrinsic(float %a, float %b) {
+; CHECK-LABEL: fmul_f_intrinsic:
+; CHECK: fmul {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
+  %r = call float @llvm.fmul.f32(float %a, float %b)
+  ret float %r
+}
+
+define double @fadd_d_intrinsic(double %a, double %b) {
+; CHECK-LABEL: fadd_d_intrinsic:
+; CHECK: fadd {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
+  %r = call double @llvm.fadd.f64(double %a, double %b)
+  ret double %r
+}
+
+define double @fsub_d_intrinsic(double %a, double %b) {
+; CHECK-LABEL: fsub_d_intrinsic:
+; CHECK: fsub {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
+  %r = call double @llvm.fsub.f64(double %a, double %b)
+  ret double %r
+}
+
+define double @fmul_d_intrinsic(double %a, double %b) {
+; CHECK-LABEL: fmul_d_intrinsic:
+; CHECK: fmul {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
+  %r = call double @llvm.fmul.f64(double %a, double %b)
+  ret double %r
+}
+
+declare float @llvm.fadd.f32(float, float)
+declare float @llvm.fsub.f32(float, float)
+declare float @llvm.fmul.f32(float, float)
+declare double @llvm.fadd.f64(double, double)
+declare double @llvm.fsub.f64(double, double)
+declare double @llvm.fmul.f64(double, double)
+declare float @llvm.fma.f32(float, float, float)
+declare double @llvm.fma.f64(double, double, double)
+
+define float @fma_f_intrinsic(float %a, float %b, float %c) {
+; CHECK-LABEL: fma_f_intrinsic:
+; CHECK: fmadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
+  %r = call float @llvm.fma.f32(float %a, float %b, float %c)
+  ret float %r
+}
+
+define double @fma_d_intrinsic(double %a, double %b, double %c) {
+; CHECK-LABEL: fma_d_intrinsic:
+; CHECK: fmadd {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
+  %r = call double @llvm.fma.f64(double %a, double %b, double %c)
+  ret double %r
+}
+
 define void @testfloat() {
 ; CHECK-LABEL: testfloat:
   %val1 = load float, ptr @varfloat
diff --git a/llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll b/llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll
index 368fa0a0cfae9..33ffdf90da564 100644
--- a/llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll
+++ b/llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll
@@ -9,49 +9,12 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for frem_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i32_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i32_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i64_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i64_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f16_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f16_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f16_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f16_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f16_i128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f16_i128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for powi_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sin_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cos_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tan_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for asin_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for acos_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan2_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sinh_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cosh_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tanh_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for pow_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log10_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log2_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp2_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lrint_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llrint_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lround_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llround_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_f16
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ldexp_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_olt_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ole_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ogt_f16
@@ -77,7 +40,6 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmps_ueq_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmps_une_f16
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptrunc_f16_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fpext_f32_f16
 
 ; Check that constrained fp intrinsics are correctly lowered.
 
@@ -478,8 +440,8 @@ define half @atan2_f16(half %x, half %y) #0 {
 ; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    fcvt s1, h1
 ; CHECK-NEXT:    fcvt s0, h0
+; CHECK-NEXT:    fcvt s1, h1
 ; CHECK-NEXT:    bl atan2f
 ; CHECK-NEXT:    fcvt h0, s0
 ; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
@@ -539,8 +501,8 @@ define half @pow_f16(half %x, half %y) #0 {
 ; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    fcvt s1, h1
 ; CHECK-NEXT:    fcvt s0, h0
+; CHECK-NEXT:    fcvt s1, h1
 ; CHECK-NEXT:    bl powf
 ; CHECK-NEXT:    fcvt h0, s0
 ; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll b/llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll
index 0b05e00a1b0db..9c0d2ea9495c2 100644
--- a/llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll
+++ b/llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll
@@ -6,79 +6,33 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sub_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v4i32_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v4i32_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v4i64_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v4i64_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v4f32_v4i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v4f32_v4i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v4f32_v4i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v4f32_v4i64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_v4f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmps_v4f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for add_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sub_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v2i32_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v2i32_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v2i64_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v2i64_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v2f64_v2i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v2f64_v2i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v2f64_v2i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v2f64_v2i64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmps_v2f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for add_v1f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sub_v1f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_v1f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v1i32_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v1i32_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_v1i64_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_v1i64_v1f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v1f64_v1i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v1f64_v1i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_v1f64_v1i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_v1f64_v1i64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_v1f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_v1f61
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmps_v1f61
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptrunc_v2f32_v2f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fpext_v2f64_v2f32
 
 ; Check that constrained fp vector intrinsics are correctly lowered.
 
@@ -149,25 +103,41 @@ define <4 x i32> @fptoui_v4i32_v4f32(<4 x float> %x) #0 {
 }
 
 define <4 x i64> @fptosi_v4i64_v4f32(<4 x float> %x) #0 {
-; CHECK-LABEL: fptosi_v4i64_v4f32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fcvtl2 v1.2d, v0.4s
-; CHECK-NEXT:    fcvtl v0.2d, v0.2s
-; CHECK-NEXT:    fcvtzs v1.2d, v1.2d
-; CHECK-NEXT:    fcvtzs v0.2d, v0.2d
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptosi_v4i64_v4f32:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    fcvtl2 v1.2d, v0.4s
+; CHECK-SD-NEXT:    fcvtl v0.2d, v0.2s
+; CHECK-SD-NEXT:    fcvtzs v1.2d, v1.2d
+; CHECK-SD-NEXT:    fcvtzs v0.2d, v0.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptosi_v4i64_v4f32:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    fcvtl v1.2d, v0.2s
+; CHECK-GI-NEXT:    fcvtl2 v2.2d, v0.4s
+; CHECK-GI-NEXT:    fcvtzs v0.2d, v1.2d
+; CHECK-GI-NEXT:    fcvtzs v1.2d, v2.2d
+; CHECK-GI-NEXT:    ret
   %val = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(<4 x float> %x, metadata !"fpexcept.strict") #0
   ret <4 x i64> %val
 }
 
 define <4 x i64> @fptoui_v4i64_v4f32(<4 x float> %x) #0 {
-; CHECK-LABEL: fptoui_v4i64_v4f32:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fcvtl2 v1.2d, v0.4s
-; CHECK-NEXT:    fcvtl v0.2d, v0.2s
-; CHECK-NEXT:    fcvtzu v1.2d, v1.2d
-; CHECK-NEXT:    fcvtzu v0.2d, v0.2d
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptoui_v4i64_v4f32:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    fcvtl2 v1.2d, v0.4s
+; CHECK-SD-NEXT:    fcvtl v0.2d, v0.2s
+; CHECK-SD-NEXT:    fcvtzu v1.2d, v1.2d
+; CHECK-SD-NEXT:    fcvtzu v0.2d, v0.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptoui_v4i64_v4f32:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    fcvtl v1.2d, v0.2s
+; CHECK-GI-NEXT:    fcvtl2 v2.2d, v0.4s
+; CHECK-GI-NEXT:    fcvtzu v0.2d, v1.2d
+; CHECK-GI-NEXT:    fcvtzu v1.2d, v2.2d
+; CHECK-GI-NEXT:    ret
   %val = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f32(<4 x float> %x, metadata !"fpexcept.strict") #0
   ret <4 x i64> %val
 }
@@ -695,21 +665,35 @@ define <1 x double> @fma_v1f64(<1 x double> %x, <1 x double> %y, <1 x double> %z
 }
 
 define <1 x i32> @fptosi_v1i32_v1f64(<1 x double> %x) #0 {
-; CHECK-LABEL: fptosi_v1i32_v1f64:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fcvtzs w8, d0
-; CHECK-NEXT:    fmov s0, w8
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptosi_v1i32_v1f64:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 def $q0
+; CHECK-SD-NEXT:    fcvtzs v0.2d, v0.2d
+; CHECK-SD-NEXT:    xtn v0.2s, v0.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptosi_v1i32_v1f64:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    fcvtzs w8, d0
+; CHECK-GI-NEXT:    fmov s0, w8
+; CHECK-GI-NEXT:    ret
   %val = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64(<1 x double> %x, metadata !"fpexcept.strict") #0
   ret <1 x i32> %val
 }
 
 define <1 x i32> @fptoui_v1i32_v1f64(<1 x double> %x) #0 {
-; CHECK-LABEL: fptoui_v1i32_v1f64:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    fcvtzu w8, d0
-; CHECK-NEXT:    fmov s0, w8
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptoui_v1i32_v1f64:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 def $q0
+; CHECK-SD-NEXT:    fcvtzu v0.2d, v0.2d
+; CHECK-SD-NEXT:    xtn v0.2s, v0.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptoui_v1i32_v1f64:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    fcvtzu w8, d0
+; CHECK-GI-NEXT:    fmov s0, w8
+; CHECK-GI-NEXT:    ret
   %val = call <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f64(<1 x double> %x, metadata !"fpexcept.strict") #0
   ret <1 x i32> %val
 }
@@ -993,6 +977,3 @@ declare <1 x i1> @llvm.experimental.constrained.fcmps.v1f64(<1 x double>, <1 x d
 declare <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata, metadata)
 declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata)
 
-;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
-; CHECK-GI: {{.*}}
-; CHECK-SD: {{.*}}
diff --git a/llvm/test/CodeGen/AArch64/fp-intrinsics.ll b/llvm/test/CodeGen/AArch64/fp-intrinsics.ll
index 6f5719c8443d9..f757f92b077f7 100644
--- a/llvm/test/CodeGen/AArch64/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/AArch64/fp-intrinsics.ll
@@ -7,50 +7,12 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for frem_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i32_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i32_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i64_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i64_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f32_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f32_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f32_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f32_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f32_i128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f32_i128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for powi_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sin_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cos_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tan_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for asin_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for acos_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan2_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sinh_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cosh_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tanh_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for pow_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log10_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log2_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp2_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lrint_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llrint_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maximum_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minimum_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lround_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llround_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_olt_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ole_f32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ogt_f32
@@ -80,50 +42,12 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for frem_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i32_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i32_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i64_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i64_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f64_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f64_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f64_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f64_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f64_i128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f64_i128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for powi_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sin_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cos_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tan_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for asin_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for acos_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan2_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sinh_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cosh_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tanh_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for pow_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log10_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log2_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp2_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lrint_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llrint_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maximum_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minimum_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lround_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llround_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for roundeven_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_olt_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ole_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ogt_f64
@@ -153,47 +77,12 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for mul_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for div_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for frem_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fma_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i32_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i32_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptosi_i64_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptoui_i64_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f128_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f128_i32
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f128_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f128_i64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sitofp_f128_i128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for uitofp_f128_i128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sqrt_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for powi_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sin_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cos_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tan_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for asin_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for acos_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan2_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sinh_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cosh_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tanh_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for pow_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log10_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log2_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp2_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for rint_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for nearbyint_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lrint_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llrint_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for maxnum_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for minnum_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for ceil_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for floor_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for lround_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for llround_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for round_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for trunc_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_olt_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ole_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fcmp_ogt_f128
@@ -221,25 +110,6 @@
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptrunc_f32_f64
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptrunc_f32_f128
 ; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fptrunc_f64_f128
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fpext_f64_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fpext_f128_f32
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for fpext_f128_f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sin_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cos_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tan_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for asin_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for acos_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for atan2_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for sinh_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for cosh_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for tanh_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for pow_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log2_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for log10_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp_v1f64
-; CHECK-GI-NEXT:  warning: Instruction selection used fallback path for exp2_v1f64
 
 ; Check that constrained fp intrinsics are correctly lowered.
 
@@ -413,12 +283,7 @@ define float @sqrt_f32(float %x) #0 {
 define float @powi_f32(float %x, i32 %y) #0 {
 ; CHECK-LABEL: powi_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __powisf2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b __powisf2
   %val = call float @llvm.experimental.constrained.powi.f32(float %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -426,12 +291,7 @@ define float @powi_f32(float %x, i32 %y) #0 {
 define float @sin_f32(float %x) #0 {
 ; CHECK-LABEL: sin_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sinf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sinf
   %val = call float @llvm.experimental.constrained.sin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -439,12 +299,7 @@ define float @sin_f32(float %x) #0 {
 define float @cos_f32(float %x) #0 {
 ; CHECK-LABEL: cos_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl cosf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b cosf
   %val = call float @llvm.experimental.constrained.cos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -452,12 +307,7 @@ define float @cos_f32(float %x) #0 {
 define float @tan_f32(float %x) #0 {
 ; CHECK-LABEL: tan_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tanf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tanf
   %val = call float @llvm.experimental.constrained.tan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -465,12 +315,7 @@ define float @tan_f32(float %x) #0 {
 define float @asin_f32(float %x) #0 {
 ; CHECK-LABEL: asin_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl asinf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b asinf
   %val = call float @llvm.experimental.constrained.asin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -478,12 +323,7 @@ define float @asin_f32(float %x) #0 {
 define float @acos_f32(float %x) #0 {
 ; CHECK-LABEL: acos_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl acosf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b acosf
   %val = call float @llvm.experimental.constrained.acos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -491,12 +331,7 @@ define float @acos_f32(float %x) #0 {
 define float @atan_f32(float %x) #0 {
 ; CHECK-LABEL: atan_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atanf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atanf
   %val = call float @llvm.experimental.constrained.atan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -504,12 +339,7 @@ define float @atan_f32(float %x) #0 {
 define float @atan2_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: atan2_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atan2f
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atan2f
   %val = call float @llvm.experimental.constrained.atan2.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -517,12 +347,7 @@ define float @atan2_f32(float %x, float %y) #0 {
 define float @sinh_f32(float %x) #0 {
 ; CHECK-LABEL: sinh_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sinhf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sinhf
   %val = call float @llvm.experimental.constrained.sinh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -530,12 +355,7 @@ define float @sinh_f32(float %x) #0 {
 define float @cosh_f32(float %x) #0 {
 ; CHECK-LABEL: cosh_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl coshf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b coshf
   %val = call float @llvm.experimental.constrained.cosh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -543,12 +363,7 @@ define float @cosh_f32(float %x) #0 {
 define float @tanh_f32(float %x) #0 {
 ; CHECK-LABEL: tanh_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tanhf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tanhf
   %val = call float @llvm.experimental.constrained.tanh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -556,12 +371,7 @@ define float @tanh_f32(float %x) #0 {
 define float @pow_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: pow_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl powf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b powf
   %val = call float @llvm.experimental.constrained.pow.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -569,12 +379,7 @@ define float @pow_f32(float %x, float %y) #0 {
 define float @log_f32(float %x) #0 {
 ; CHECK-LABEL: log_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl logf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b logf
   %val = call float @llvm.experimental.constrained.log.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -582,12 +387,7 @@ define float @log_f32(float %x) #0 {
 define float @log10_f32(float %x) #0 {
 ; CHECK-LABEL: log10_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log10f
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log10f
   %val = call float @llvm.experimental.constrained.log10.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -595,12 +395,7 @@ define float @log10_f32(float %x) #0 {
 define float @log2_f32(float %x) #0 {
 ; CHECK-LABEL: log2_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log2f
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log2f
   %val = call float @llvm.experimental.constrained.log2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -608,12 +403,7 @@ define float @log2_f32(float %x) #0 {
 define float @exp_f32(float %x) #0 {
 ; CHECK-LABEL: exp_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl expf
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b expf
   %val = call float @llvm.experimental.constrained.exp.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -621,12 +411,7 @@ define float @exp_f32(float %x) #0 {
 define float @exp2_f32(float %x) #0 {
 ; CHECK-LABEL: exp2_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl exp2f
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b exp2f
   %val = call float @llvm.experimental.constrained.exp2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -1207,12 +992,7 @@ define double @sqrt_f64(double %x) #0 {
 define double @powi_f64(double %x, i32 %y) #0 {
 ; CHECK-LABEL: powi_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __powidf2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b __powidf2
   %val = call double @llvm.experimental.constrained.powi.f64(double %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1220,12 +1000,7 @@ define double @powi_f64(double %x, i32 %y) #0 {
 define double @sin_f64(double %x) #0 {
 ; CHECK-LABEL: sin_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sin
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sin
   %val = call double @llvm.experimental.constrained.sin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1233,12 +1008,7 @@ define double @sin_f64(double %x) #0 {
 define double @cos_f64(double %x) #0 {
 ; CHECK-LABEL: cos_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl cos
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b cos
   %val = call double @llvm.experimental.constrained.cos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1246,12 +1016,7 @@ define double @cos_f64(double %x) #0 {
 define double @tan_f64(double %x) #0 {
 ; CHECK-LABEL: tan_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tan
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tan
   %val = call double @llvm.experimental.constrained.tan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1259,12 +1024,7 @@ define double @tan_f64(double %x) #0 {
 define double @asin_f64(double %x) #0 {
 ; CHECK-LABEL: asin_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl asin
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b asin
   %val = call double @llvm.experimental.constrained.asin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1272,12 +1032,7 @@ define double @asin_f64(double %x) #0 {
 define double @acos_f64(double %x) #0 {
 ; CHECK-LABEL: acos_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl acos
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b acos
   %val = call double @llvm.experimental.constrained.acos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1285,12 +1040,7 @@ define double @acos_f64(double %x) #0 {
 define double @atan_f64(double %x) #0 {
 ; CHECK-LABEL: atan_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atan
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atan
   %val = call double @llvm.experimental.constrained.atan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1298,12 +1048,7 @@ define double @atan_f64(double %x) #0 {
 define double @atan2_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: atan2_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atan2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atan2
   %val = call double @llvm.experimental.constrained.atan2.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1311,12 +1056,7 @@ define double @atan2_f64(double %x, double %y) #0 {
 define double @sinh_f64(double %x) #0 {
 ; CHECK-LABEL: sinh_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sinh
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sinh
   %val = call double @llvm.experimental.constrained.sinh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1324,12 +1064,7 @@ define double @sinh_f64(double %x) #0 {
 define double @cosh_f64(double %x) #0 {
 ; CHECK-LABEL: cosh_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl cosh
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b cosh
   %val = call double @llvm.experimental.constrained.cosh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1337,12 +1072,7 @@ define double @cosh_f64(double %x) #0 {
 define double @tanh_f64(double %x) #0 {
 ; CHECK-LABEL: tanh_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tanh
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tanh
   %val = call double @llvm.experimental.constrained.tanh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1350,12 +1080,7 @@ define double @tanh_f64(double %x) #0 {
 define double @pow_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: pow_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl pow
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b pow
   %val = call double @llvm.experimental.constrained.pow.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1363,12 +1088,7 @@ define double @pow_f64(double %x, double %y) #0 {
 define double @log_f64(double %x) #0 {
 ; CHECK-LABEL: log_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log
   %val = call double @llvm.experimental.constrained.log.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1376,12 +1096,7 @@ define double @log_f64(double %x) #0 {
 define double @log10_f64(double %x) #0 {
 ; CHECK-LABEL: log10_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log10
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log10
   %val = call double @llvm.experimental.constrained.log10.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1389,12 +1104,7 @@ define double @log10_f64(double %x) #0 {
 define double @log2_f64(double %x) #0 {
 ; CHECK-LABEL: log2_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log2
   %val = call double @llvm.experimental.constrained.log2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1402,12 +1112,7 @@ define double @log2_f64(double %x) #0 {
 define double @exp_f64(double %x) #0 {
 ; CHECK-LABEL: exp_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl exp
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b exp
   %val = call double @llvm.experimental.constrained.exp.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1415,12 +1120,7 @@ define double @exp_f64(double %x) #0 {
 define double @exp2_f64(double %x) #0 {
 ; CHECK-LABEL: exp2_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl exp2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b exp2
   %val = call double @llvm.experimental.constrained.exp2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
@@ -1899,66 +1599,86 @@ define fp128 @frem_f128(fp128 %x, fp128 %y) #0 {
 }
 
 define fp128 @fma_f128(fp128 %x, fp128 %y, fp128 %z) #0 {
-; CHECK-LABEL: fma_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl fmal
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fma_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl fmal
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fma_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b fmal
   %val = call fp128 @llvm.experimental.constrained.fma.f128(fp128 %x, fp128 %y, fp128 %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
 
 define i32 @fptosi_i32_f128(fp128 %x) #0 {
-; CHECK-LABEL: fptosi_i32_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __fixtfsi
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptosi_i32_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl __fixtfsi
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptosi_i32_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b __fixtfsi
   %val = call i32 @llvm.experimental.constrained.fptosi.i32.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @fptoui_i32_f128(fp128 %x) #0 {
-; CHECK-LABEL: fptoui_i32_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __fixunstfsi
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptoui_i32_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl __fixunstfsi
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptoui_i32_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b __fixunstfsi
   %val = call i32 @llvm.experimental.constrained.fptoui.i32.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i64 @fptosi_i64_f128(fp128 %x) #0 {
-; CHECK-LABEL: fptosi_i64_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __fixtfdi
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptosi_i64_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl __fixtfdi
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptosi_i64_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b __fixtfdi
   %val = call i64 @llvm.experimental.constrained.fptosi.i64.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i64 %val
 }
 
 define i64 @fptoui_i64_f128(fp128 %x) #0 {
-; CHECK-LABEL: fptoui_i64_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __fixunstfdi
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: fptoui_i64_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl __fixunstfdi
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: fptoui_i64_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b __fixunstfdi
   %val = call i64 @llvm.experimental.constrained.fptoui.i64.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i64 %val
 }
@@ -2042,14 +1762,18 @@ define fp128 @uitofp_f128_i128(i128 %x) #0 {
 }
 
 define fp128 @sqrt_f128(fp128 %x) #0 {
-; CHECK-LABEL: sqrt_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sqrtl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: sqrt_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-SD-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-SD-NEXT:    .cfi_offset w30, -16
+; CHECK-SD-NEXT:    bl sqrtl
+; CHECK-SD-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: sqrt_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    b sqrtl
   %val = call fp128 @llvm.experimental.constrained.sqrt.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2057,12 +1781,7 @@ define fp128 @sqrt_f128(fp128 %x) #0 {
 define fp128 @powi_f128(fp128 %x, i32 %y) #0 {
 ; CHECK-LABEL: powi_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __powitf2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b __powitf2
   %val = call fp128 @llvm.experimental.constrained.powi.f128(fp128 %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2070,12 +1789,7 @@ define fp128 @powi_f128(fp128 %x, i32 %y) #0 {
 define fp128 @sin_f128(fp128 %x) #0 {
 ; CHECK-LABEL: sin_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sinl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sinl
   %val = call fp128 @llvm.experimental.constrained.sin.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2083,12 +1797,7 @@ define fp128 @sin_f128(fp128 %x) #0 {
 define fp128 @cos_f128(fp128 %x) #0 {
 ; CHECK-LABEL: cos_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl cosl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b cosl
   %val = call fp128 @llvm.experimental.constrained.cos.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2096,12 +1805,7 @@ define fp128 @cos_f128(fp128 %x) #0 {
 define fp128 @tan_f128(fp128 %x) #0 {
 ; CHECK-LABEL: tan_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tanl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tanl
   %val = call fp128 @llvm.experimental.constrained.tan.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2109,12 +1813,7 @@ define fp128 @tan_f128(fp128 %x) #0 {
 define fp128 @asin_f128(fp128 %x) #0 {
 ; CHECK-LABEL: asin_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl asinl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b asinl
   %val = call fp128 @llvm.experimental.constrained.asin.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2122,12 +1821,7 @@ define fp128 @asin_f128(fp128 %x) #0 {
 define fp128 @acos_f128(fp128 %x) #0 {
 ; CHECK-LABEL: acos_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl acosl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b acosl
   %val = call fp128 @llvm.experimental.constrained.acos.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2135,12 +1829,7 @@ define fp128 @acos_f128(fp128 %x) #0 {
 define fp128 @atan_f128(fp128 %x) #0 {
 ; CHECK-LABEL: atan_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atanl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atanl
   %val = call fp128 @llvm.experimental.constrained.atan.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2148,12 +1837,7 @@ define fp128 @atan_f128(fp128 %x) #0 {
 define fp128 @atan2_f128(fp128 %x, fp128 %y) #0 {
 ; CHECK-LABEL: atan2_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl atan2l
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b atan2l
   %val = call fp128 @llvm.experimental.constrained.atan2.f128(fp128 %x, fp128 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2161,12 +1845,7 @@ define fp128 @atan2_f128(fp128 %x, fp128 %y) #0 {
 define fp128 @sinh_f128(fp128 %x) #0 {
 ; CHECK-LABEL: sinh_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl sinhl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b sinhl
   %val = call fp128 @llvm.experimental.constrained.sinh.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2174,12 +1853,7 @@ define fp128 @sinh_f128(fp128 %x) #0 {
 define fp128 @cosh_f128(fp128 %x) #0 {
 ; CHECK-LABEL: cosh_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl coshl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b coshl
   %val = call fp128 @llvm.experimental.constrained.cosh.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2187,12 +1861,7 @@ define fp128 @cosh_f128(fp128 %x) #0 {
 define fp128 @tanh_f128(fp128 %x) #0 {
 ; CHECK-LABEL: tanh_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl tanhl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b tanhl
   %val = call fp128 @llvm.experimental.constrained.tanh.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2200,12 +1869,7 @@ define fp128 @tanh_f128(fp128 %x) #0 {
 define fp128 @pow_f128(fp128 %x, fp128 %y) #0 {
 ; CHECK-LABEL: pow_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl powl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b powl
   %val = call fp128 @llvm.experimental.constrained.pow.f128(fp128 %x, fp128 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2213,12 +1877,7 @@ define fp128 @pow_f128(fp128 %x, fp128 %y) #0 {
 define fp128 @log_f128(fp128 %x) #0 {
 ; CHECK-LABEL: log_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl logl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b logl
   %val = call fp128 @llvm.experimental.constrained.log.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2226,12 +1885,7 @@ define fp128 @log_f128(fp128 %x) #0 {
 define fp128 @log10_f128(fp128 %x) #0 {
 ; CHECK-LABEL: log10_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log10l
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log10l
   %val = call fp128 @llvm.experimental.constrained.log10.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2239,12 +1893,7 @@ define fp128 @log10_f128(fp128 %x) #0 {
 define fp128 @log2_f128(fp128 %x) #0 {
 ; CHECK-LABEL: log2_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl log2l
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b log2l
   %val = call fp128 @llvm.experimental.constrained.log2.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2252,12 +1901,7 @@ define fp128 @log2_f128(fp128 %x) #0 {
 define fp128 @exp_f128(fp128 %x) #0 {
 ; CHECK-LABEL: exp_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl expl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b expl
   %val = call fp128 @llvm.experimental.constrained.exp.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2265,12 +1909,7 @@ define fp128 @exp_f128(fp128 %x) #0 {
 define fp128 @exp2_f128(fp128 %x) #0 {
 ; CHECK-LABEL: exp2_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl exp2l
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b exp2l
   %val = call fp128 @llvm.experimental.constrained.exp2.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2278,12 +1917,7 @@ define fp128 @exp2_f128(fp128 %x) #0 {
 define fp128 @rint_f128(fp128 %x) #0 {
 ; CHECK-LABEL: rint_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl rintl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b rintl
   %val = call fp128 @llvm.experimental.constrained.rint.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2291,25 +1925,24 @@ define fp128 @rint_f128(fp128 %x) #0 {
 define fp128 @nearbyint_f128(fp128 %x) #0 {
 ; CHECK-LABEL: nearbyint_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl nearbyintl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b nearbyintl
   %val = call fp128 @llvm.experimental.constrained.nearbyint.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
 
 define i32 @lrint_f128(fp128 %x) #0 {
-; CHECK-LABEL: lrint_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl lrintl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: lrint_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    b lrintl
+;
+; CHECK-GI-LABEL: lrint_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    bl rintl
+; CHECK-GI-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-GI-NEXT:    b __fixtfsi
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
@@ -2317,12 +1950,7 @@ define i32 @lrint_f128(fp128 %x) #0 {
 define i64 @llrint_f128(fp128 %x) #0 {
 ; CHECK-LABEL: llrint_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl llrintl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b llrintl
   %val = call i64 @llvm.experimental.constrained.llrint.i64.f128(fp128 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i64 %val
 }
@@ -2330,12 +1958,7 @@ define i64 @llrint_f128(fp128 %x) #0 {
 define fp128 @maxnum_f128(fp128 %x, fp128 %y) #0 {
 ; CHECK-LABEL: maxnum_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl fmaxl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b fmaxl
   %val = call fp128 @llvm.experimental.constrained.maxnum.f128(fp128 %x, fp128 %y, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2343,12 +1966,7 @@ define fp128 @maxnum_f128(fp128 %x, fp128 %y) #0 {
 define fp128 @minnum_f128(fp128 %x, fp128 %y) #0 {
 ; CHECK-LABEL: minnum_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl fminl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b fminl
   %val = call fp128 @llvm.experimental.constrained.minnum.f128(fp128 %x, fp128 %y, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2356,12 +1974,7 @@ define fp128 @minnum_f128(fp128 %x, fp128 %y) #0 {
 define fp128 @ceil_f128(fp128 %x) #0 {
 ; CHECK-LABEL: ceil_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl ceill
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b ceill
   %val = call fp128 @llvm.experimental.constrained.ceil.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2369,25 +1982,24 @@ define fp128 @ceil_f128(fp128 %x) #0 {
 define fp128 @floor_f128(fp128 %x) #0 {
 ; CHECK-LABEL: floor_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl floorl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b floorl
   %val = call fp128 @llvm.experimental.constrained.floor.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
 
 define i32 @lround_f128(fp128 %x) #0 {
-; CHECK-LABEL: lround_f128:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl lroundl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-SD-LABEL: lround_f128:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    b lroundl
+;
+; CHECK-GI-LABEL: lround_f128:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-GI-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-GI-NEXT:    .cfi_offset w30, -16
+; CHECK-GI-NEXT:    bl roundl
+; CHECK-GI-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-GI-NEXT:    b __fixtfsi
   %val = call i32 @llvm.experimental.constrained.lround.i32.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
@@ -2395,12 +2007,7 @@ define i32 @lround_f128(fp128 %x) #0 {
 define i64 @llround_f128(fp128 %x) #0 {
 ; CHECK-LABEL: llround_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl llroundl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b llroundl
   %val = call i64 @llvm.experimental.constrained.llround.i64.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret i64 %val
 }
@@ -2408,12 +2015,7 @@ define i64 @llround_f128(fp128 %x) #0 {
 define fp128 @round_f128(fp128 %x) #0 {
 ; CHECK-LABEL: round_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl roundl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b roundl
   %val = call fp128 @llvm.experimental.constrained.round.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2421,12 +2023,7 @@ define fp128 @round_f128(fp128 %x) #0 {
 define fp128 @trunc_f128(fp128 %x) #0 {
 ; CHECK-LABEL: trunc_f128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl truncl
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b truncl
   %val = call fp128 @llvm.experimental.constrained.trunc.f128(fp128 %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2897,12 +2494,7 @@ define double @fpext_f64_f32(float %x) #0 {
 define fp128 @fpext_f128_f32(float %x) #0 {
 ; CHECK-LABEL: fpext_f128_f32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __extendsftf2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b __extendsftf2
   %val = call fp128 @llvm.experimental.constrained.fpext.f128.f32(float %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -2910,12 +2502,7 @@ define fp128 @fpext_f128_f32(float %x) #0 {
 define fp128 @fpext_f128_f64(double %x) #0 {
 ; CHECK-LABEL: fpext_f128_f64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    .cfi_offset w30, -16
-; CHECK-NEXT:    bl __extenddftf2
-; CHECK-NEXT:    ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:    ret
+; CHECK-NEXT:    b __extenddftf2
   %val = call fp128 @llvm.experimental.constrained.fpext.f128.f64(double %x, metadata !"fpexcept.strict") #0
   ret fp128 %val
 }
@@ -3290,6 +2877,3 @@ declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
 declare fp128 @llvm.experimental.constrained.fpext.f128.f32(float, metadata)
 declare fp128 @llvm.experimental.constrained.fpext.f128.f64(double, metadata)
 
-;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
-; CHECK-GI: {{.*}}
-; CHECK-SD: {{.*}}
diff --git a/llvm/test/CodeGen/AArch64/strict-fp-int-promote.ll b/llvm/test/CodeGen/AArch64/strict-fp-int-promote.ll
index 0f7ea36949da5..f7e555bad6825 100644
--- a/llvm/test/CodeGen/AArch64/strict-fp-int-promote.ll
+++ b/llvm/test/CodeGen/AArch64/strict-fp-int-promote.ll
@@ -12,7 +12,7 @@ declare float @llvm.experimental.constrained.uitofp.f32.i16(i16, metadata, metad
 define i32 @test() #0 {
 ; CHECK-LABEL: test:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    mov w8, #1
+; CHECK-NEXT:    mov w8, #1 // =0x1
 ; CHECK-NEXT:    scvtf s0, w8
 ; CHECK-NEXT:    fcmp s0, s0
 ; CHECK-NEXT:    cset w0, eq
@@ -20,11 +20,9 @@ define i32 @test() #0 {
 ;
 ; SUBOPTIMAL-LABEL: test:
 ; SUBOPTIMAL:       // %bb.0: // %entry
-; SUBOPTIMAL-NEXT:    mov w8, #1
+; SUBOPTIMAL-NEXT:    mov w8, #1 // =0x1
 ; SUBOPTIMAL-NEXT:    scvtf s0, w8
-; SUBOPTIMAL-NEXT:    mov w8, #1
-; SUBOPTIMAL-NEXT:    scvtf s1, w8
-; SUBOPTIMAL-NEXT:    fcmp s0, s1
+; SUBOPTIMAL-NEXT:    fcmp s0, s0
 ; SUBOPTIMAL-NEXT:    cset w8, eq
 ; SUBOPTIMAL-NEXT:    and w0, w8, #0x1
 ; SUBOPTIMAL-NEXT:    ret
@@ -39,7 +37,7 @@ entry:
 define i32 @test2() #0 {
 ; CHECK-LABEL: test2:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    mov w8, #1
+; CHECK-NEXT:    mov w8, #1 // =0x1
 ; CHECK-NEXT:    scvtf s0, w8
 ; CHECK-NEXT:    ucvtf s1, w8
 ; CHECK-NEXT:    fcmp s0, s1
@@ -48,9 +46,8 @@ define i32 @test2() #0 {
 ;
 ; SUBOPTIMAL-LABEL: test2:
 ; SUBOPTIMAL:       // %bb.0: // %entry
-; SUBOPTIMAL-NEXT:    mov w8, #1
+; SUBOPTIMAL-NEXT:    mov w8, #1 // =0x1
 ; SUBOPTIMAL-NEXT:    scvtf s0, w8
-; SUBOPTIMAL-NEXT:    mov w8, #1
 ; SUBOPTIMAL-NEXT:    ucvtf s1, w8
 ; SUBOPTIMAL-NEXT:    fcmp s0, s1
 ; SUBOPTIMAL-NEXT:    cset w8, eq
diff --git a/llvm/test/CodeGen/AArch64/strict-fp-opt.ll b/llvm/test/CodeGen/AArch64/strict-fp-opt.ll
index c433291ff576a..93c03ecec9152 100644
--- a/llvm/test/CodeGen/AArch64/strict-fp-opt.ll
+++ b/llvm/test/CodeGen/AArch64/strict-fp-opt.ll
@@ -68,13 +68,10 @@ if.end:
 define float @add_twice_fpexcept_strict(float %x, float %y, i32 %n) #0 {
 ; CHECK-LABEL: add_twice_fpexcept_strict:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    fmov s2, s0
 ; CHECK-NEXT:    fadd s0, s0, s1
-; CHECK-NEXT:    cbz w0, .LBB4_2
-; CHECK-NEXT:  // %bb.1: // %if.then
-; CHECK-NEXT:    fadd s1, s2, s1
-; CHECK-NEXT:    fmul s0, s0, s1
-; CHECK-NEXT:  .LBB4_2: // %if.end
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    fmul s1, s0, s0
+; CHECK-NEXT:    fcsel s0, s0, s1, eq
 ; CHECK-NEXT:    ret
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -95,10 +92,9 @@ define float @add_twice_round_dynamic(float %x, float %y, i32 %n) #0 {
 ; CHECK-LABEL: add_twice_round_dynamic:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    fadd s0, s0, s1
-; CHECK-NEXT:    cbz w0, .LBB5_2
-; CHECK-NEXT:  // %bb.1: // %if.then
-; CHECK-NEXT:    fmul s0, s0, s0
-; CHECK-NEXT:  .LBB5_2: // %if.end
+; CHECK-NEXT:    cmp w0, #0
+; CHECK-NEXT:    fmul s1, s0, s0
+; CHECK-NEXT:    fcsel s0, s0, s1, eq
 ; CHECK-NEXT:    ret
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-cvt-fp-to-int.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-cvt-fp-to-int.ll
index cfdc1baf8c282..60d9d380b592a 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-cvt-fp-to-int.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-cvt-fp-to-int.ll
@@ -227,7 +227,9 @@ define i32 @strict_convert_signed(double %x) {
 define i32 @strict_convert_unsigned(float %x) {
 ; CHECK-LABEL: strict_convert_unsigned:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    fcvtzu w0, s0
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcvtzu z0.s, p0/m, z0.s
+; CHECK-NEXT:    fmov w0, s0
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: strict_convert_unsigned:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constrained-fp.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constrained-fp.ll
index 4f360ef3c9f1e..e3e57d54ffe6c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constrained-fp.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constrained-fp.ll
@@ -195,8 +195,8 @@ define float @v_constained_fma_f32_fpexcept_ignore_flags(float %x, float %y, flo
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
   ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; CHECK-NEXT:   [[STRICT_FMA:%[0-9]+]]:_(s32) = nsz nofpexcept G_STRICT_FMA [[COPY]], [[COPY1]], [[COPY2]]
-  ; CHECK-NEXT:   $vgpr0 = COPY [[STRICT_FMA]](s32)
+  ; CHECK-NEXT:   [[FMA:%[0-9]+]]:_(s32) = nsz G_FMA [[COPY]], [[COPY1]], [[COPY2]]
+  ; CHECK-NEXT:   $vgpr0 = COPY [[FMA]](s32)
   ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
   %val = call nsz float @llvm.experimental.constrained.fma.f32(float %x, float %y, float %z, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %val
@@ -208,8 +208,8 @@ define float @v_constained_sqrt_f32_fpexcept_strict(float %x) #0 {
   ; CHECK-NEXT:   liveins: $vgpr0
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-  ; CHECK-NEXT:   [[STRICT_FSQRT:%[0-9]+]]:_(s32) = G_STRICT_FSQRT [[COPY]]
-  ; CHECK-NEXT:   $vgpr0 = COPY [[STRICT_FSQRT]](s32)
+  ; CHECK-NEXT:   [[FSQRT:%[0-9]+]]:_(s32) = G_FSQRT [[COPY]]
+  ; CHECK-NEXT:   $vgpr0 = COPY [[FSQRT]](s32)
   ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict")
   ret float %val
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f16.ll
index 92d87a0d74efb..c0b5fa1ea24fb 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f16.ll
@@ -41,7 +41,7 @@ define void @v_constained_fma_f16_fpexcept_strict_uni(half inreg %x, half inreg
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_mov_b16_e32 v2.l, s2
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_fma_f16 v2.l, s0, s1, v2.l
+; GFX11-NEXT:    v_fmac_f16_e64 v2.l, s0, s1
 ; GFX11-NEXT:    global_store_b16 v[0:1], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -93,8 +93,8 @@ define void @v_constained_fma_f16_fpexcept_strict_div(half %x, half %y, half %z,
 ; GFX11-LABEL: v_constained_fma_f16_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f16 v0.l, v0.l, v1.l, v2.l
-; GFX11-NEXT:    global_store_b16 v[3:4], v0, off
+; GFX11-NEXT:    v_fmac_f16_e32 v2.l, v0.l, v1.l
+; GFX11-NEXT:    global_store_b16 v[3:4], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_f16_fpexcept_strict_div:
@@ -104,8 +104,8 @@ define void @v_constained_fma_f16_fpexcept_strict_div(half %x, half %y, half %z,
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f16 v0.l, v0.l, v1.l, v2.l
-; GFX12-NEXT:    global_store_b16 v[3:4], v0, off
+; GFX12-NEXT:    v_fmac_f16_e32 v2.l, v0.l, v1.l
+; GFX12-NEXT:    global_store_b16 v[3:4], v2, off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %val = call half @llvm.experimental.constrained.fma.f16(half %x, half %y, half %z, metadata !"round.tonearest", metadata !"fpexcept.strict")
   store half %val, ptr addrspace(1) %out
@@ -592,7 +592,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_uni(half inreg %x, h
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s17
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s18
-; GFX8-NEXT:    v_fma_f16 v2, -s16, -v2, v3
+; GFX8-NEXT:    v_fma_f16 v2, s16, v2, v3
 ; GFX8-NEXT:    flat_store_short v[0:1], v2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -602,7 +602,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_uni(half inreg %x, h
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    v_mov_b32_e32 v2, s17
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s18
-; GFX900-NEXT:    v_fma_f16 v2, -s16, -v2, v3
+; GFX900-NEXT:    v_fma_f16 v2, s16, v2, v3
 ; GFX900-NEXT:    global_store_short v[0:1], v2, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -612,7 +612,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_uni(half inreg %x, h
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v2, s1
 ; GFX942-NEXT:    v_mov_b32_e32 v3, s2
-; GFX942-NEXT:    v_fma_f16 v2, -s0, -v2, v3
+; GFX942-NEXT:    v_fma_f16 v2, s0, v2, v3
 ; GFX942-NEXT:    global_store_short v[0:1], v2, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
@@ -622,7 +622,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_uni(half inreg %x, h
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_mov_b16_e32 v2.l, s2
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_fma_f16 v2.l, -s0, -s1, v2.l
+; GFX11-NEXT:    v_fmac_f16_e64 v2.l, s0, s1
 ; GFX11-NEXT:    global_store_b16 v[0:1], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -633,9 +633,6 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_uni(half inreg %x, h
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_xor_b32 s0, s0, 0x8000
-; GFX12-NEXT:    s_xor_b32 s1, s1, 0x8000
-; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_fmac_f16 s2, s0, s1
 ; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_2)
@@ -653,7 +650,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_div(half %x, half %y
 ; GFX8-LABEL: v_constained_fma_f16_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_fma_f16 v0, -v0, -v1, v2
+; GFX8-NEXT:    v_fma_f16 v0, v0, v1, v2
 ; GFX8-NEXT:    flat_store_short v[3:4], v0
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -661,7 +658,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_div(half %x, half %y
 ; GFX900-LABEL: v_constained_fma_f16_fpexcept_strict_fneg_fneg_div:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    v_fma_f16 v0, -v0, -v1, v2
+; GFX900-NEXT:    v_fma_f16 v0, v0, v1, v2
 ; GFX900-NEXT:    global_store_short v[3:4], v0, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -671,7 +668,7 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_div(half %x, half %y
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v6, v3
 ; GFX942-NEXT:    v_mov_b32_e32 v7, v4
-; GFX942-NEXT:    v_fma_f16 v0, -v0, -v1, v2
+; GFX942-NEXT:    v_fma_f16 v0, v0, v1, v2
 ; GFX942-NEXT:    global_store_short v[6:7], v0, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
@@ -679,8 +676,8 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_div(half %x, half %y
 ; GFX11-LABEL: v_constained_fma_f16_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f16 v0.l, -v0.l, -v1.l, v2.l
-; GFX11-NEXT:    global_store_b16 v[3:4], v0, off
+; GFX11-NEXT:    v_fmac_f16_e32 v2.l, v0.l, v1.l
+; GFX11-NEXT:    global_store_b16 v[3:4], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_f16_fpexcept_strict_fneg_fneg_div:
@@ -690,8 +687,8 @@ define void @v_constained_fma_f16_fpexcept_strict_fneg_fneg_div(half %x, half %y
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f16 v0.l, -v0.l, -v1.l, v2.l
-; GFX12-NEXT:    global_store_b16 v[3:4], v0, off
+; GFX12-NEXT:    v_fmac_f16_e32 v2.l, v0.l, v1.l
+; GFX12-NEXT:    global_store_b16 v[3:4], v2, off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg half %x
   %neg.y = fneg half %y
@@ -818,27 +815,21 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni(<2 x half> inr
 ; GFX8-LABEL: v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_mov_b32_e32 v2, s16
-; GFX8-NEXT:    v_xor_b32_e32 v2, 0x80008000, v2
-; GFX8-NEXT:    v_readfirstlane_b32 s4, v2
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s17
-; GFX8-NEXT:    v_xor_b32_e32 v2, 0x80008000, v2
-; GFX8-NEXT:    v_readfirstlane_b32 s5, v2
-; GFX8-NEXT:    v_mov_b32_e32 v2, s5
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s18
-; GFX8-NEXT:    s_lshr_b32 s7, s5, 16
-; GFX8-NEXT:    s_lshr_b32 s8, s18, 16
+; GFX8-NEXT:    s_lshr_b32 s5, s17, 16
+; GFX8-NEXT:    s_lshr_b32 s6, s18, 16
+; GFX8-NEXT:    v_fma_f16 v2, s16, v2, v3
+; GFX8-NEXT:    s_lshr_b32 s4, s16, 16
+; GFX8-NEXT:    v_readfirstlane_b32 s7, v2
+; GFX8-NEXT:    v_mov_b32_e32 v2, s5
+; GFX8-NEXT:    v_mov_b32_e32 v3, s6
 ; GFX8-NEXT:    v_fma_f16 v2, s4, v2, v3
-; GFX8-NEXT:    s_lshr_b32 s6, s4, 16
 ; GFX8-NEXT:    v_readfirstlane_b32 s4, v2
-; GFX8-NEXT:    v_mov_b32_e32 v2, s7
-; GFX8-NEXT:    v_mov_b32_e32 v3, s8
-; GFX8-NEXT:    v_fma_f16 v2, s6, v2, v3
-; GFX8-NEXT:    v_readfirstlane_b32 s5, v2
-; GFX8-NEXT:    s_and_b32 s5, 0xffff, s5
 ; GFX8-NEXT:    s_and_b32 s4, 0xffff, s4
-; GFX8-NEXT:    s_lshl_b32 s5, s5, 16
-; GFX8-NEXT:    s_or_b32 s4, s4, s5
+; GFX8-NEXT:    s_and_b32 s5, 0xffff, s7
+; GFX8-NEXT:    s_lshl_b32 s4, s4, 16
+; GFX8-NEXT:    s_or_b32 s4, s5, s4
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX8-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
@@ -849,7 +840,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni(<2 x half> inr
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    v_mov_b32_e32 v2, s17
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s18
-; GFX900-NEXT:    v_pk_fma_f16 v2, s16, v2, v3 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX900-NEXT:    v_pk_fma_f16 v2, s16, v2, v3
 ; GFX900-NEXT:    global_store_dword v[0:1], v2, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -859,7 +850,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni(<2 x half> inr
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v2, s1
 ; GFX942-NEXT:    v_mov_b32_e32 v3, s2
-; GFX942-NEXT:    v_pk_fma_f16 v2, s0, v2, v3 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX942-NEXT:    v_pk_fma_f16 v2, s0, v2, v3
 ; GFX942-NEXT:    global_store_dword v[0:1], v2, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
@@ -869,7 +860,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni(<2 x half> inr
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_pk_fma_f16 v2, s0, s1, v2 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX11-NEXT:    v_pk_fma_f16 v2, s0, s1, v2
 ; GFX11-NEXT:    global_store_b32 v[0:1], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -882,14 +873,9 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_uni(<2 x half> inr
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    s_lshr_b32 s3, s0, 16
 ; GFX12-NEXT:    s_lshr_b32 s4, s1, 16
-; GFX12-NEXT:    s_xor_b32 s0, s0, 0x8000
-; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
-; GFX12-NEXT:    s_xor_b32 s3, s3, 0x8000
-; GFX12-NEXT:    s_xor_b32 s1, s1, 0x8000
-; GFX12-NEXT:    s_xor_b32 s4, s4, 0x8000
 ; GFX12-NEXT:    s_lshr_b32 s5, s2, 16
-; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_fmac_f16 s2, s0, s1
+; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_fmac_f16 s5, s3, s4
 ; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_2)
@@ -909,8 +895,6 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div(<2 x half> %x,
 ; GFX8-LABEL: v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_xor_b32_e32 v0, 0x80008000, v0
-; GFX8-NEXT:    v_xor_b32_e32 v1, 0x80008000, v1
 ; GFX8-NEXT:    v_lshrrev_b32_e32 v5, 16, v0
 ; GFX8-NEXT:    v_lshrrev_b32_e32 v6, 16, v1
 ; GFX8-NEXT:    v_lshrrev_b32_e32 v7, 16, v2
@@ -925,7 +909,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div(<2 x half> %x,
 ; GFX900-LABEL: v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    v_pk_fma_f16 v0, v0, v1, v2 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX900-NEXT:    v_pk_fma_f16 v0, v0, v1, v2
 ; GFX900-NEXT:    global_store_dword v[3:4], v0, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -935,7 +919,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div(<2 x half> %x,
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v6, v3
 ; GFX942-NEXT:    v_mov_b32_e32 v7, v4
-; GFX942-NEXT:    v_pk_fma_f16 v0, v0, v1, v2 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX942-NEXT:    v_pk_fma_f16 v0, v0, v1, v2
 ; GFX942-NEXT:    global_store_dword v[6:7], v0, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
@@ -943,7 +927,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div(<2 x half> %x,
 ; GFX11-LABEL: v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_pk_fma_f16 v0, v0, v1, v2 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX11-NEXT:    v_pk_fma_f16 v0, v0, v1, v2
 ; GFX11-NEXT:    global_store_b32 v[3:4], v0, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -954,7 +938,7 @@ define void @v_constained_fma_v2f16_fpexcept_strict_fneg_fneg_div(<2 x half> %x,
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_pk_fma_f16 v0, v0, v1, v2 neg_lo:[1,1,0] neg_hi:[1,1,0]
+; GFX12-NEXT:    v_pk_fma_f16 v0, v0, v1, v2
 ; GFX12-NEXT:    global_store_b32 v[3:4], v0, off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg <2 x half> %x
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f32.ll
index 07865f5b4f3c9..5d8b92258c407 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f32.ll
@@ -31,8 +31,8 @@ define void @v_constained_fma_f32_fpexcept_strict_uni(float inreg %x, float inre
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v2, s1
 ; GFX942-NEXT:    v_mov_b32_e32 v3, s2
-; GFX942-NEXT:    v_fma_f32 v2, s0, v2, v3
-; GFX942-NEXT:    global_store_dword v[0:1], v2, off
+; GFX942-NEXT:    v_fmac_f32_e32 v3, s0, v2
+; GFX942-NEXT:    global_store_dword v[0:1], v3, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -41,7 +41,7 @@ define void @v_constained_fma_f32_fpexcept_strict_uni(float inreg %x, float inre
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_fma_f32 v2, s0, s1, v2
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s1
 ; GFX11-NEXT:    global_store_b32 v[0:1], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -85,16 +85,16 @@ define void @v_constained_fma_f32_fpexcept_strict_div(float %x, float %y, float
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v6, v3
 ; GFX942-NEXT:    v_mov_b32_e32 v7, v4
-; GFX942-NEXT:    v_fma_f32 v0, v0, v1, v2
-; GFX942-NEXT:    global_store_dword v[6:7], v0, off
+; GFX942-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX942-NEXT:    global_store_dword v[6:7], v2, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_f32_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, v0, v1, v2
-; GFX11-NEXT:    global_store_b32 v[3:4], v0, off
+; GFX11-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX11-NEXT:    global_store_b32 v[3:4], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_f32_fpexcept_strict_div:
@@ -104,8 +104,8 @@ define void @v_constained_fma_f32_fpexcept_strict_div(float %x, float %y, float
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, v0, v1, v2
-; GFX12-NEXT:    global_store_b32 v[3:4], v0, off
+; GFX12-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX12-NEXT:    global_store_b32 v[3:4], v2, off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %val = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float %z, metadata !"round.tonearest", metadata !"fpexcept.strict")
   store float %val, ptr addrspace(1) %out
@@ -154,8 +154,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_uni(<2 x float> inreg %x, <2
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s16 :: v_dual_mov_b32 v3, s17
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-NEXT:    v_fma_f32 v2, s0, s2, v2
-; GFX11-NEXT:    v_fma_f32 v3, s1, s3, v3
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s2
+; GFX11-NEXT:    v_fmac_f32_e64 v3, s1, s3
 ; GFX11-NEXT:    global_store_b64 v[0:1], v[2:3], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -208,9 +208,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_div(<2 x float> %x, <2 x flo
 ; GFX11-LABEL: v_constained_fma_v2f32_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, v0, v2, v4
-; GFX11-NEXT:    v_fma_f32 v1, v1, v3, v5
-; GFX11-NEXT:    global_store_b64 v[6:7], v[0:1], off
+; GFX11-NEXT:    v_dual_fmac_f32 v4, v0, v2 :: v_dual_fmac_f32 v5, v1, v3
+; GFX11-NEXT:    global_store_b64 v[6:7], v[4:5], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_v2f32_fpexcept_strict_div:
@@ -220,9 +219,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_div(<2 x float> %x, <2 x flo
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, v0, v2, v4
-; GFX12-NEXT:    v_fma_f32 v1, v1, v3, v5
-; GFX12-NEXT:    global_store_b64 v[6:7], v[0:1], off
+; GFX12-NEXT:    v_dual_fmac_f32 v4, v0, v2 :: v_dual_fmac_f32 v5, v1, v3
+; GFX12-NEXT:    global_store_b64 v[6:7], v[4:5], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %val = call <2 x float> @llvm.experimental.constrained.fma.v2f32(<2 x float> %x, <2 x float> %y, <2 x float> %z, metadata !"round.tonearest", metadata !"fpexcept.strict")
   store <2 x float> %val, ptr addrspace(1) %out
@@ -292,10 +290,10 @@ define void @v_constained_fma_v3f32_fpexcept_strict_uni(<3 x float> inreg %x, <3
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s18 :: v_dual_mov_b32 v3, s19
 ; GFX11-NEXT:    v_mov_b32_e32 v4, s20
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
-; GFX11-NEXT:    v_fma_f32 v2, s0, s3, v2
-; GFX11-NEXT:    v_fma_f32 v3, s1, s16, v3
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s3
+; GFX11-NEXT:    v_fmac_f32_e64 v3, s1, s16
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_3)
-; GFX11-NEXT:    v_fma_f32 v4, s2, s17, v4
+; GFX11-NEXT:    v_fmac_f32_e64 v4, s2, s17
 ; GFX11-NEXT:    global_store_b96 v[0:1], v[2:4], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -363,10 +361,9 @@ define void @v_constained_fma_v3f32_fpexcept_strict_div(<3 x float> %x, <3 x flo
 ; GFX11-LABEL: v_constained_fma_v3f32_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, v0, v3, v6
-; GFX11-NEXT:    v_fma_f32 v1, v1, v4, v7
-; GFX11-NEXT:    v_fma_f32 v2, v2, v5, v8
-; GFX11-NEXT:    global_store_b96 v[9:10], v[0:2], off
+; GFX11-NEXT:    v_dual_fmac_f32 v6, v0, v3 :: v_dual_fmac_f32 v7, v1, v4
+; GFX11-NEXT:    v_fmac_f32_e32 v8, v2, v5
+; GFX11-NEXT:    global_store_b96 v[9:10], v[6:8], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_v3f32_fpexcept_strict_div:
@@ -376,10 +373,9 @@ define void @v_constained_fma_v3f32_fpexcept_strict_div(<3 x float> %x, <3 x flo
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, v0, v3, v6
-; GFX12-NEXT:    v_fma_f32 v1, v1, v4, v7
-; GFX12-NEXT:    v_fma_f32 v2, v2, v5, v8
-; GFX12-NEXT:    global_store_b96 v[9:10], v[0:2], off
+; GFX12-NEXT:    v_dual_fmac_f32 v6, v0, v3 :: v_dual_fmac_f32 v7, v1, v4
+; GFX12-NEXT:    v_fmac_f32_e32 v8, v2, v5
+; GFX12-NEXT:    global_store_b96 v[9:10], v[6:8], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %val = call <3 x float> @llvm.experimental.constrained.fma.v3f32(<3 x float> %x, <3 x float> %y, <3 x float> %z, metadata !"round.tonearest", metadata !"fpexcept.strict")
   store <3 x float> %val, ptr addrspace(1) %out
@@ -452,11 +448,11 @@ define void @v_constained_fma_v4f32_fpexcept_strict_uni(<4 x float> inreg %x, <4
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s20 :: v_dual_mov_b32 v3, s21
 ; GFX11-NEXT:    v_dual_mov_b32 v4, s22 :: v_dual_mov_b32 v5, s23
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
-; GFX11-NEXT:    v_fma_f32 v2, s0, s16, v2
-; GFX11-NEXT:    v_fma_f32 v3, s1, s17, v3
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s16
+; GFX11-NEXT:    v_fmac_f32_e64 v3, s1, s17
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_4)
-; GFX11-NEXT:    v_fma_f32 v4, s2, s18, v4
-; GFX11-NEXT:    v_fma_f32 v5, s3, s19, v5
+; GFX11-NEXT:    v_fmac_f32_e64 v4, s2, s18
+; GFX11-NEXT:    v_fmac_f32_e64 v5, s3, s19
 ; GFX11-NEXT:    global_store_b128 v[0:1], v[2:5], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -517,11 +513,9 @@ define void @v_constained_fma_v4f32_fpexcept_strict_div(<4 x float> %x, <4 x flo
 ; GFX11-LABEL: v_constained_fma_v4f32_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, v0, v4, v8
-; GFX11-NEXT:    v_fma_f32 v1, v1, v5, v9
-; GFX11-NEXT:    v_fma_f32 v2, v2, v6, v10
-; GFX11-NEXT:    v_fma_f32 v3, v3, v7, v11
-; GFX11-NEXT:    global_store_b128 v[12:13], v[0:3], off
+; GFX11-NEXT:    v_dual_fmac_f32 v8, v0, v4 :: v_dual_fmac_f32 v9, v1, v5
+; GFX11-NEXT:    v_dual_fmac_f32 v10, v2, v6 :: v_dual_fmac_f32 v11, v3, v7
+; GFX11-NEXT:    global_store_b128 v[12:13], v[8:11], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_v4f32_fpexcept_strict_div:
@@ -531,11 +525,9 @@ define void @v_constained_fma_v4f32_fpexcept_strict_div(<4 x float> %x, <4 x flo
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, v0, v4, v8
-; GFX12-NEXT:    v_fma_f32 v1, v1, v5, v9
-; GFX12-NEXT:    v_fma_f32 v2, v2, v6, v10
-; GFX12-NEXT:    v_fma_f32 v3, v3, v7, v11
-; GFX12-NEXT:    global_store_b128 v[12:13], v[0:3], off
+; GFX12-NEXT:    v_dual_fmac_f32 v8, v0, v4 :: v_dual_fmac_f32 v9, v1, v5
+; GFX12-NEXT:    v_dual_fmac_f32 v10, v2, v6 :: v_dual_fmac_f32 v11, v3, v7
+; GFX12-NEXT:    global_store_b128 v[12:13], v[8:11], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %val = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %x, <4 x float> %y, <4 x float> %z, metadata !"round.tonearest", metadata !"fpexcept.strict")
   store <4 x float> %val, ptr addrspace(1) %out
@@ -659,7 +651,7 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_uni(float inreg %x,
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s17
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s18
-; GFX8-NEXT:    v_fma_f32 v2, -s16, -v2, v3
+; GFX8-NEXT:    v_fma_f32 v2, s16, v2, v3
 ; GFX8-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -669,7 +661,7 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_uni(float inreg %x,
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    v_mov_b32_e32 v2, s17
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s18
-; GFX900-NEXT:    v_fma_f32 v2, -s16, -v2, v3
+; GFX900-NEXT:    v_fma_f32 v2, s16, v2, v3
 ; GFX900-NEXT:    global_store_dword v[0:1], v2, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -679,8 +671,8 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_uni(float inreg %x,
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v2, s1
 ; GFX942-NEXT:    v_mov_b32_e32 v3, s2
-; GFX942-NEXT:    v_fma_f32 v2, -s0, -v2, v3
-; GFX942-NEXT:    global_store_dword v[0:1], v2, off
+; GFX942-NEXT:    v_fmac_f32_e32 v3, s0, v2
+; GFX942-NEXT:    global_store_dword v[0:1], v3, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -689,7 +681,7 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_uni(float inreg %x,
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_fma_f32 v2, -s0, -s1, v2
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s1
 ; GFX11-NEXT:    global_store_b32 v[0:1], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -700,9 +692,6 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_uni(float inreg %x,
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_xor_b32 s0, s0, 0x80000000
-; GFX12-NEXT:    s_xor_b32 s1, s1, 0x80000000
-; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_fmac_f32 s2, s0, s1
 ; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_2)
@@ -720,7 +709,7 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_div(float %x, float
 ; GFX8-LABEL: v_constained_fma_f32_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_fma_f32 v0, -v0, -v1, v2
+; GFX8-NEXT:    v_fma_f32 v0, v0, v1, v2
 ; GFX8-NEXT:    flat_store_dword v[3:4], v0
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -728,7 +717,7 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_div(float %x, float
 ; GFX900-LABEL: v_constained_fma_f32_fpexcept_strict_fneg_fneg_div:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    v_fma_f32 v0, -v0, -v1, v2
+; GFX900-NEXT:    v_fma_f32 v0, v0, v1, v2
 ; GFX900-NEXT:    global_store_dword v[3:4], v0, off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -738,16 +727,16 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_div(float %x, float
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b32_e32 v6, v3
 ; GFX942-NEXT:    v_mov_b32_e32 v7, v4
-; GFX942-NEXT:    v_fma_f32 v0, -v0, -v1, v2
-; GFX942-NEXT:    global_store_dword v[6:7], v0, off
+; GFX942-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX942-NEXT:    global_store_dword v[6:7], v2, off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_f32_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, -v0, -v1, v2
-; GFX11-NEXT:    global_store_b32 v[3:4], v0, off
+; GFX11-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX11-NEXT:    global_store_b32 v[3:4], v2, off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_f32_fpexcept_strict_fneg_fneg_div:
@@ -757,8 +746,8 @@ define void @v_constained_fma_f32_fpexcept_strict_fneg_fneg_div(float %x, float
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, -v0, -v1, v2
-; GFX12-NEXT:    global_store_b32 v[3:4], v0, off
+; GFX12-NEXT:    v_fmac_f32_e32 v2, v0, v1
+; GFX12-NEXT:    global_store_b32 v[3:4], v2, off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg float %x
   %neg.y = fneg float %y
@@ -886,11 +875,11 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni(<2 x float> in
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s18
-; GFX8-NEXT:    v_mov_b32_e32 v4, s20
+; GFX8-NEXT:    v_mov_b32_e32 v3, s20
+; GFX8-NEXT:    v_fma_f32 v2, s16, v2, v3
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s19
-; GFX8-NEXT:    v_fma_f32 v2, -s16, -v2, v4
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s21
-; GFX8-NEXT:    v_fma_f32 v3, -s17, -v3, v4
+; GFX8-NEXT:    v_fma_f32 v3, s17, v3, v4
 ; GFX8-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -899,11 +888,11 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni(<2 x float> in
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    v_mov_b32_e32 v2, s18
-; GFX900-NEXT:    v_mov_b32_e32 v4, s20
+; GFX900-NEXT:    v_mov_b32_e32 v3, s20
+; GFX900-NEXT:    v_fma_f32 v2, s16, v2, v3
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s19
-; GFX900-NEXT:    v_fma_f32 v2, -s16, -v2, v4
 ; GFX900-NEXT:    v_mov_b32_e32 v4, s21
-; GFX900-NEXT:    v_fma_f32 v3, -s17, -v3, v4
+; GFX900-NEXT:    v_fma_f32 v3, s17, v3, v4
 ; GFX900-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -911,15 +900,9 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni(<2 x float> in
 ; GFX942-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX942-NEXT:    v_mov_b32_e32 v2, s0
-; GFX942-NEXT:    v_mov_b32_e32 v3, s1
-; GFX942-NEXT:    v_mov_b32_e32 v4, s2
-; GFX942-NEXT:    v_mov_b32_e32 v5, s3
-; GFX942-NEXT:    v_xor_b32_e32 v2, 0x80000000, v2
-; GFX942-NEXT:    v_xor_b32_e32 v3, 0x80000000, v3
-; GFX942-NEXT:    v_xor_b32_e32 v4, 0x80000000, v4
-; GFX942-NEXT:    v_xor_b32_e32 v5, 0x80000000, v5
-; GFX942-NEXT:    v_pk_fma_f32 v[2:3], v[2:3], v[4:5], s[16:17]
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[2:3]
+; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[16:17]
+; GFX942-NEXT:    v_pk_fma_f32 v[2:3], s[0:1], v[2:3], v[4:5]
 ; GFX942-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
@@ -929,8 +912,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni(<2 x float> in
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s16 :: v_dual_mov_b32 v3, s17
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-NEXT:    v_fma_f32 v2, -s0, -s2, v2
-; GFX11-NEXT:    v_fma_f32 v3, -s1, -s3, v3
+; GFX11-NEXT:    v_fmac_f32_e64 v2, s0, s2
+; GFX11-NEXT:    v_fmac_f32_e64 v3, s1, s3
 ; GFX11-NEXT:    global_store_b64 v[0:1], v[2:3], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -941,11 +924,6 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_uni(<2 x float> in
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_xor_b32 s0, s0, 0x80000000
-; GFX12-NEXT:    s_xor_b32 s1, s1, 0x80000000
-; GFX12-NEXT:    s_xor_b32 s2, s2, 0x80000000
-; GFX12-NEXT:    s_xor_b32 s3, s3, 0x80000000
-; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
 ; GFX12-NEXT:    s_fmac_f32 s16, s0, s2
 ; GFX12-NEXT:    s_fmac_f32 s17, s1, s3
 ; GFX12-NEXT:    s_wait_alu depctr_sa_sdst(0)
@@ -964,8 +942,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div(<2 x float> %x
 ; GFX8-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_fma_f32 v0, -v0, -v2, v4
-; GFX8-NEXT:    v_fma_f32 v1, -v1, -v3, v5
+; GFX8-NEXT:    v_fma_f32 v0, v0, v2, v4
+; GFX8-NEXT:    v_fma_f32 v1, v1, v3, v5
 ; GFX8-NEXT:    flat_store_dwordx2 v[6:7], v[0:1]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -973,8 +951,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div(<2 x float> %x
 ; GFX900-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    v_fma_f32 v0, -v0, -v2, v4
-; GFX900-NEXT:    v_fma_f32 v1, -v1, -v3, v5
+; GFX900-NEXT:    v_fma_f32 v0, v0, v2, v4
+; GFX900-NEXT:    v_fma_f32 v1, v1, v3, v5
 ; GFX900-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -982,10 +960,6 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div(<2 x float> %x
 ; GFX942-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX942-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
-; GFX942-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
-; GFX942-NEXT:    v_xor_b32_e32 v2, 0x80000000, v2
-; GFX942-NEXT:    v_xor_b32_e32 v3, 0x80000000, v3
 ; GFX942-NEXT:    v_pk_fma_f32 v[0:1], v[0:1], v[2:3], v[4:5]
 ; GFX942-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
@@ -994,9 +968,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div(<2 x float> %x
 ; GFX11-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f32 v0, -v0, -v2, v4
-; GFX11-NEXT:    v_fma_f32 v1, -v1, -v3, v5
-; GFX11-NEXT:    global_store_b64 v[6:7], v[0:1], off
+; GFX11-NEXT:    v_dual_fmac_f32 v4, v0, v2 :: v_dual_fmac_f32 v5, v1, v3
+; GFX11-NEXT:    global_store_b64 v[6:7], v[4:5], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div:
@@ -1006,9 +979,8 @@ define void @v_constained_fma_v2f32_fpexcept_strict_fneg_fneg_div(<2 x float> %x
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f32 v0, -v0, -v2, v4
-; GFX12-NEXT:    v_fma_f32 v1, -v1, -v3, v5
-; GFX12-NEXT:    global_store_b64 v[6:7], v[0:1], off
+; GFX12-NEXT:    v_dual_fmac_f32 v4, v0, v2 :: v_dual_fmac_f32 v5, v1, v3
+; GFX12-NEXT:    global_store_b64 v[6:7], v[4:5], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg <2 x float> %x
   %neg.y = fneg <2 x float> %y
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f64.ll
index c2dd36b9622a8..6d4186f4a556a 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/strict_fma.f64.ll
@@ -35,8 +35,8 @@ define void @v_constained_fma_f64_fpexcept_strict_uni(double inreg %x, double in
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[16:17]
-; GFX942-NEXT:    v_fma_f64 v[2:3], s[0:1], v[2:3], v[4:5]
-; GFX942-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[0:1], v[2:3]
+; GFX942-NEXT:    global_store_dwordx2 v[0:1], v[4:5], off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -75,13 +75,21 @@ define void @v_constained_fma_f64_fpexcept_strict_div(double %x, double %y, doub
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_f64_fpexcept_strict_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
-; GFX9-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_f64_fpexcept_strict_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; GFX900-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_f64_fpexcept_strict_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], v[0:1], v[2:3]
+; GFX942-NEXT:    global_store_dwordx2 v[6:7], v[4:5], off
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_f64_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
@@ -161,15 +169,15 @@ define void @v_constained_fma_v2f64_fpexcept_strict_uni(<2 x double> inreg %x, <
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[16:17]
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[20:21]
-; GFX942-NEXT:    v_fma_f64 v[2:3], s[0:1], v[2:3], v[4:5]
-; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[22:23]
-; GFX942-NEXT:    v_readfirstlane_b32 s0, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s1, v3
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[0:1], v[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[18:19]
-; GFX942-NEXT:    v_fma_f64 v[2:3], s[2:3], v[2:3], v[4:5]
+; GFX942-NEXT:    v_readfirstlane_b32 s0, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s1, v5
+; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[22:23]
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[2:3], v[2:3]
 ; GFX942-NEXT:    s_nop 0
-; GFX942-NEXT:    v_readfirstlane_b32 s2, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s3, v3
+; GFX942-NEXT:    v_readfirstlane_b32 s2, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s3, v5
 ; GFX942-NEXT:    s_nop 1
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
@@ -238,14 +246,23 @@ define void @v_constained_fma_v2f64_fpexcept_strict_div(<2 x double> %x, <2 x do
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_v2f64_fpexcept_strict_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
-; GFX9-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
-; GFX9-NEXT:    global_store_dwordx4 v[12:13], v[0:3], off
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_v2f64_fpexcept_strict_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
+; GFX900-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
+; GFX900-NEXT:    global_store_dwordx4 v[12:13], v[0:3], off
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_v2f64_fpexcept_strict_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[8:9], v[0:1], v[4:5]
+; GFX942-NEXT:    v_fmac_f64_e32 v[10:11], v[2:3], v[6:7]
+; GFX942-NEXT:    global_store_dwordx4 v[12:13], v[8:11], off
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_v2f64_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
@@ -350,17 +367,17 @@ define void @v_constained_fma_v3f64_fpexcept_strict_uni(<3 x double> inreg %x, <
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[18:19]
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[24:25]
-; GFX942-NEXT:    v_fma_f64 v[2:3], s[0:1], v[2:3], v[4:5]
-; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[26:27]
-; GFX942-NEXT:    v_readfirstlane_b32 s0, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s1, v3
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[0:1], v[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[20:21]
-; GFX942-NEXT:    v_fma_f64 v[2:3], s[2:3], v[2:3], v[4:5]
-; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[28:29]
-; GFX942-NEXT:    v_readfirstlane_b32 s2, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s3, v3
+; GFX942-NEXT:    v_readfirstlane_b32 s0, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s1, v5
+; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[26:27]
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[2:3], v[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[22:23]
-; GFX942-NEXT:    v_fma_f64 v[6:7], s[16:17], v[2:3], v[4:5]
+; GFX942-NEXT:    v_readfirstlane_b32 s2, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s3, v5
+; GFX942-NEXT:    v_mov_b64_e32 v[6:7], s[28:29]
+; GFX942-NEXT:    v_fmac_f64_e32 v[6:7], s[16:17], v[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
 ; GFX942-NEXT:    global_store_dwordx4 v[0:1], v[2:5], off
@@ -440,16 +457,27 @@ define void @v_constained_fma_v3f64_fpexcept_strict_div(<3 x double> %x, <3 x do
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_v3f64_fpexcept_strict_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], v[0:1], v[6:7], v[12:13]
-; GFX9-NEXT:    v_fma_f64 v[2:3], v[2:3], v[8:9], v[14:15]
-; GFX9-NEXT:    v_fma_f64 v[4:5], v[4:5], v[10:11], v[16:17]
-; GFX9-NEXT:    global_store_dwordx4 v[18:19], v[0:3], off
-; GFX9-NEXT:    global_store_dwordx2 v[18:19], v[4:5], off offset:16
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_v3f64_fpexcept_strict_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[6:7], v[12:13]
+; GFX900-NEXT:    v_fma_f64 v[2:3], v[2:3], v[8:9], v[14:15]
+; GFX900-NEXT:    v_fma_f64 v[4:5], v[4:5], v[10:11], v[16:17]
+; GFX900-NEXT:    global_store_dwordx4 v[18:19], v[0:3], off
+; GFX900-NEXT:    global_store_dwordx2 v[18:19], v[4:5], off offset:16
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_v3f64_fpexcept_strict_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[12:13], v[0:1], v[6:7]
+; GFX942-NEXT:    v_fmac_f64_e32 v[14:15], v[2:3], v[8:9]
+; GFX942-NEXT:    v_fmac_f64_e32 v[16:17], v[4:5], v[10:11]
+; GFX942-NEXT:    global_store_dwordx4 v[18:19], v[12:15], off
+; GFX942-NEXT:    global_store_dwordx2 v[18:19], v[16:17], off offset:16
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_v3f64_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
@@ -602,28 +630,27 @@ define void @v_constained_fma_v4f64_fpexcept_strict_uni(<4 x double> inreg %x, <
 ; GFX942-NEXT:    v_readfirstlane_b32 s7, v3
 ; GFX942-NEXT:    v_mov_b64_e32 v[0:1], s[20:21]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[28:29]
-; GFX942-NEXT:    v_fma_f64 v[0:1], s[0:1], v[0:1], v[2:3]
-; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[4:5]
-; GFX942-NEXT:    v_readfirstlane_b32 s0, v0
-; GFX942-NEXT:    v_readfirstlane_b32 s1, v1
+; GFX942-NEXT:    v_fmac_f64_e32 v[2:3], s[0:1], v[0:1]
 ; GFX942-NEXT:    v_mov_b64_e32 v[0:1], s[22:23]
-; GFX942-NEXT:    v_fma_f64 v[0:1], s[2:3], v[0:1], v[2:3]
-; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[6:7]
-; GFX942-NEXT:    v_readfirstlane_b32 s2, v0
-; GFX942-NEXT:    v_readfirstlane_b32 s3, v1
+; GFX942-NEXT:    v_readfirstlane_b32 s0, v2
+; GFX942-NEXT:    v_readfirstlane_b32 s1, v3
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[4:5]
+; GFX942-NEXT:    v_fmac_f64_e32 v[2:3], s[2:3], v[0:1]
 ; GFX942-NEXT:    v_mov_b64_e32 v[0:1], s[24:25]
+; GFX942-NEXT:    v_readfirstlane_b32 s2, v2
+; GFX942-NEXT:    v_readfirstlane_b32 s3, v3
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[6:7]
 ; GFX942-NEXT:    v_readfirstlane_b32 s8, v4
 ; GFX942-NEXT:    v_readfirstlane_b32 s9, v5
-; GFX942-NEXT:    v_fma_f64 v[0:1], s[16:17], v[0:1], v[2:3]
-; GFX942-NEXT:    s_nop 0
-; GFX942-NEXT:    v_readfirstlane_b32 s4, v0
-; GFX942-NEXT:    v_readfirstlane_b32 s5, v1
+; GFX942-NEXT:    v_fmac_f64_e32 v[2:3], s[16:17], v[0:1]
 ; GFX942-NEXT:    v_mov_b64_e32 v[0:1], s[26:27]
+; GFX942-NEXT:    v_readfirstlane_b32 s4, v2
+; GFX942-NEXT:    v_readfirstlane_b32 s5, v3
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[8:9]
-; GFX942-NEXT:    v_fma_f64 v[0:1], s[18:19], v[0:1], v[2:3]
+; GFX942-NEXT:    v_fmac_f64_e32 v[2:3], s[18:19], v[0:1]
 ; GFX942-NEXT:    s_nop 0
-; GFX942-NEXT:    v_readfirstlane_b32 s6, v0
-; GFX942-NEXT:    v_readfirstlane_b32 s7, v1
+; GFX942-NEXT:    v_readfirstlane_b32 s6, v2
+; GFX942-NEXT:    v_readfirstlane_b32 s7, v3
 ; GFX942-NEXT:    v_mov_b64_e32 v[0:1], s[0:1]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[2:3]
 ; GFX942-NEXT:    global_store_dwordx4 v[6:7], v[0:3], off
@@ -735,17 +762,29 @@ define void @v_constained_fma_v4f64_fpexcept_strict_div(<4 x double> %x, <4 x do
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_v4f64_fpexcept_strict_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
-; GFX9-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
-; GFX9-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
-; GFX9-NEXT:    v_fma_f64 v[6:7], v[6:7], v[14:15], v[22:23]
-; GFX9-NEXT:    global_store_dwordx4 v[24:25], v[0:3], off
-; GFX9-NEXT:    global_store_dwordx4 v[24:25], v[4:7], off offset:16
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_v4f64_fpexcept_strict_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
+; GFX900-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
+; GFX900-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
+; GFX900-NEXT:    v_fma_f64 v[6:7], v[6:7], v[14:15], v[22:23]
+; GFX900-NEXT:    global_store_dwordx4 v[24:25], v[0:3], off
+; GFX900-NEXT:    global_store_dwordx4 v[24:25], v[4:7], off offset:16
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_v4f64_fpexcept_strict_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[16:17], v[0:1], v[8:9]
+; GFX942-NEXT:    v_fmac_f64_e32 v[18:19], v[2:3], v[10:11]
+; GFX942-NEXT:    v_fmac_f64_e32 v[20:21], v[4:5], v[12:13]
+; GFX942-NEXT:    v_fmac_f64_e32 v[22:23], v[6:7], v[14:15]
+; GFX942-NEXT:    global_store_dwordx4 v[24:25], v[16:19], off
+; GFX942-NEXT:    global_store_dwordx4 v[24:25], v[20:23], off offset:16
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_v4f64_fpexcept_strict_div:
 ; GFX11:       ; %bb.0:
@@ -889,7 +928,7 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_uni(double inreg %x,
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s20
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s19
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s21
-; GFX8-NEXT:    v_fma_f64 v[2:3], -s[16:17], -v[2:3], v[4:5]
+; GFX8-NEXT:    v_fma_f64 v[2:3], s[16:17], v[2:3], v[4:5]
 ; GFX8-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
@@ -901,7 +940,7 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_uni(double inreg %x,
 ; GFX900-NEXT:    v_mov_b32_e32 v4, s20
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s19
 ; GFX900-NEXT:    v_mov_b32_e32 v5, s21
-; GFX900-NEXT:    v_fma_f64 v[2:3], -s[16:17], -v[2:3], v[4:5]
+; GFX900-NEXT:    v_fma_f64 v[2:3], s[16:17], v[2:3], v[4:5]
 ; GFX900-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
@@ -911,8 +950,8 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_uni(double inreg %x,
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[16:17]
-; GFX942-NEXT:    v_fma_f64 v[2:3], -s[0:1], -v[2:3], v[4:5]
-; GFX942-NEXT:    global_store_dwordx2 v[0:1], v[2:3], off
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[0:1], v[2:3]
+; GFX942-NEXT:    global_store_dwordx2 v[0:1], v[4:5], off
 ; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -921,7 +960,7 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_uni(double inreg %x,
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s16 :: v_dual_mov_b32 v3, s17
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT:    v_fma_f64 v[2:3], -s[0:1], -s[2:3], v[2:3]
+; GFX11-NEXT:    v_fma_f64 v[2:3], s[0:1], s[2:3], v[2:3]
 ; GFX11-NEXT:    global_store_b64 v[0:1], v[2:3], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -934,7 +973,7 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_uni(double inreg %x,
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    v_dual_mov_b32 v2, s16 :: v_dual_mov_b32 v3, s17
 ; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX12-NEXT:    v_fma_f64 v[2:3], -s[0:1], -s[2:3], v[2:3]
+; GFX12-NEXT:    v_fma_f64 v[2:3], s[0:1], s[2:3], v[2:3]
 ; GFX12-NEXT:    global_store_b64 v[0:1], v[2:3], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg double %x
@@ -948,23 +987,31 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_div(double %x, doubl
 ; GFX8-LABEL: v_constained_fma_f64_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[2:3], v[4:5]
+; GFX8-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
 ; GFX8-NEXT:    flat_store_dwordx2 v[6:7], v[0:1]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_f64_fpexcept_strict_fneg_fneg_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[2:3], v[4:5]
-; GFX9-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_f64_fpexcept_strict_fneg_fneg_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
+; GFX900-NEXT:    global_store_dwordx2 v[6:7], v[0:1], off
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_f64_fpexcept_strict_fneg_fneg_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], v[0:1], v[2:3]
+; GFX942-NEXT:    global_store_dwordx2 v[6:7], v[4:5], off
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_f64_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[2:3], v[4:5]
+; GFX11-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
 ; GFX11-NEXT:    global_store_b64 v[6:7], v[0:1], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -975,7 +1022,7 @@ define void @v_constained_fma_f64_fpexcept_strict_fneg_fneg_div(double %x, doubl
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[2:3], v[4:5]
+; GFX12-NEXT:    v_fma_f64 v[0:1], v[0:1], v[2:3], v[4:5]
 ; GFX12-NEXT:    global_store_b64 v[6:7], v[0:1], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg double %x
@@ -1094,15 +1141,15 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_uni(<2 x double> i
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    v_mov_b32_e32 v2, s20
-; GFX8-NEXT:    v_mov_b32_e32 v6, s24
+; GFX8-NEXT:    v_mov_b32_e32 v4, s24
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s21
-; GFX8-NEXT:    v_mov_b32_e32 v7, s25
+; GFX8-NEXT:    v_mov_b32_e32 v5, s25
+; GFX8-NEXT:    v_fma_f64 v[2:3], s[16:17], v[2:3], v[4:5]
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s22
-; GFX8-NEXT:    v_fma_f64 v[2:3], -s[16:17], -v[2:3], v[6:7]
 ; GFX8-NEXT:    v_mov_b32_e32 v6, s26
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s23
 ; GFX8-NEXT:    v_mov_b32_e32 v7, s27
-; GFX8-NEXT:    v_fma_f64 v[4:5], -s[18:19], -v[4:5], v[6:7]
+; GFX8-NEXT:    v_fma_f64 v[4:5], s[18:19], v[4:5], v[6:7]
 ; GFX8-NEXT:    v_readfirstlane_b32 s4, v2
 ; GFX8-NEXT:    v_readfirstlane_b32 s5, v3
 ; GFX8-NEXT:    v_readfirstlane_b32 s6, v4
@@ -1119,15 +1166,15 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_uni(<2 x double> i
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    v_mov_b32_e32 v2, s20
-; GFX900-NEXT:    v_mov_b32_e32 v6, s24
+; GFX900-NEXT:    v_mov_b32_e32 v4, s24
 ; GFX900-NEXT:    v_mov_b32_e32 v3, s21
-; GFX900-NEXT:    v_mov_b32_e32 v7, s25
+; GFX900-NEXT:    v_mov_b32_e32 v5, s25
+; GFX900-NEXT:    v_fma_f64 v[2:3], s[16:17], v[2:3], v[4:5]
 ; GFX900-NEXT:    v_mov_b32_e32 v4, s22
-; GFX900-NEXT:    v_fma_f64 v[2:3], -s[16:17], -v[2:3], v[6:7]
 ; GFX900-NEXT:    v_mov_b32_e32 v6, s26
 ; GFX900-NEXT:    v_mov_b32_e32 v5, s23
 ; GFX900-NEXT:    v_mov_b32_e32 v7, s27
-; GFX900-NEXT:    v_fma_f64 v[4:5], -s[18:19], -v[4:5], v[6:7]
+; GFX900-NEXT:    v_fma_f64 v[4:5], s[18:19], v[4:5], v[6:7]
 ; GFX900-NEXT:    v_readfirstlane_b32 s4, v2
 ; GFX900-NEXT:    v_readfirstlane_b32 s5, v3
 ; GFX900-NEXT:    v_readfirstlane_b32 s6, v4
@@ -1144,16 +1191,16 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_uni(<2 x double> i
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[16:17]
-; GFX942-NEXT:    v_mov_b64_e32 v[6:7], s[20:21]
-; GFX942-NEXT:    v_fma_f64 v[2:3], -s[0:1], -v[2:3], v[6:7]
-; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[18:19]
-; GFX942-NEXT:    v_readfirstlane_b32 s0, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s1, v3
-; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[22:23]
-; GFX942-NEXT:    v_fma_f64 v[2:3], -s[2:3], -v[4:5], v[2:3]
+; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[20:21]
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[0:1], v[2:3]
+; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[18:19]
+; GFX942-NEXT:    v_readfirstlane_b32 s0, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s1, v5
+; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[22:23]
+; GFX942-NEXT:    v_fmac_f64_e32 v[4:5], s[2:3], v[2:3]
 ; GFX942-NEXT:    s_nop 0
-; GFX942-NEXT:    v_readfirstlane_b32 s2, v2
-; GFX942-NEXT:    v_readfirstlane_b32 s3, v3
+; GFX942-NEXT:    v_readfirstlane_b32 s2, v4
+; GFX942-NEXT:    v_readfirstlane_b32 s3, v5
 ; GFX942-NEXT:    s_nop 1
 ; GFX942-NEXT:    v_mov_b64_e32 v[4:5], s[2:3]
 ; GFX942-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
@@ -1167,8 +1214,8 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_uni(<2 x double> i
 ; GFX11-NEXT:    v_dual_mov_b32 v4, s22 :: v_dual_mov_b32 v5, s23
 ; GFX11-NEXT:    v_dual_mov_b32 v2, s20 :: v_dual_mov_b32 v3, s21
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-NEXT:    v_fma_f64 v[4:5], -s[2:3], -s[18:19], v[4:5]
-; GFX11-NEXT:    v_fma_f64 v[2:3], -s[0:1], -s[16:17], v[2:3]
+; GFX11-NEXT:    v_fma_f64 v[4:5], s[2:3], s[18:19], v[4:5]
+; GFX11-NEXT:    v_fma_f64 v[2:3], s[0:1], s[16:17], v[2:3]
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
 ; GFX11-NEXT:    v_readfirstlane_b32 s3, v5
 ; GFX11-NEXT:    v_readfirstlane_b32 s2, v4
@@ -1191,8 +1238,8 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_uni(<2 x double> i
 ; GFX12-NEXT:    v_dual_mov_b32 v4, s22 :: v_dual_mov_b32 v5, s23
 ; GFX12-NEXT:    v_dual_mov_b32 v2, s20 :: v_dual_mov_b32 v3, s21
 ; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX12-NEXT:    v_fma_f64 v[4:5], -s[2:3], -s[18:19], v[4:5]
-; GFX12-NEXT:    v_fma_f64 v[2:3], -s[0:1], -s[16:17], v[2:3]
+; GFX12-NEXT:    v_fma_f64 v[4:5], s[2:3], s[18:19], v[4:5]
+; GFX12-NEXT:    v_fma_f64 v[2:3], s[0:1], s[16:17], v[2:3]
 ; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
 ; GFX12-NEXT:    v_readfirstlane_b32 s3, v5
 ; GFX12-NEXT:    v_readfirstlane_b32 s2, v4
@@ -1216,26 +1263,35 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div(<2 x double> %
 ; GFX8-LABEL: v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[4:5], v[8:9]
-; GFX8-NEXT:    v_fma_f64 v[2:3], -v[2:3], -v[6:7], v[10:11]
+; GFX8-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
+; GFX8-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
 ; GFX8-NEXT:    flat_store_dwordx4 v[12:13], v[0:3]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-LABEL: v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[4:5], v[8:9]
-; GFX9-NEXT:    v_fma_f64 v[2:3], -v[2:3], -v[6:7], v[10:11]
-; GFX9-NEXT:    global_store_dwordx4 v[12:13], v[0:3], off
-; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
+; GFX900-LABEL: v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div:
+; GFX900:       ; %bb.0:
+; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
+; GFX900-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
+; GFX900-NEXT:    global_store_dwordx4 v[12:13], v[0:3], off
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
+; GFX900-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX942-LABEL: v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div:
+; GFX942:       ; %bb.0:
+; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    v_fmac_f64_e32 v[8:9], v[0:1], v[4:5]
+; GFX942-NEXT:    v_fmac_f64_e32 v[10:11], v[2:3], v[6:7]
+; GFX942-NEXT:    global_store_dwordx4 v[12:13], v[8:11], off
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
+; GFX942-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[4:5], v[8:9]
-; GFX11-NEXT:    v_fma_f64 v[2:3], -v[2:3], -v[6:7], v[10:11]
+; GFX11-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
+; GFX11-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
 ; GFX11-NEXT:    global_store_b128 v[12:13], v[0:3], off
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -1246,8 +1302,8 @@ define void @v_constained_fma_v2f64_fpexcept_strict_fneg_fneg_div(<2 x double> %
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    v_fma_f64 v[0:1], -v[0:1], -v[4:5], v[8:9]
-; GFX12-NEXT:    v_fma_f64 v[2:3], -v[2:3], -v[6:7], v[10:11]
+; GFX12-NEXT:    v_fma_f64 v[0:1], v[0:1], v[4:5], v[8:9]
+; GFX12-NEXT:    v_fma_f64 v[2:3], v[2:3], v[6:7], v[10:11]
 ; GFX12-NEXT:    global_store_b128 v[12:13], v[0:3], off
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
   %neg.x = fneg <2 x double> %x
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
index b2d5bb2faeca7..f1e2e686fc719 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 2
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink %s | FileCheck -check-prefixes=CHECK,PRELINK %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine %s | FileCheck -check-prefixes=CHECK,NOPRELINK %s
 
@@ -968,11 +968,13 @@ define <16 x half> @test_pow_v16f16(<16 x half> %x, <16 x half> %y) {
 }
 
 define float @test_pow_afn_f32_minsize(float %x, float %y) #0 {
+; PRELINK: Function Attrs: minsize
 ; PRELINK-LABEL: define float @test_pow_afn_f32_minsize
 ; PRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR2:[0-9]+]] {
 ; PRELINK-NEXT:    [[POW:%.*]] = tail call afn float @_Z10__pow_fastff(float [[X]], float [[Y]])
 ; PRELINK-NEXT:    ret float [[POW]]
 ;
+; NOPRELINK: Function Attrs: minsize
 ; NOPRELINK-LABEL: define float @test_pow_afn_f32_minsize
 ; NOPRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR2:[0-9]+]] {
 ; NOPRELINK-NEXT:    [[TMP1:%.*]] = fcmp afn oeq float [[X]], 1.000000e+00
@@ -1026,11 +1028,13 @@ define float @test_pow_afn_f32_minsize(float %x, float %y) #0 {
 }
 
 define float @test_pow_afn_f32_nnan_minsize(float %x, float %y) #0 {
+; PRELINK: Function Attrs: minsize
 ; PRELINK-LABEL: define float @test_pow_afn_f32_nnan_minsize
 ; PRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR2]] {
 ; PRELINK-NEXT:    [[POW:%.*]] = tail call nnan afn float @_Z10__pow_fastff(float [[X]], float [[Y]])
 ; PRELINK-NEXT:    ret float [[POW]]
 ;
+; NOPRELINK: Function Attrs: minsize
 ; NOPRELINK-LABEL: define float @test_pow_afn_f32_nnan_minsize
 ; NOPRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR2]] {
 ; NOPRELINK-NEXT:    [[TMP1:%.*]] = fcmp nnan afn oeq float [[X]], 1.000000e+00
@@ -1196,60 +1200,58 @@ define float @test_pow_afn_f32_nnan_noinline(float %x, float %y) {
 }
 
 define float @test_pow_afn_f32_strictfp(float %x, float %y) #2 {
+; PRELINK: Function Attrs: strictfp
 ; PRELINK-LABEL: define float @test_pow_afn_f32_strictfp
 ; PRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3:[0-9]+]] {
 ; PRELINK-NEXT:    [[POW:%.*]] = tail call nnan nsz afn float @_Z10__pow_fastff(float [[X]], float [[Y]]) #[[ATTR3]]
 ; PRELINK-NEXT:    ret float [[POW]]
 ;
+; NOPRELINK: Function Attrs: strictfp
 ; NOPRELINK-LABEL: define float @test_pow_afn_f32_strictfp
 ; NOPRELINK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3:[0-9]+]] {
-; NOPRELINK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[X]], float 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+; NOPRELINK-NEXT:    [[TMP1:%.*]] = fcmp nnan nsz afn oeq float [[X]], 1.000000e+00
 ; NOPRELINK-NEXT:    [[TMP2:%.*]] = select nnan nsz afn i1 [[TMP1]], float 1.000000e+00, float [[Y]]
-; NOPRELINK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+; NOPRELINK-NEXT:    [[TMP3:%.*]] = fcmp nnan nsz afn oeq float [[TMP2]], 0.000000e+00
 ; NOPRELINK-NEXT:    [[TMP4:%.*]] = select nnan nsz afn i1 [[TMP3]], float 1.000000e+00, float [[X]]
-; NOPRELINK-NEXT:    [[TMP5:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP6:%.*]] = call nnan nsz afn float @llvm.log2.f32(float [[TMP5]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP7:%.*]] = call nnan nsz afn float @llvm.experimental.constrained.fmul.f32(float [[TMP2]], float [[TMP6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP8:%.*]] = call nnan nsz afn float @llvm.exp2.f32(float [[TMP7]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP9:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP10:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP9]], float [[TMP2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP11:%.*]] = call nnan nsz afn float @llvm.experimental.constrained.fmul.f32(float [[TMP2]], float 5.000000e-01, metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP12:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP11]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP13:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP12]], float [[TMP11]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP14:%.*]] = xor i1 [[TMP13]], true
-; NOPRELINK-NEXT:    [[TMP15:%.*]] = and i1 [[TMP10]], [[TMP14]]
-; NOPRELINK-NEXT:    [[TMP16:%.*]] = select nnan nsz afn i1 [[TMP15]], float [[TMP4]], float 1.000000e+00
-; NOPRELINK-NEXT:    [[TMP17:%.*]] = call nnan nsz afn float @llvm.copysign.f32(float [[TMP8]], float [[TMP16]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP18:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP19:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP18]], float [[TMP2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP20:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP4]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP21:%.*]] = xor i1 [[TMP19]], true
-; NOPRELINK-NEXT:    [[TMP22:%.*]] = and i1 [[TMP20]], [[TMP21]]
-; NOPRELINK-NEXT:    [[TMP23:%.*]] = select nnan nsz afn i1 [[TMP22]], float 0x7FF8000000000000, float [[TMP17]]
-; NOPRELINK-NEXT:    [[TMP24:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP2]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP25:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP24]], float 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP26:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP2]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP27:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP2]], float [[TMP26]], metadata !"une", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP28:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP29:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP28]], float 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP30:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP28]], float 1.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP31:%.*]] = xor i1 [[TMP30]], [[TMP27]]
-; NOPRELINK-NEXT:    [[TMP32:%.*]] = select nnan nsz afn i1 [[TMP31]], float 0.000000e+00, float [[TMP26]]
-; NOPRELINK-NEXT:    [[TMP33:%.*]] = select nnan nsz afn i1 [[TMP29]], float [[TMP28]], float [[TMP32]]
-; NOPRELINK-NEXT:    [[TMP34:%.*]] = select nnan nsz afn i1 [[TMP25]], float [[TMP33]], float [[TMP23]]
-; NOPRELINK-NEXT:    [[TMP35:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP36:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP35]], float 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP37:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP4]], float 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP38:%.*]] = or i1 [[TMP36]], [[TMP37]]
-; NOPRELINK-NEXT:    [[TMP39:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP40:%.*]] = xor i1 [[TMP37]], [[TMP39]]
-; NOPRELINK-NEXT:    [[TMP41:%.*]] = select nnan nsz afn i1 [[TMP40]], float 0.000000e+00, float 0x7FF0000000000000
-; NOPRELINK-NEXT:    [[TMP42:%.*]] = select nnan nsz afn i1 [[TMP15]], float [[TMP4]], float 0.000000e+00
-; NOPRELINK-NEXT:    [[TMP43:%.*]] = call nnan nsz afn float @llvm.copysign.f32(float [[TMP41]], float [[TMP42]]) #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP44:%.*]] = select nnan nsz afn i1 [[TMP38]], float [[TMP43]], float [[TMP34]]
-; NOPRELINK-NEXT:    [[TMP45:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP4]], float [[TMP2]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR3]]
-; NOPRELINK-NEXT:    [[TMP46:%.*]] = select nnan nsz afn i1 [[TMP45]], float 0x7FF8000000000000, float [[TMP44]]
-; NOPRELINK-NEXT:    ret float [[TMP46]]
+; NOPRELINK-NEXT:    [[TMP5:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR4:[0-9]+]]
+; NOPRELINK-NEXT:    [[TMP6:%.*]] = call nnan nsz afn float @llvm.log2.f32(float [[TMP5]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP7:%.*]] = fmul nnan nsz afn float [[TMP2]], [[TMP6]]
+; NOPRELINK-NEXT:    [[TMP8:%.*]] = call nnan nsz afn float @llvm.exp2.f32(float [[TMP7]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP9:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP10:%.*]] = fcmp nnan nsz afn oeq float [[TMP9]], [[TMP2]]
+; NOPRELINK-NEXT:    [[TMP11:%.*]] = fmul nnan nsz afn float [[TMP2]], 5.000000e-01
+; NOPRELINK-NEXT:    [[TMP12:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP11]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP13:%.*]] = fcmp nnan nsz afn une float [[TMP12]], [[TMP11]]
+; NOPRELINK-NEXT:    [[TMP14:%.*]] = and i1 [[TMP10]], [[TMP13]]
+; NOPRELINK-NEXT:    [[TMP15:%.*]] = select nnan nsz afn i1 [[TMP14]], float [[TMP4]], float 1.000000e+00
+; NOPRELINK-NEXT:    [[TMP16:%.*]] = call nnan nsz afn float @llvm.copysign.f32(float [[TMP8]], float [[TMP15]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP17:%.*]] = call nnan nsz afn float @llvm.trunc.f32(float [[TMP2]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP18:%.*]] = fcmp nnan nsz afn une float [[TMP17]], [[TMP2]]
+; NOPRELINK-NEXT:    [[TMP19:%.*]] = fcmp nnan nsz afn olt float [[TMP4]], 0.000000e+00
+; NOPRELINK-NEXT:    [[TMP20:%.*]] = and i1 [[TMP19]], [[TMP18]]
+; NOPRELINK-NEXT:    [[TMP21:%.*]] = select nnan nsz afn i1 [[TMP20]], float 0x7FF8000000000000, float [[TMP16]]
+; NOPRELINK-NEXT:    [[TMP22:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP2]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP23:%.*]] = fcmp nnan nsz afn oeq float [[TMP22]], 0x7FF0000000000000
+; NOPRELINK-NEXT:    [[TMP24:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP2]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP25:%.*]] = fcmp nnan nsz afn une float [[TMP2]], [[TMP24]]
+; NOPRELINK-NEXT:    [[TMP26:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP27:%.*]] = fcmp nnan nsz afn oeq float [[TMP26]], 1.000000e+00
+; NOPRELINK-NEXT:    [[TMP28:%.*]] = fcmp nnan nsz afn olt float [[TMP26]], 1.000000e+00
+; NOPRELINK-NEXT:    [[TMP29:%.*]] = xor i1 [[TMP28]], [[TMP25]]
+; NOPRELINK-NEXT:    [[TMP30:%.*]] = select nnan nsz afn i1 [[TMP29]], float 0.000000e+00, float [[TMP24]]
+; NOPRELINK-NEXT:    [[TMP31:%.*]] = select nnan nsz afn i1 [[TMP27]], float 1.000000e+00, float [[TMP30]]
+; NOPRELINK-NEXT:    [[TMP32:%.*]] = select nnan nsz afn i1 [[TMP23]], float [[TMP31]], float [[TMP21]]
+; NOPRELINK-NEXT:    [[TMP33:%.*]] = call nnan nsz afn float @llvm.fabs.f32(float [[TMP4]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP34:%.*]] = fcmp nnan nsz afn oeq float [[TMP33]], 0x7FF0000000000000
+; NOPRELINK-NEXT:    [[TMP35:%.*]] = fcmp nnan nsz afn oeq float [[TMP4]], 0.000000e+00
+; NOPRELINK-NEXT:    [[TMP36:%.*]] = or i1 [[TMP34]], [[TMP35]]
+; NOPRELINK-NEXT:    [[TMP37:%.*]] = fcmp nnan nsz afn olt float [[TMP2]], 0.000000e+00
+; NOPRELINK-NEXT:    [[TMP38:%.*]] = xor i1 [[TMP35]], [[TMP37]]
+; NOPRELINK-NEXT:    [[TMP39:%.*]] = select nnan nsz afn i1 [[TMP38]], float 0.000000e+00, float 0x7FF0000000000000
+; NOPRELINK-NEXT:    [[TMP40:%.*]] = select nnan nsz afn i1 [[TMP14]], float [[TMP4]], float 0.000000e+00
+; NOPRELINK-NEXT:    [[TMP41:%.*]] = call nnan nsz afn float @llvm.copysign.f32(float [[TMP39]], float [[TMP40]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP42:%.*]] = select nnan nsz afn i1 [[TMP36]], float [[TMP41]], float [[TMP32]]
+; NOPRELINK-NEXT:    ret float [[TMP42]]
 ;
   %pow = tail call afn nsz nnan float @_Z3powff(float %x, float %y) #2
   ret float %pow
@@ -5680,6 +5682,7 @@ define float @test_pow_f32_known_positive_x__known_integral_sitofp(float nofpcla
 }
 
 define float @test_pow_f32__known_positive_x__known_integral_y(float nofpclass(ninf nnorm nsub nzero) %x, i32 %y.int) #0 {
+; PRELINK: Function Attrs: minsize
 ; PRELINK-LABEL: define float @test_pow_f32__known_positive_x__known_integral_y
 ; PRELINK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]], i32 [[Y_INT:%.*]]) #[[ATTR2]] {
 ; PRELINK-NEXT:  entry:
@@ -5687,6 +5690,7 @@ define float @test_pow_f32__known_positive_x__known_integral_y(float nofpclass(n
 ; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z4powrff(float [[X]], float [[Y]])
 ; PRELINK-NEXT:    ret float [[CALL]]
 ;
+; NOPRELINK: Function Attrs: minsize
 ; NOPRELINK-LABEL: define float @test_pow_f32__known_positive_x__known_integral_y
 ; NOPRELINK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]], i32 [[Y_INT:%.*]]) #[[ATTR2]] {
 ; NOPRELINK-NEXT:  entry:
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
index b4182ccbf77a4..b52c213d606e5 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 2
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink %s | FileCheck -check-prefixes=CHECK,PRELINK %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine %s | FileCheck -check-prefixes=CHECK,NOPRELINK %s
 
@@ -864,13 +864,13 @@ define float @test_pown_fast_f32_nobuiltin(float %x, i32 %y) {
 ; PRELINK-LABEL: define float @test_pown_fast_f32_nobuiltin
 ; PRELINK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) {
 ; PRELINK-NEXT:  entry:
-; PRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z4pownfi(float [[X]], i32 [[Y]]) #[[ATTR4:[0-9]+]]
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z4pownfi(float [[X]], i32 [[Y]]) #[[ATTR3:[0-9]+]]
 ; PRELINK-NEXT:    ret float [[CALL]]
 ;
 ; NOPRELINK-LABEL: define float @test_pown_fast_f32_nobuiltin
 ; NOPRELINK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) {
 ; NOPRELINK-NEXT:  entry:
-; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z4pownfi(float [[X]], i32 [[Y]]) #[[ATTR3:[0-9]+]]
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z4pownfi(float [[X]], i32 [[Y]]) #[[ATTR2:[0-9]+]]
 ; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
 entry:
@@ -879,20 +879,37 @@ entry:
 }
 
 define float @test_pown_fast_f32_strictfp(float %x, i32 %y) #1 {
-; CHECK-LABEL: define float @test_pown_fast_f32_strictfp
-; CHECK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[__FABS:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR0]]
-; CHECK-NEXT:    [[__LOG2:%.*]] = call fast float @llvm.log2.f32(float [[__FABS]]) #[[ATTR0]]
-; CHECK-NEXT:    [[POWNI2F:%.*]] = call fast float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[__YLOGX:%.*]] = call fast float @llvm.experimental.constrained.fmul.f32(float [[POWNI2F]], float [[__LOG2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[__EXP2:%.*]] = call fast nofpclass(nan ninf nzero nsub nnorm) float @llvm.exp2.f32(float [[__YLOGX]]) #[[ATTR0]]
-; CHECK-NEXT:    [[__YEVEN:%.*]] = shl i32 [[Y]], 31
-; CHECK-NEXT:    [[TMP0:%.*]] = bitcast float [[X]] to i32
-; CHECK-NEXT:    [[__POW_SIGN:%.*]] = and i32 [[__YEVEN]], [[TMP0]]
-; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32 [[__POW_SIGN]] to float
-; CHECK-NEXT:    [[__POW_SIGN1:%.*]] = call fast float @llvm.copysign.f32(float [[__EXP2]], float [[TMP1]]) #[[ATTR0]]
-; CHECK-NEXT:    ret float [[__POW_SIGN1]]
+; PRELINK: Function Attrs: strictfp
+; PRELINK-LABEL: define float @test_pown_fast_f32_strictfp
+; PRELINK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+; PRELINK-NEXT:  entry:
+; PRELINK-NEXT:    [[__FABS:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR4:[0-9]+]]
+; PRELINK-NEXT:    [[__LOG2:%.*]] = call fast float @llvm.log2.f32(float [[__FABS]]) #[[ATTR4]]
+; PRELINK-NEXT:    [[POWNI2F:%.*]] = sitofp i32 [[Y]] to float
+; PRELINK-NEXT:    [[__YLOGX:%.*]] = fmul fast float [[__LOG2]], [[POWNI2F]]
+; PRELINK-NEXT:    [[__EXP2:%.*]] = call fast nofpclass(nan ninf nzero nsub nnorm) float @llvm.exp2.f32(float [[__YLOGX]]) #[[ATTR4]]
+; PRELINK-NEXT:    [[__YEVEN:%.*]] = shl i32 [[Y]], 31
+; PRELINK-NEXT:    [[TMP0:%.*]] = bitcast float [[X]] to i32
+; PRELINK-NEXT:    [[__POW_SIGN:%.*]] = and i32 [[__YEVEN]], [[TMP0]]
+; PRELINK-NEXT:    [[TMP1:%.*]] = bitcast i32 [[__POW_SIGN]] to float
+; PRELINK-NEXT:    [[__POW_SIGN1:%.*]] = call fast float @llvm.copysign.f32(float [[__EXP2]], float [[TMP1]]) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[__POW_SIGN1]]
+;
+; NOPRELINK: Function Attrs: strictfp
+; NOPRELINK-LABEL: define float @test_pown_fast_f32_strictfp
+; NOPRELINK-SAME: (float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+; NOPRELINK-NEXT:  entry:
+; NOPRELINK-NEXT:    [[__FABS:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR3:[0-9]+]]
+; NOPRELINK-NEXT:    [[__LOG2:%.*]] = call fast float @llvm.log2.f32(float [[__FABS]]) #[[ATTR3]]
+; NOPRELINK-NEXT:    [[POWNI2F:%.*]] = sitofp i32 [[Y]] to float
+; NOPRELINK-NEXT:    [[__YLOGX:%.*]] = fmul fast float [[__LOG2]], [[POWNI2F]]
+; NOPRELINK-NEXT:    [[__EXP2:%.*]] = call fast nofpclass(nan ninf nzero nsub nnorm) float @llvm.exp2.f32(float [[__YLOGX]]) #[[ATTR3]]
+; NOPRELINK-NEXT:    [[__YEVEN:%.*]] = shl i32 [[Y]], 31
+; NOPRELINK-NEXT:    [[TMP0:%.*]] = bitcast float [[X]] to i32
+; NOPRELINK-NEXT:    [[__POW_SIGN:%.*]] = and i32 [[__YEVEN]], [[TMP0]]
+; NOPRELINK-NEXT:    [[TMP1:%.*]] = bitcast i32 [[__POW_SIGN]] to float
+; NOPRELINK-NEXT:    [[__POW_SIGN1:%.*]] = call fast float @llvm.copysign.f32(float [[__EXP2]], float [[TMP1]]) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[__POW_SIGN1]]
 ;
 entry:
   %call = tail call fast float @_Z4pownfi(float %x, i32 %y) #1
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
index 337ccb4a2d0e9..ee2a32e4bcf04 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --check-globals all --version 4
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine -amdgpu-prelink %s | FileCheck -check-prefixes=CHECK,PRELINK %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-simplifylib,instcombine %s | FileCheck -check-prefixes=CHECK,NOPRELINK %s
 
@@ -514,6 +514,7 @@ entry:
 }
 
 define float @test_rootn_f32__y_1__strictfp(float %x) #1 {
+; CHECK: Function Attrs: strictfp
 ; CHECK-LABEL: define float @test_rootn_f32__y_1__strictfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
@@ -537,6 +538,7 @@ entry:
 }
 
 define <2 x float> @test_rootn_v2f32__y_1__strictfp(<2 x float> %x) #1 {
+; CHECK: Function Attrs: strictfp
 ; CHECK-LABEL: define <2 x float> @test_rootn_v2f32__y_1__strictfp(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
@@ -906,6 +908,7 @@ entry:
 }
 
 define float @test_rootn_f32__y_neg2__strictfp(float %x) #1 {
+; CHECK: Function Attrs: strictfp
 ; CHECK-LABEL: define float @test_rootn_f32__y_neg2__strictfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
@@ -918,11 +921,17 @@ entry:
 }
 
 define float @test_rootn_f32__y_neg2__noinline(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_neg2__noinline(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[__ROOTN2RSQRT:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR3:[0-9]+]]
-; CHECK-NEXT:    ret float [[__ROOTN2RSQRT]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_neg2__noinline(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:  entry:
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR3:[0-9]+]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_neg2__noinline(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:  entry:
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR2:[0-9]+]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
 entry:
   %call = tail call float @_Z5rootnfi(float %x, i32 -2) #2
@@ -930,11 +939,17 @@ entry:
 }
 
 define float @test_rootn_f32__y_neg2__nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_neg2__nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR4:[0-9]+]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_neg2__nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:  entry:
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR4:[0-9]+]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_neg2__nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:  entry:
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR3:[0-9]+]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
 entry:
   %call = tail call float @_Z5rootnfi(float %x, i32 -2) #0
@@ -968,6 +983,7 @@ entry:
 }
 
 define <2 x float> @test_rootn_v2f32__y_neg2__strictfp(<2 x float> %x) #1 {
+; CHECK: Function Attrs: strictfp
 ; CHECK-LABEL: define <2 x float> @test_rootn_v2f32__y_neg2__strictfp(
 ; CHECK-SAME: <2 x float> [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
@@ -1258,11 +1274,17 @@ entry:
 }
 
 define float @test_rootn_fast_f32_nobuiltin(float %x, i32 %y) {
-; CHECK-LABEL: define float @test_rootn_fast_f32_nobuiltin(
-; CHECK-SAME: float [[X:%.*]], i32 [[Y:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z5rootnfi(float [[X]], i32 [[Y]]) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_fast_f32_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]], i32 [[Y:%.*]]) {
+; PRELINK-NEXT:  entry:
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z5rootnfi(float [[X]], i32 [[Y]]) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_fast_f32_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]], i32 [[Y:%.*]]) {
+; NOPRELINK-NEXT:  entry:
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z5rootnfi(float [[X]], i32 [[Y]]) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
 entry:
   %call = tail call fast float @_Z5rootnfi(float %x, i32 %y) #0
@@ -1270,36 +1292,35 @@ entry:
 }
 
 define float @test_rootn_fast_f32_strictfp(float %x, i32 %y) #1 {
+; PRELINK: Function Attrs: strictfp
 ; PRELINK-LABEL: define float @test_rootn_fast_f32_strictfp(
 ; PRELINK-SAME: float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0]] {
 ; PRELINK-NEXT:  entry:
 ; PRELINK-NEXT:    [[CALL:%.*]] = tail call fast float @_Z12__rootn_fastfi(float [[X]], i32 [[Y]]) #[[ATTR0]]
 ; PRELINK-NEXT:    ret float [[CALL]]
 ;
+; NOPRELINK: Function Attrs: strictfp
 ; NOPRELINK-LABEL: define float @test_rootn_fast_f32_strictfp(
 ; NOPRELINK-SAME: float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR0]] {
 ; NOPRELINK-NEXT:  entry:
-; NOPRELINK-NEXT:    [[TMP0:%.*]] = call fast float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP1:%.*]] = call fast float @llvm.experimental.constrained.fdiv.f32(float 1.000000e+00, float [[TMP0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP2:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP3:%.*]] = call fast float @llvm.log2.f32(float [[TMP2]]) #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP4:%.*]] = call fast float @llvm.experimental.constrained.fmul.f32(float [[TMP1]], float [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP5:%.*]] = call fast float @llvm.exp2.f32(float [[TMP4]]) #[[ATTR0]]
+; NOPRELINK-NEXT:    [[TMP0:%.*]] = sitofp i32 [[Y]] to float
+; NOPRELINK-NEXT:    [[TMP1:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR4:[0-9]+]]
+; NOPRELINK-NEXT:    [[TMP2:%.*]] = call fast float @llvm.log2.f32(float [[TMP1]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP3:%.*]] = fdiv fast float [[TMP2]], [[TMP0]]
+; NOPRELINK-NEXT:    [[TMP4:%.*]] = call fast float @llvm.exp2.f32(float [[TMP3]]) #[[ATTR4]]
 ; NOPRELINK-NEXT:    [[TMP6:%.*]] = and i32 [[Y]], 1
 ; NOPRELINK-NEXT:    [[DOTNOT:%.*]] = icmp eq i32 [[TMP6]], 0
 ; NOPRELINK-NEXT:    [[TMP7:%.*]] = select fast i1 [[DOTNOT]], float 1.000000e+00, float [[X]]
-; NOPRELINK-NEXT:    [[TMP8:%.*]] = call fast float @llvm.copysign.f32(float [[TMP5]], float [[TMP7]]) #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP9:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP10:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[TMP9]], float 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP11:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[X]], float 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP12:%.*]] = or i1 [[TMP10]], [[TMP11]]
+; NOPRELINK-NEXT:    [[TMP9:%.*]] = call fast float @llvm.copysign.f32(float [[TMP4]], float [[TMP7]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP8:%.*]] = call fast float @llvm.fabs.f32(float [[X]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP11:%.*]] = fcmp fast oeq float [[X]], 0.000000e+00
 ; NOPRELINK-NEXT:    [[TMP13:%.*]] = icmp slt i32 [[Y]], 0
 ; NOPRELINK-NEXT:    [[TMP14:%.*]] = xor i1 [[TMP11]], [[TMP13]]
 ; NOPRELINK-NEXT:    [[TMP15:%.*]] = select fast i1 [[TMP14]], float 0.000000e+00, float 0x7FF0000000000000
 ; NOPRELINK-NEXT:    [[TMP16:%.*]] = select fast i1 [[DOTNOT]], float 0.000000e+00, float [[X]]
-; NOPRELINK-NEXT:    [[TMP17:%.*]] = call fast float @llvm.copysign.f32(float [[TMP15]], float [[TMP16]]) #[[ATTR0]]
-; NOPRELINK-NEXT:    [[TMP18:%.*]] = select fast i1 [[TMP12]], float [[TMP17]], float [[TMP8]]
-; NOPRELINK-NEXT:    [[TMP19:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[X]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR0]]
+; NOPRELINK-NEXT:    [[TMP17:%.*]] = call fast float @llvm.copysign.f32(float [[TMP15]], float [[TMP16]]) #[[ATTR4]]
+; NOPRELINK-NEXT:    [[TMP18:%.*]] = select fast i1 [[TMP11]], float [[TMP17]], float [[TMP9]]
+; NOPRELINK-NEXT:    [[TMP19:%.*]] = fcmp fast olt float [[X]], 0.000000e+00
 ; NOPRELINK-NEXT:    [[TMP20:%.*]] = and i1 [[TMP19]], [[DOTNOT]]
 ; NOPRELINK-NEXT:    [[TMP21:%.*]] = icmp eq i32 [[Y]], 0
 ; NOPRELINK-NEXT:    [[TMP22:%.*]] = or i1 [[TMP20]], [[TMP21]]
@@ -1885,60 +1906,90 @@ entry:
 }
 
 define float @test_rootn_f32__y_0_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_0_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 0) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_0_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 0) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_0_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 0) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 0) #0
   ret float %call
 }
 
 define float @test_rootn_f32__y_1_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_1_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 1) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_1_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 1) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_1_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 1) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 1) #0
   ret float %call
 }
 
 define float @test_rootn_f32__y_2_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_2_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 2) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_2_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 2) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_2_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 2) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 2) #0
   ret float %call
 }
 
 define float @test_rootn_f32__y_3_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_3_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 3) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_3_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 3) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_3_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 3) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 3) #0
   ret float %call
 }
 
 define float @test_rootn_f32__y_neg1_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_neg1_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -1) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_neg1_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -1) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_neg1_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -1) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 -1) #0
   ret float %call
 }
 
 define float @test_rootn_f32__y_neg2_nobuiltin(float %x) {
-; CHECK-LABEL: define float @test_rootn_f32__y_neg2_nobuiltin(
-; CHECK-SAME: float [[X:%.*]]) {
-; CHECK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR4]]
-; CHECK-NEXT:    ret float [[CALL]]
+; PRELINK-LABEL: define float @test_rootn_f32__y_neg2_nobuiltin(
+; PRELINK-SAME: float [[X:%.*]]) {
+; PRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR4]]
+; PRELINK-NEXT:    ret float [[CALL]]
+;
+; NOPRELINK-LABEL: define float @test_rootn_f32__y_neg2_nobuiltin(
+; NOPRELINK-SAME: float [[X:%.*]]) {
+; NOPRELINK-NEXT:    [[CALL:%.*]] = tail call float @_Z5rootnfi(float [[X]], i32 -2) #[[ATTR3]]
+; NOPRELINK-NEXT:    ret float [[CALL]]
 ;
   %call = tail call float @_Z5rootnfi(float %x, i32 -2) #0
   ret float %call
@@ -1962,9 +2013,9 @@ attributes #2 = { noinline }
 ;.
 ; NOPRELINK: attributes #[[ATTR0]] = { strictfp }
 ; NOPRELINK: attributes #[[ATTR1:[0-9]+]] = { nocallback nocreateundeforpoison nofree nosync nounwind speculatable willreturn memory(none) }
-; NOPRELINK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite) }
-; NOPRELINK: attributes #[[ATTR3]] = { noinline }
-; NOPRELINK: attributes #[[ATTR4]] = { nobuiltin }
+; NOPRELINK: attributes #[[ATTR2]] = { noinline }
+; NOPRELINK: attributes #[[ATTR3]] = { nobuiltin }
+; NOPRELINK: attributes #[[ATTR4]] = { strictfp memory(inaccessiblemem: readwrite) }
 ;.
 ; PRELINK: [[META0:![0-9]+]] = !{i32 1, !"amdgpu-libcall-have-fast-pow", i32 1}
 ; PRELINK: [[META1]] = !{float 2.000000e+00}
diff --git a/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll b/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
index 95d2f07402dd4..1f4c0a2273ad0 100644
--- a/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll
@@ -8347,7 +8347,7 @@ define double @v_constrained_fmul_32_f64(double %x, double %y) #0 {
 ; GFX10-GISEL-LABEL: v_constrained_fmul_32_f64:
 ; GFX10-GISEL:       ; %bb.0:
 ; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], 0x40400000
+; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], 0x40400000, v[0:1]
 ; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-SDAG-LABEL: v_constrained_fmul_32_f64:
@@ -8359,7 +8359,7 @@ define double @v_constrained_fmul_32_f64(double %x, double %y) #0 {
 ; GFX11-GISEL-LABEL: v_constrained_fmul_32_f64:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], 0x40400000
+; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], 0x40400000, v[0:1]
 ; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %val = call double @llvm.experimental.constrained.fmul.f64(double %x, double 32.0, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret double %val
@@ -8389,7 +8389,7 @@ define double @v_constrained_fmul_0x1p64_f64(double %x, double %y) #0 {
 ; GFX10-GISEL-LABEL: v_constrained_fmul_0x1p64_f64:
 ; GFX10-GISEL:       ; %bb.0:
 ; GFX10-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], 0x43f00000
+; GFX10-GISEL-NEXT:    v_mul_f64 v[0:1], 0x43f00000, v[0:1]
 ; GFX10-GISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-SDAG-LABEL: v_constrained_fmul_0x1p64_f64:
@@ -8401,7 +8401,7 @@ define double @v_constrained_fmul_0x1p64_f64(double %x, double %y) #0 {
 ; GFX11-GISEL-LABEL: v_constrained_fmul_0x1p64_f64:
 ; GFX11-GISEL:       ; %bb.0:
 ; GFX11-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], v[0:1], 0x43f00000
+; GFX11-GISEL-NEXT:    v_mul_f64 v[0:1], 0x43f00000, v[0:1]
 ; GFX11-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %val = call double @llvm.experimental.constrained.fmul.f64(double %x, double 18446744073709551616.0, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret double %val
diff --git a/llvm/test/CodeGen/AMDGPU/fsub-as-fneg-src-modifier.ll b/llvm/test/CodeGen/AMDGPU/fsub-as-fneg-src-modifier.ll
index a19fd63a37162..f02b8e9a86713 100644
--- a/llvm/test/CodeGen/AMDGPU/fsub-as-fneg-src-modifier.ll
+++ b/llvm/test/CodeGen/AMDGPU/fsub-as-fneg-src-modifier.ll
@@ -944,36 +944,54 @@ define <2 x half> @no_fold_v2f16_select_user_fsub_into_fneg_modifier_dynamic(i1
 }
 
 define float @fold_f32_strict_fsub_into_fneg_modifier_ieee(float %v0, float %v1) #3 {
-; CHECK-LABEL: fold_f32_strict_fsub_into_fneg_modifier_ieee:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    v_sub_f32_e32 v0, 0x80000000, v0
-; CHECK-NEXT:    v_mul_f32_e32 v0, v0, v1
-; CHECK-NEXT:    s_setpc_b64 s[30:31]
+; SDAG-LABEL: fold_f32_strict_fsub_into_fneg_modifier_ieee:
+; SDAG:       ; %bb.0:
+; SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SDAG-NEXT:    v_mul_f32_e64 v0, -v0, v1
+; SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-LABEL: fold_f32_strict_fsub_into_fneg_modifier_ieee:
+; GISEL:       ; %bb.0:
+; GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-NEXT:    v_max_f32_e64 v0, -v0, -v0
+; GISEL-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GISEL-NEXT:    s_setpc_b64 s[30:31]
   %sub = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %v0, metadata !"round.dynamic", metadata !"fpexcept.strict")
   %mul = call float @llvm.experimental.constrained.fmul.f32(float %sub, float %v1, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %mul
 }
 
 define float @fold_f32_strict_fsub_into_fneg_modifier_daz(float %v0, float %v1) #4 {
-; CHECK-LABEL: fold_f32_strict_fsub_into_fneg_modifier_daz:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    v_sub_f32_e32 v0, 0x80000000, v0
-; CHECK-NEXT:    v_mul_f32_e32 v0, v0, v1
-; CHECK-NEXT:    s_setpc_b64 s[30:31]
+; SDAG-LABEL: fold_f32_strict_fsub_into_fneg_modifier_daz:
+; SDAG:       ; %bb.0:
+; SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SDAG-NEXT:    v_mul_f32_e64 v0, -v0, v1
+; SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-LABEL: fold_f32_strict_fsub_into_fneg_modifier_daz:
+; GISEL:       ; %bb.0:
+; GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-NEXT:    v_max_f32_e64 v0, -v0, -v0
+; GISEL-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GISEL-NEXT:    s_setpc_b64 s[30:31]
   %sub = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %v0, metadata !"round.dynamic", metadata !"fpexcept.strict")
   %mul = call float @llvm.experimental.constrained.fmul.f32(float %sub, float %v1, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %mul
 }
 
 define float @fold_f32_strict_fsub_into_fneg_modifier_dynamic(float %v0, float %v1) #5 {
-; CHECK-LABEL: fold_f32_strict_fsub_into_fneg_modifier_dynamic:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    v_sub_f32_e32 v0, 0x80000000, v0
-; CHECK-NEXT:    v_mul_f32_e32 v0, v0, v1
-; CHECK-NEXT:    s_setpc_b64 s[30:31]
+; SDAG-LABEL: fold_f32_strict_fsub_into_fneg_modifier_dynamic:
+; SDAG:       ; %bb.0:
+; SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SDAG-NEXT:    v_mul_f32_e64 v0, -v0, v1
+; SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GISEL-LABEL: fold_f32_strict_fsub_into_fneg_modifier_dynamic:
+; GISEL:       ; %bb.0:
+; GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-NEXT:    v_max_f32_e64 v0, -v0, -v0
+; GISEL-NEXT:    v_mul_f32_e32 v0, v0, v1
+; GISEL-NEXT:    s_setpc_b64 s[30:31]
   %sub = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %v0, metadata !"round.dynamic", metadata !"fpexcept.strict")
   %mul = call float @llvm.experimental.constrained.fmul.f32(float %sub, float %v1, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %mul
diff --git a/llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll b/llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll
index 8a61b8f5eeda5..43cb81c8db11c 100644
--- a/llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll
+++ b/llvm/test/CodeGen/AMDGPU/global_atomic_optimizer_fp_rtn.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN:  opt -S -mtriple=amdgcn-- -mcpu=gfx906 -passes='amdgpu-atomic-optimizer<strategy=iterative>,verify<domtree>' %s | FileCheck --check-prefixes=IR,IR-ITERATIVE %s
 ; RUN:  opt -S -mtriple=amdgcn-- -mcpu=gfx906 -passes='amdgpu-atomic-optimizer<strategy=dpp>,verify<domtree>' %s | FileCheck --check-prefixes=IR,IR-DPP %s
 
@@ -7,6 +7,7 @@
 ; strategies are valid for only divergent values. This optimization is valid for divergent addresses. Test also covers different scopes.
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
@@ -43,6 +44,7 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_uni_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -85,6 +87,7 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_scope_agent_sco
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
@@ -132,92 +135,63 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_scope_agent_sco
 }
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #1 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7:[0-9]+]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret float [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8:[0-9]+]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret float [[TMP24]]
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9:[0-9]+]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to float
+; IR-NEXT:    [[TMP20:%.*]] = fmul float [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd float [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret float [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("one-as") monotonic
   ret float %result
 }
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float %val) #1 {
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP23:%.*]] syncscope("one-as") monotonic, align 4
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP14]], float [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fadd float [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], float [[TMP14]], float [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -227,11 +201,11 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_one_as_scope_un
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd float [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -241,32 +215,33 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_one_as_scope_un
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -274,9 +249,9 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_one_as_scope_un
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP30]], float [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fadd float [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], float [[TMP30]], float [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -288,92 +263,63 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_one_as_scope_un
 }
 
 define amdgpu_ps float @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret float [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret float [[TMP24]]
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to float
+; IR-NEXT:    [[TMP20:%.*]] = fmul float [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd float [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret float [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic
   ret float %result
 }
 
 define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, float %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], float [[TMP23:%.*]] syncscope("agent") monotonic, align 4
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[TMP14]], float [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fsub float [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], float [[TMP14]], float [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -383,11 +329,11 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd float [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -397,32 +343,33 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -430,9 +377,9 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[TMP30]], float [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fsub float [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], float [[TMP30]], float [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -444,6 +391,7 @@ define amdgpu_ps float @global_atomic_fsub_uni_address_div_value_agent_scope_str
 }
 
 define amdgpu_ps float @global_atomic_fmin_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
@@ -476,6 +424,7 @@ define amdgpu_ps float @global_atomic_fmin_uni_address_uni_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -518,6 +467,7 @@ define amdgpu_ps float @global_atomic_fmin_uni_address_div_value_agent_scope_uns
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
@@ -565,84 +515,59 @@ define amdgpu_ps float @global_atomic_fmin_uni_address_div_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #1{
-; IR-ITERATIVE-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-ITERATIVE:       10:
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP12]]
-; IR-ITERATIVE:       12:
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], float 0x7FF8000000000000, float [[VAL]]
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP14]], float [[TMP16]], metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], float [[TMP14]], float [[TMP17]]
-; IR-ITERATIVE-NEXT:    br label [[TMP19]]
-; IR-ITERATIVE:       19:
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
-; IR-ITERATIVE-NEXT:    ret float [[TMP20]]
-;
-; IR-DPP-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-DPP:       10:
-; IR-DPP-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP12]]
-; IR-DPP:       12:
-; IR-DPP-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], float 0x7FF8000000000000, float [[VAL]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP14]], float [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], float [[TMP14]], float [[TMP17]]
-; IR-DPP-NEXT:    br label [[TMP19]]
-; IR-DPP:       19:
-; IR-DPP-NEXT:    [[TMP20:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
-; IR-DPP-NEXT:    ret float [[TMP20]]
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
+; IR:       10:
+; IR-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
+; IR-NEXT:    br label [[TMP12]]
+; IR:       12:
+; IR-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
+; IR-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP15:%.*]] = uitofp i32 [[TMP8]] to float
+; IR-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], float 0x7FF8000000000000, float [[VAL]]
+; IR-NEXT:    [[TMP17:%.*]] = call float @llvm.maxnum.f32(float [[TMP14]], float [[TMP16]]) #[[ATTR10:[0-9]+]]
+; IR-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], float [[TMP14]], float [[TMP17]]
+; IR-NEXT:    br label [[TMP19]]
+; IR:       19:
+; IR-NEXT:    [[TMP20:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
+; IR-NEXT:    ret float [[TMP20]]
 ;
   %result = atomicrmw fmax ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic
   ret float %result
 }
 
 define amdgpu_ps float @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float %val) #1{
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[TMP23:%.*]] syncscope("agent") monotonic, align 4
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP14]], float [[TMP22:%.*]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.maxnum.f32(float [[TMP14]], float [[TMP22:%.*]]) #[[ATTR10]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], float [[TMP14]], float [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -652,11 +577,11 @@ define amdgpu_ps float @global_atomic_fmax_uni_address_div_value_agent_scope_uns
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ 0x7FF8000000000000, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call float @llvm.experimental.constrained.maxnum.f32(float [[ACCUMULATOR]], float [[TMP21]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = call float @llvm.maxnum.f32(float [[ACCUMULATOR]], float [[TMP21]]) #[[ATTR10]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -666,32 +591,33 @@ define amdgpu_ps float @global_atomic_fmax_uni_address_div_value_agent_scope_uns
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float 0x7FF8000000000000) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP9]], float [[TMP10]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP11]], float [[TMP12]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP13]], float [[TMP14]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP15]], float [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP17]], float [[TMP18]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP19]], float [[TMP20]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float 0x7FF8000000000000) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.maxnum.f32(float [[TMP9]], float [[TMP10]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.maxnum.f32(float [[TMP11]], float [[TMP12]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.maxnum.f32(float [[TMP13]], float [[TMP14]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.maxnum.f32(float [[TMP15]], float [[TMP16]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.maxnum.f32(float [[TMP17]], float [[TMP18]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.maxnum.f32(float [[TMP19]], float [[TMP20]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -699,9 +625,9 @@ define amdgpu_ps float @global_atomic_fmax_uni_address_div_value_agent_scope_uns
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP30]], float [[TMP31]], metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = call float @llvm.maxnum.f32(float [[TMP30]], float [[TMP31]]) #[[ATTR10]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], float [[TMP30]], float [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -713,92 +639,63 @@ define amdgpu_ps float @global_atomic_fmax_uni_address_div_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret float [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL]], float [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP18]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret float [[TMP24]]
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to float
+; IR-NEXT:    [[TMP20:%.*]] = fmul float [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd float [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], float [[TMP18]], float [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi float [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret float [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val monotonic, align 4
   ret float %result
 }
 
 define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, float %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP23:%.*]] monotonic, align 4
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi float [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP14]], float [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fadd float [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], float [[TMP14]], float [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -808,11 +705,11 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_system_scope_st
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call float @llvm.amdgcn.writelane.f32(float [[ACCUMULATOR]], i32 [[TMP20]], float [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd float [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -822,32 +719,33 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_system_scope_st
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -855,9 +753,9 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_system_scope_st
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi float [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP30]], float [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fadd float [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], float [[TMP30]], float [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -869,6 +767,7 @@ define amdgpu_ps float @global_atomic_fadd_uni_address_div_value_system_scope_st
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_agent_scope_unsafe(ptr addrspace(1) %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -878,6 +777,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_div_value_agent_scope_unsafe(ptr addrspace(1) %ptr, float %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -887,6 +787,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_div_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float inreg %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("one-as") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -896,6 +797,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_one_as_scope_un
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("one-as") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -905,6 +807,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_div_value_one_as_scope_un
 }
 
 define amdgpu_ps float @global_atomic_fsub_div_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr, float inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_div_address_uni_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -914,6 +817,7 @@ define amdgpu_ps float @global_atomic_fsub_div_address_uni_value_agent_scope_str
 }
 
 define amdgpu_ps float @global_atomic_fsub_div_address_div_value_agent_scope_strictfp(ptr addrspace(1) %ptr, float %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_div_address_div_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -923,6 +827,7 @@ define amdgpu_ps float @global_atomic_fsub_div_address_div_value_agent_scope_str
 }
 
 define amdgpu_ps float @global_atomic_fmin_div_address_uni_value_agent_scope(ptr addrspace(1) %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_div_address_uni_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -932,6 +837,7 @@ define amdgpu_ps float @global_atomic_fmin_div_address_uni_value_agent_scope(ptr
 }
 
 define amdgpu_ps float @global_atomic_fmin_div_address_div_value_agent_scope(ptr addrspace(1) %ptr, float %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_div_address_div_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -941,6 +847,7 @@ define amdgpu_ps float @global_atomic_fmin_div_address_div_value_agent_scope(ptr
 }
 
 define amdgpu_ps float @global_atomic_fmax_div_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float inreg %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_div_address_uni_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -950,6 +857,7 @@ define amdgpu_ps float @global_atomic_fmax_div_address_uni_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fmax_div_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_div_address_div_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -959,6 +867,7 @@ define amdgpu_ps float @global_atomic_fmax_div_address_div_value_agent_scope_uns
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_system_scope_strictfp(ptr addrspace(1) %ptr, float inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -968,6 +877,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_uni_value_system_scope_st
 }
 
 define amdgpu_ps float @global_atomic_fadd_div_address_div_value_system_scope_strictfp(ptr addrspace(1) %ptr, float %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret float [[RESULT]]
@@ -977,6 +887,7 @@ define amdgpu_ps float @global_atomic_fadd_div_address_div_value_system_scope_st
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
@@ -1013,6 +924,7 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_uni_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -1055,6 +967,7 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_scope_a
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
@@ -1102,92 +1015,63 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_scope_a
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #1 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret double [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret double [[TMP24]]
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to double
+; IR-NEXT:    [[TMP20:%.*]] = fmul double [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd double [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret double [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val syncscope("one-as") monotonic
   ret double %result
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double %val) #1 {
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP23:%.*]] syncscope("one-as") monotonic, align 8
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], double [[TMP14]], double [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -1197,11 +1081,11 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_one_as_
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd double [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -1211,32 +1095,33 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_one_as_
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -1244,9 +1129,9 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_one_as_
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP30]], double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fadd double [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], double [[TMP30]], double [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -1258,92 +1143,63 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_one_as_
 }
 
 define amdgpu_ps double @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret double [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret double [[TMP24]]
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to double
+; IR-NEXT:    [[TMP20:%.*]] = fmul double [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd double [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret double [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val syncscope("agent") monotonic
   ret double %result
 }
 
 define amdgpu_ps double @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, double %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], double [[TMP23:%.*]] syncscope("agent") monotonic, align 8
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP14]], double [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fsub double [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], double [[TMP14]], double [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -1353,11 +1209,11 @@ define amdgpu_ps double @global_atomic_fsub_double_uni_address_div_value_agent_s
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd double [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -1367,32 +1223,33 @@ define amdgpu_ps double @global_atomic_fsub_double_uni_address_div_value_agent_s
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -1400,9 +1257,9 @@ define amdgpu_ps double @global_atomic_fsub_double_uni_address_div_value_agent_s
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP30]], double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fsub double [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], double [[TMP30]], double [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -1414,6 +1271,7 @@ define amdgpu_ps double @global_atomic_fsub_double_uni_address_div_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fmin_double_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
@@ -1446,6 +1304,7 @@ define amdgpu_ps double @global_atomic_fmin_double_uni_address_uni_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -1488,6 +1347,7 @@ define amdgpu_ps double @global_atomic_fmin_double_uni_address_div_value_agent_s
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
@@ -1535,84 +1395,59 @@ define amdgpu_ps double @global_atomic_fmin_double_uni_address_div_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic__fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #1{
-; IR-ITERATIVE-LABEL: @global_atomic__fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-ITERATIVE:       10:
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP12]]
-; IR-ITERATIVE:       12:
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], double 0x7FF8000000000000, double [[VAL]]
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP14]], double [[TMP16]], metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], double [[TMP14]], double [[TMP17]]
-; IR-ITERATIVE-NEXT:    br label [[TMP19]]
-; IR-ITERATIVE:       19:
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
-; IR-ITERATIVE-NEXT:    ret double [[TMP20]]
-;
-; IR-DPP-LABEL: @global_atomic__fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-DPP:       10:
-; IR-DPP-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP12]]
-; IR-DPP:       12:
-; IR-DPP-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], double 0x7FF8000000000000, double [[VAL]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP14]], double [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], double [[TMP14]], double [[TMP17]]
-; IR-DPP-NEXT:    br label [[TMP19]]
-; IR-DPP:       19:
-; IR-DPP-NEXT:    [[TMP20:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
-; IR-DPP-NEXT:    ret double [[TMP20]]
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic__fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP19:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
+; IR:       10:
+; IR-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
+; IR-NEXT:    br label [[TMP12]]
+; IR:       12:
+; IR-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP11]], [[TMP10]] ]
+; IR-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP15:%.*]] = uitofp i32 [[TMP8]] to double
+; IR-NEXT:    [[TMP16:%.*]] = select i1 [[TMP9]], double 0x7FF8000000000000, double [[VAL]]
+; IR-NEXT:    [[TMP17:%.*]] = call double @llvm.maxnum.f64(double [[TMP14]], double [[TMP16]]) #[[ATTR10]]
+; IR-NEXT:    [[TMP18:%.*]] = select i1 [[TMP9]], double [[TMP14]], double [[TMP17]]
+; IR-NEXT:    br label [[TMP19]]
+; IR:       19:
+; IR-NEXT:    [[TMP20:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP18]], [[TMP12]] ]
+; IR-NEXT:    ret double [[TMP20]]
 ;
   %result = atomicrmw fmax ptr addrspace(1) %ptr, double %val syncscope("agent") monotonic
   ret double %result
 }
 
 define amdgpu_ps double @global_atomic__fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double %val) #1{
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic__fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[TMP23:%.*]] syncscope("agent") monotonic, align 8
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP14]], double [[TMP22:%.*]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.maxnum.f64(double [[TMP14]], double [[TMP22:%.*]]) #[[ATTR10]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], double [[TMP14]], double [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -1622,11 +1457,11 @@ define amdgpu_ps double @global_atomic__fmax_double_uni_address_div_value_agent_
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ 0x7FF8000000000000, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call double @llvm.experimental.constrained.maxnum.f64(double [[ACCUMULATOR]], double [[TMP21]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = call double @llvm.maxnum.f64(double [[ACCUMULATOR]], double [[TMP21]]) #[[ATTR10]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -1636,32 +1471,33 @@ define amdgpu_ps double @global_atomic__fmax_double_uni_address_div_value_agent_
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic__fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double 0x7FF8000000000000) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP9]], double [[TMP10]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP11]], double [[TMP12]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP13]], double [[TMP14]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP15]], double [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP17]], double [[TMP18]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP19]], double [[TMP20]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double 0x7FF8000000000000) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.maxnum.f64(double [[TMP9]], double [[TMP10]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.maxnum.f64(double [[TMP11]], double [[TMP12]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.maxnum.f64(double [[TMP13]], double [[TMP14]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.maxnum.f64(double [[TMP15]], double [[TMP16]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.maxnum.f64(double [[TMP17]], double [[TMP18]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.maxnum.f64(double [[TMP19]], double [[TMP20]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -1669,9 +1505,9 @@ define amdgpu_ps double @global_atomic__fmax_double_uni_address_div_value_agent_
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP30]], double [[TMP31]], metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = call double @llvm.maxnum.f64(double [[TMP30]], double [[TMP31]]) #[[ATTR10]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], double [[TMP30]], double [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -1683,92 +1519,63 @@ define amdgpu_ps double @global_atomic__fmax_double_uni_address_div_value_agent_
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-ITERATIVE-NEXT:    br label [[TMP23]]
-; IR-ITERATIVE:       23:
-; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-ITERATIVE-NEXT:    ret double [[TMP24]]
-;
-; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL]], double [[TMP19]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
-; IR-DPP-NEXT:    br label [[TMP23]]
-; IR-DPP:       23:
-; IR-DPP-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
-; IR-DPP-NEXT:    ret double [[TMP24]]
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP23:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    [[TMP17:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP15]], [[TMP14]] ]
+; IR-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP17]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP19:%.*]] = uitofp i32 [[TMP8]] to double
+; IR-NEXT:    [[TMP20:%.*]] = fmul double [[VAL]], [[TMP19]]
+; IR-NEXT:    [[TMP21:%.*]] = fadd double [[TMP18]], [[TMP20]]
+; IR-NEXT:    [[TMP22:%.*]] = select i1 [[TMP13]], double [[TMP18]], double [[TMP21]]
+; IR-NEXT:    br label [[TMP23]]
+; IR:       23:
+; IR-NEXT:    [[TMP24:%.*]] = phi double [ poison, [[TMP0:%.*]] ], [ [[TMP22]], [[TMP16]] ]
+; IR-NEXT:    ret double [[TMP24]]
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val monotonic, align 4
   ret double %result
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, double %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP23:%.*]] monotonic, align 4
 ; IR-ITERATIVE-NEXT:    br label [[TMP12:%.*]]
 ; IR-ITERATIVE:       12:
 ; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = phi double [ poison, [[COMPUTEEND:%.*]] ], [ [[TMP11]], [[TMP10:%.*]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[TMP22:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP13]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[TMP22:%.*]]
 ; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = select i1 [[TMP28:%.*]], double [[TMP14]], double [[TMP15]]
 ; IR-ITERATIVE-NEXT:    br label [[TMP17]]
 ; IR-ITERATIVE:       17:
@@ -1778,11 +1585,11 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_system_
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP23]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[OLDVALUEPHI:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP22]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP26:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP19]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP23]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP21:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP20]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP22]] = call double @llvm.amdgcn.writelane.f64(double [[ACCUMULATOR]], i32 [[TMP20]], double [[OLDVALUEPHI]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP23]] = fadd double [[ACCUMULATOR]], [[TMP21]]
 ; IR-ITERATIVE-NEXT:    [[TMP24:%.*]] = shl i64 1, [[TMP19]]
 ; IR-ITERATIVE-NEXT:    [[TMP25:%.*]] = xor i64 [[TMP24]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP26]] = and i64 [[ACTIVEBITS]], [[TMP25]]
@@ -1792,32 +1599,33 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_system_
 ; IR-ITERATIVE-NEXT:    [[TMP28]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP28]], label [[TMP10]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP34:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP21]], i32 312, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP24:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP23]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP25:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP25]], label [[TMP26:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       26:
@@ -1825,9 +1633,9 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_system_
 ; IR-DPP-NEXT:    br label [[TMP28]]
 ; IR-DPP:       28:
 ; IR-DPP-NEXT:    [[TMP29:%.*]] = phi double [ poison, [[TMP2]] ], [ [[TMP27]], [[TMP26]] ]
-; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP30]], double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP30:%.*]] = call double @llvm.amdgcn.readfirstlane.f64(double [[TMP29]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP31:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP32:%.*]] = fadd double [[TMP30]], [[TMP31]]
 ; IR-DPP-NEXT:    [[TMP33:%.*]] = select i1 [[TMP25]], double [[TMP30]], double [[TMP32]]
 ; IR-DPP-NEXT:    br label [[TMP34]]
 ; IR-DPP:       34:
@@ -1839,6 +1647,7 @@ define amdgpu_ps double @global_atomic_fadd_double_uni_address_div_value_system_
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_agent_scope_unsafe(ptr addrspace(1) %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1848,6 +1657,7 @@ define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_div_value_agent_scope_unsafe(ptr addrspace(1) %ptr, double %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1857,6 +1667,7 @@ define amdgpu_ps double @global_atomic_fadd_double_div_address_div_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double inreg %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("one-as") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1866,6 +1677,7 @@ define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_one_as_
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("one-as") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1875,6 +1687,7 @@ define amdgpu_ps double @global_atomic_fadd_double_div_address_div_value_one_as_
 }
 
 define amdgpu_ps double @global_atomic_fsub_double_div_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr, double inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_double_div_address_uni_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1884,6 +1697,7 @@ define amdgpu_ps double @global_atomic_fsub_double_div_address_uni_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fsub_double_div_address_div_value_agent_scope_strictfp(ptr addrspace(1) %ptr, double %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_double_div_address_div_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1893,6 +1707,7 @@ define amdgpu_ps double @global_atomic_fsub_double_div_address_div_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fmin_double_div_address_uni_value_agent_scope(ptr addrspace(1) %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_div_address_uni_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1902,6 +1717,7 @@ define amdgpu_ps double @global_atomic_fmin_double_div_address_uni_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic_fmin_double_div_address_div_value_agent_scope(ptr addrspace(1) %ptr, double %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_div_address_div_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1911,6 +1727,7 @@ define amdgpu_ps double @global_atomic_fmin_double_div_address_div_value_agent_s
 }
 
 define amdgpu_ps double @global_atomic__fmax_double_div_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double inreg %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic__fmax_double_div_address_uni_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1920,6 +1737,7 @@ define amdgpu_ps double @global_atomic__fmax_double_div_address_uni_value_agent_
 }
 
 define amdgpu_ps double @global_atomic__fmax_double_div_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic__fmax_double_div_address_div_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1929,6 +1747,7 @@ define amdgpu_ps double @global_atomic__fmax_double_div_address_div_value_agent_
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_system_scope_strictfp(ptr addrspace(1) %ptr, double inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret double [[RESULT]]
@@ -1938,6 +1757,7 @@ define amdgpu_ps double @global_atomic_fadd_double_div_address_uni_value_system_
 }
 
 define amdgpu_ps double @global_atomic_fadd_double_div_address_div_value_system_scope_strictfp(ptr addrspace(1) %ptr, double %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret double [[RESULT]]
diff --git a/llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll b/llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll
index 8587ab3fdb3af..36d95587f0169 100644
--- a/llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll
+++ b/llvm/test/CodeGen/AMDGPU/global_atomics_optimizer_fp_no_rtn.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN:  opt -S -mtriple=amdgcn-- -mcpu=gfx906 -passes='amdgpu-atomic-optimizer<strategy=iterative>,verify<domtree>' %s | FileCheck --check-prefixes=IR,IR-ITERATIVE %s
 ; RUN:  opt -S -mtriple=amdgcn-- -mcpu=gfx906 -passes='amdgpu-atomic-optimizer<strategy=dpp>,verify<domtree>' %s | FileCheck --check-prefixes=IR,IR-DPP %s
 
@@ -7,6 +7,7 @@
 ; strategies are valid for only divergent values. This optimization is valid for divergent addresses. Test also covers different scopes.
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -36,6 +37,7 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_uni_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -71,6 +73,7 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_scope_agent_scop
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
@@ -111,70 +114,48 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_scope_agent_scop
 }
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #1 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7:[0-9]+]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8:[0-9]+]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9:[0-9]+]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("one-as") monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("one-as") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float %val) #1 {
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP17:%.*]] syncscope("one-as") monotonic, align 4
@@ -186,10 +167,10 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_one_as_scope_uns
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd float [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -199,31 +180,32 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_one_as_scope_uns
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -239,70 +221,48 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_one_as_scope_uns
 }
 
 define amdgpu_ps void @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] syncscope("agent") monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, float %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], float [[TMP17:%.*]] syncscope("agent") monotonic, align 4
@@ -314,10 +274,10 @@ define amdgpu_ps void @global_atomic_fsub_uni_address_div_value_agent_scope_stri
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd float [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -327,31 +287,32 @@ define amdgpu_ps void @global_atomic_fsub_uni_address_div_value_agent_scope_stri
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fsub_uni_address_div_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -367,6 +328,7 @@ define amdgpu_ps void @global_atomic_fsub_uni_address_div_value_agent_scope_stri
 }
 
 define amdgpu_ps void @global_atomic_fmin_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -392,6 +354,7 @@ define amdgpu_ps void @global_atomic_fmin_uni_address_uni_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, float %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -427,6 +390,7 @@ define amdgpu_ps void @global_atomic_fmin_uni_address_div_value_agent_scope_unsa
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmin_uni_address_div_value_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
@@ -467,62 +431,44 @@ define amdgpu_ps void @global_atomic_fmin_uni_address_div_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #1{
-; IR-ITERATIVE-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-ITERATIVE:       10:
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP12]]
-; IR-ITERATIVE:       12:
-; IR-ITERATIVE-NEXT:    br label [[TMP13]]
-; IR-ITERATIVE:       13:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-DPP:       10:
-; IR-DPP-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP12]]
-; IR-DPP:       12:
-; IR-DPP-NEXT:    br label [[TMP13]]
-; IR-DPP:       13:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fmax_uni_address_uni_value_agent_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
+; IR:       10:
+; IR-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
+; IR-NEXT:    br label [[TMP12]]
+; IR:       12:
+; IR-NEXT:    br label [[TMP13]]
+; IR:       13:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fmax ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, float %val) #1{
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[TMP17:%.*]] syncscope("agent") monotonic, align 4
@@ -534,10 +480,10 @@ define amdgpu_ps void @global_atomic_fmax_uni_address_div_value_agent_scope_unsa
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ 0x7FF8000000000000, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call float @llvm.experimental.constrained.maxnum.f32(float [[ACCUMULATOR]], float [[TMP16]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = call float @llvm.maxnum.f32(float [[ACCUMULATOR]], float [[TMP16]]) #[[ATTR10:[0-9]+]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -547,31 +493,32 @@ define amdgpu_ps void @global_atomic_fmax_uni_address_div_value_agent_scope_unsa
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmax_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float 0x7FF8000000000000) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP9]], float [[TMP10]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP11]], float [[TMP12]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP13]], float [[TMP14]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP15]], float [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP17]], float [[TMP18]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[TMP19]], float [[TMP20]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float 0x7FF8000000000000) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.maxnum.f32(float [[TMP9]], float [[TMP10]]) #[[ATTR10:[0-9]+]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.maxnum.f32(float [[TMP11]], float [[TMP12]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.maxnum.f32(float [[TMP13]], float [[TMP14]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.maxnum.f32(float [[TMP15]], float [[TMP16]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.maxnum.f32(float [[TMP17]], float [[TMP18]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float 0x7FF8000000000000, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.maxnum.f32(float [[TMP19]], float [[TMP20]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -587,70 +534,48 @@ define amdgpu_ps void @global_atomic_fmax_uni_address_div_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, float inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[VAL:%.*]], float [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fadd_uni_address_uni_value_system_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to float
+; IR-NEXT:    [[TMP12:%.*]] = fmul float [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP12]] monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float %val monotonic, align 4
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, float %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[TMP17:%.*]] monotonic, align 4
@@ -662,10 +587,10 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_system_scope_str
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi float [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call float @llvm.experimental.constrained.fadd.f32(float [[ACCUMULATOR]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd float [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -675,31 +600,32 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_system_scope_str
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fadd_uni_address_div_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP9]], float [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP11]], float [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP13]], float [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP15]], float [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP17]], float [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[TMP19]], float [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call float @llvm.amdgcn.set.inactive.f32(float [[VAL:%.*]], float -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd float [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd float [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd float [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call float @llvm.amdgcn.update.dpp.f32(float -0.000000e+00, float [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd float [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call float @llvm.amdgcn.strict.wwm.f32(float [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -715,6 +641,7 @@ define amdgpu_ps void @global_atomic_fadd_uni_address_div_value_system_scope_str
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_agent_scope_unsafe(ptr addrspace(1) %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -724,6 +651,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_div_value_agent_scope_unsafe(ptr addrspace(1) %ptr, float %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -733,6 +661,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_div_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float inreg %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("one-as") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -742,6 +671,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_one_as_scope_uns
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("one-as") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -751,6 +681,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_div_value_one_as_scope_uns
 }
 
 define amdgpu_ps void @global_atomic_fsub_div_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr, float inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_div_address_uni_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -760,6 +691,7 @@ define amdgpu_ps void @global_atomic_fsub_div_address_uni_value_agent_scope_stri
 }
 
 define amdgpu_ps void @global_atomic_fsub_div_address_div_value_agent_scope_strictfp(ptr addrspace(1) %ptr, float %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_div_address_div_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -769,6 +701,7 @@ define amdgpu_ps void @global_atomic_fsub_div_address_div_value_agent_scope_stri
 }
 
 define amdgpu_ps void @global_atomic_fmin_div_address_uni_value_agent_scope(ptr addrspace(1) %ptr, float inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_div_address_uni_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -778,6 +711,7 @@ define amdgpu_ps void @global_atomic_fmin_div_address_uni_value_agent_scope(ptr
 }
 
 define amdgpu_ps void @global_atomic_fmin_div_address_div_value_agent_scope(ptr addrspace(1) %ptr, float %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_div_address_div_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -787,6 +721,7 @@ define amdgpu_ps void @global_atomic_fmin_div_address_div_value_agent_scope(ptr
 }
 
 define amdgpu_ps void @global_atomic_fmax_div_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float inreg %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_div_address_uni_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -796,6 +731,7 @@ define amdgpu_ps void @global_atomic_fmax_div_address_uni_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fmax_div_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, float %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_div_address_div_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -805,6 +741,7 @@ define amdgpu_ps void @global_atomic_fmax_div_address_div_value_agent_scope_unsa
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_system_scope_strictfp(ptr addrspace(1) %ptr, float inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_div_address_uni_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret void
@@ -814,6 +751,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_uni_value_system_scope_str
 }
 
 define amdgpu_ps void @global_atomic_fadd_div_address_div_value_system_scope_strictfp(ptr addrspace(1) %ptr, float %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_div_address_div_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret void
@@ -823,6 +761,7 @@ define amdgpu_ps void @global_atomic_fadd_div_address_div_value_system_scope_str
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
@@ -852,6 +791,7 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -887,6 +827,7 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_scope_age
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_scope_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
@@ -927,70 +868,48 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_scope_age
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #1 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("one-as") monotonic, align 8
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val syncscope("one-as") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double %val) #1 {
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP17:%.*]] syncscope("one-as") monotonic, align 8
@@ -1002,10 +921,10 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_one_as_sc
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd double [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -1015,31 +934,32 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_one_as_sc
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_one_as_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -1055,70 +975,48 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_one_as_sc
 }
 
 define amdgpu_ps void @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] syncscope("agent") monotonic, align 8
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val syncscope("agent") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(ptr addrspace(1) inreg %ptr, double %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], double [[TMP17:%.*]] syncscope("agent") monotonic, align 8
@@ -1130,10 +1028,10 @@ define amdgpu_ps void @global_atomic_fsub_double_uni_address_div_value_agent_sco
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd double [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -1143,31 +1041,32 @@ define amdgpu_ps void @global_atomic_fsub_double_uni_address_div_value_agent_sco
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fsub_double_uni_address_div_value_agent_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -1183,6 +1082,7 @@ define amdgpu_ps void @global_atomic_fsub_double_uni_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmin_double_uni_address_uni_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_uni_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -1208,6 +1108,7 @@ define amdgpu_ps void @global_atomic_fmin_double_uni_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(ptr addrspace(1) inreg %ptr, double %val) #0 {
+; IR-ITERATIVE: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(
 ; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
@@ -1243,6 +1144,7 @@ define amdgpu_ps void @global_atomic_fmin_double_uni_address_div_value_agent_sco
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmin_double_uni_address_div_value_agent_scope_unsafe(
 ; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live()
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
@@ -1283,62 +1185,44 @@ define amdgpu_ps void @global_atomic_fmin_double_uni_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #1{
-; IR-ITERATIVE-LABEL: @global_atomic_fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-ITERATIVE:       10:
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
-; IR-ITERATIVE-NEXT:    br label [[TMP12]]
-; IR-ITERATIVE:       12:
-; IR-ITERATIVE-NEXT:    br label [[TMP13]]
-; IR-ITERATIVE:       13:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
-; IR-DPP:       10:
-; IR-DPP-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
-; IR-DPP-NEXT:    br label [[TMP12]]
-; IR-DPP:       12:
-; IR-DPP-NEXT:    br label [[TMP13]]
-; IR-DPP:       13:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
+; IR-LABEL: @global_atomic_fmax_double_uni_address_uni_value_agent_scope_unsafe_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
+; IR:       10:
+; IR-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
+; IR-NEXT:    br label [[TMP12]]
+; IR:       12:
+; IR-NEXT:    br label [[TMP13]]
+; IR:       13:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fmax ptr addrspace(1) %ptr, double %val syncscope("agent") monotonic
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) inreg %ptr, double %val) #1{
+; IR-ITERATIVE: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-ITERATIVE-LABEL: @global_atomic_fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[TMP17:%.*]] syncscope("agent") monotonic, align 8
@@ -1350,10 +1234,10 @@ define amdgpu_ps void @global_atomic_fmax_double_uni_address_div_value_agent_sco
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ 0x7FF8000000000000, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call double @llvm.experimental.constrained.maxnum.f64(double [[ACCUMULATOR]], double [[TMP16]], metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = call double @llvm.maxnum.f64(double [[ACCUMULATOR]], double [[TMP16]]) #[[ATTR10]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -1363,31 +1247,32 @@ define amdgpu_ps void @global_atomic_fmax_double_uni_address_div_value_agent_sco
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-DPP-LABEL: @global_atomic_fmax_double_uni_address_div_value_agent_scope_unsafe_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double 0x7FF8000000000000) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP9]], double [[TMP10]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP11]], double [[TMP12]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP13]], double [[TMP14]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP15]], double [[TMP16]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP17]], double [[TMP18]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[TMP19]], double [[TMP20]], metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double 0x7FF8000000000000) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.maxnum.f64(double [[TMP9]], double [[TMP10]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.maxnum.f64(double [[TMP11]], double [[TMP12]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.maxnum.f64(double [[TMP13]], double [[TMP14]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.maxnum.f64(double [[TMP15]], double [[TMP16]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.maxnum.f64(double [[TMP17]], double [[TMP18]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double 0x7FF8000000000000, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.maxnum.f64(double [[TMP19]], double [[TMP20]]) #[[ATTR10]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -1403,70 +1288,48 @@ define amdgpu_ps void @global_atomic_fmax_double_uni_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, double inreg %val) #2 {
-; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-ITERATIVE-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-ITERATIVE:       14:
-; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
-; IR-ITERATIVE-NEXT:    br label [[TMP16]]
-; IR-ITERATIVE:       16:
-; IR-ITERATIVE-NEXT:    br label [[TMP17]]
-; IR-ITERATIVE:       17:
-; IR-ITERATIVE-NEXT:    ret void
-;
-; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
-; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
-; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
-; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
-; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[VAL:%.*]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
-; IR-DPP-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
-; IR-DPP:       14:
-; IR-DPP-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
-; IR-DPP-NEXT:    br label [[TMP16]]
-; IR-DPP:       16:
-; IR-DPP-NEXT:    br label [[TMP17]]
-; IR-DPP:       17:
-; IR-DPP-NEXT:    ret void
+; IR: Function Attrs: strictfp
+; IR-LABEL: @global_atomic_fadd_double_uni_address_uni_value_system_scope_strictfp(
+; IR-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
+; IR-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP17:%.*]]
+; IR:       2:
+; IR-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
+; IR-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; IR-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
+; IR-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
+; IR-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP9:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP3]]) #[[ATTR9]]
+; IR-NEXT:    [[TMP10:%.*]] = trunc i64 [[TMP9]] to i32
+; IR-NEXT:    [[TMP11:%.*]] = uitofp i32 [[TMP10]] to double
+; IR-NEXT:    [[TMP12:%.*]] = fmul double [[VAL:%.*]], [[TMP11]]
+; IR-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[TMP8]], 0
+; IR-NEXT:    br i1 [[TMP13]], label [[TMP14:%.*]], label [[TMP16:%.*]]
+; IR:       14:
+; IR-NEXT:    [[TMP15:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP12]] monotonic, align 4
+; IR-NEXT:    br label [[TMP16]]
+; IR:       16:
+; IR-NEXT:    br label [[TMP17]]
+; IR:       17:
+; IR-NEXT:    ret void
 ;
   %result = atomicrmw fadd ptr addrspace(1) %ptr, double %val monotonic, align 4
   ret void
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(ptr addrspace(1) inreg %ptr, double %val) #2 {
+; IR-ITERATIVE: Function Attrs: strictfp
 ; IR-ITERATIVE-LABEL: @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(
-; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP13:%.*]]
 ; IR-ITERATIVE:       2:
-; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-ITERATIVE-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-ITERATIVE-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    br label [[COMPUTELOOP:%.*]]
 ; IR-ITERATIVE:       10:
 ; IR-ITERATIVE-NEXT:    [[TMP11:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[TMP17:%.*]] monotonic, align 4
@@ -1478,10 +1341,10 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_system_sc
 ; IR-ITERATIVE:       ComputeLoop:
 ; IR-ITERATIVE-NEXT:    [[ACCUMULATOR:%.*]] = phi double [ -0.000000e+00, [[TMP2]] ], [ [[TMP17]], [[COMPUTELOOP]] ]
 ; IR-ITERATIVE-NEXT:    [[ACTIVEBITS:%.*]] = phi i64 [ [[TMP9]], [[TMP2]] ], [ [[TMP20:%.*]], [[COMPUTELOOP]] ]
-; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP14:%.*]] = call i64 @llvm.cttz.i64(i64 [[ACTIVEBITS]], i1 true) #[[ATTR9]]
 ; IR-ITERATIVE-NEXT:    [[TMP15:%.*]] = trunc i64 [[TMP14]] to i32
-; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR7]]
-; IR-ITERATIVE-NEXT:    [[TMP17]] = call double @llvm.experimental.constrained.fadd.f64(double [[ACCUMULATOR]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR7]]
+; IR-ITERATIVE-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[VAL:%.*]], i32 [[TMP15]]) #[[ATTR9]]
+; IR-ITERATIVE-NEXT:    [[TMP17]] = fadd double [[ACCUMULATOR]], [[TMP16]]
 ; IR-ITERATIVE-NEXT:    [[TMP18:%.*]] = shl i64 1, [[TMP14]]
 ; IR-ITERATIVE-NEXT:    [[TMP19:%.*]] = xor i64 [[TMP18]], -1
 ; IR-ITERATIVE-NEXT:    [[TMP20]] = and i64 [[ACTIVEBITS]], [[TMP19]]
@@ -1491,31 +1354,32 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_system_sc
 ; IR-ITERATIVE-NEXT:    [[TMP22:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-ITERATIVE-NEXT:    br i1 [[TMP22]], label [[TMP10:%.*]], label [[TMP12]]
 ;
+; IR-DPP: Function Attrs: strictfp
 ; IR-DPP-LABEL: @global_atomic_fadd_double_uni_address_div_value_system_scope_strictfp(
-; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP1:%.*]] = call i1 @llvm.amdgcn.ps.live() #[[ATTR9]]
 ; IR-DPP-NEXT:    br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP28:%.*]]
 ; IR-DPP:       2:
-; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP3:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
 ; IR-DPP-NEXT:    [[TMP5:%.*]] = lshr i64 [[TMP3]], 32
 ; IR-DPP-NEXT:    [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
-; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP9]], double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP11]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP13]], double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP15]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP17]], double [[TMP18]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP19]], double [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR8]]
-; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR8]]
+; IR-DPP-NEXT:    [[TMP7:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP4]], i32 0) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP8:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP6]], i32 [[TMP7]]) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP9:%.*]] = call double @llvm.amdgcn.set.inactive.f64(double [[VAL:%.*]], double -0.000000e+00) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP10:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP9]], i32 273, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP11:%.*]] = fadd double [[TMP9]], [[TMP10]]
+; IR-DPP-NEXT:    [[TMP12:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP11]], i32 274, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP13:%.*]] = fadd double [[TMP11]], [[TMP12]]
+; IR-DPP-NEXT:    [[TMP14:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP13]], i32 276, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP15:%.*]] = fadd double [[TMP13]], [[TMP14]]
+; IR-DPP-NEXT:    [[TMP16:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP15]], i32 280, i32 15, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP17:%.*]] = fadd double [[TMP15]], [[TMP16]]
+; IR-DPP-NEXT:    [[TMP18:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP17]], i32 322, i32 10, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP19:%.*]] = fadd double [[TMP17]], [[TMP18]]
+; IR-DPP-NEXT:    [[TMP20:%.*]] = call double @llvm.amdgcn.update.dpp.f64(double -0.000000e+00, double [[TMP19]], i32 323, i32 12, i32 15, i1 false) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP21:%.*]] = fadd double [[TMP19]], [[TMP20]]
+; IR-DPP-NEXT:    [[TMP22:%.*]] = call double @llvm.amdgcn.readlane.f64(double [[TMP21]], i32 63) #[[ATTR9]]
+; IR-DPP-NEXT:    [[TMP23:%.*]] = call double @llvm.amdgcn.strict.wwm.f64(double [[TMP22]]) #[[ATTR9]]
 ; IR-DPP-NEXT:    [[TMP24:%.*]] = icmp eq i32 [[TMP8]], 0
 ; IR-DPP-NEXT:    br i1 [[TMP24]], label [[TMP25:%.*]], label [[TMP27:%.*]]
 ; IR-DPP:       25:
@@ -1531,6 +1395,7 @@ define amdgpu_ps void @global_atomic_fadd_double_uni_address_div_value_system_sc
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_agent_scope_unsafe(ptr addrspace(1) %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -1540,6 +1405,7 @@ define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_div_value_agent_scope_unsafe(ptr addrspace(1) %ptr, double %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_agent_scope_unsafe(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 4
 ; IR-NEXT:    ret void
@@ -1549,6 +1415,7 @@ define amdgpu_ps void @global_atomic_fadd_double_div_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double inreg %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("one-as") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1558,6 +1425,7 @@ define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_one_as_sc
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_div_value_one_as_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double %val) #1 {
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_one_as_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("one-as") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1567,6 +1435,7 @@ define amdgpu_ps void @global_atomic_fadd_double_div_address_div_value_one_as_sc
 }
 
 define amdgpu_ps void @global_atomic_fsub_double_div_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr, double inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_double_div_address_uni_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1576,6 +1445,7 @@ define amdgpu_ps void @global_atomic_fsub_double_div_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fsub_double_div_address_div_value_agent_scope_strictfp(ptr addrspace(1) %ptr, double %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fsub_double_div_address_div_value_agent_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fsub ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1585,6 +1455,7 @@ define amdgpu_ps void @global_atomic_fsub_double_div_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmin_double_div_address_uni_value_agent_scope(ptr addrspace(1) %ptr, double inreg %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_div_address_uni_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1594,6 +1465,7 @@ define amdgpu_ps void @global_atomic_fmin_double_div_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmin_double_div_address_div_value_agent_scope(ptr addrspace(1) %ptr, double %val) #0 {
+; IR: Function Attrs: denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmin_double_div_address_div_value_agent_scope(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmin ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1603,6 +1475,7 @@ define amdgpu_ps void @global_atomic_fmin_double_div_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmax_double_div_address_uni_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double inreg %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_double_div_address_uni_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1612,6 +1485,7 @@ define amdgpu_ps void @global_atomic_fmax_double_div_address_uni_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fmax_double_div_address_div_value_agent_scope_unsafe_strictfp(ptr addrspace(1) %ptr, double %val) #1{
+; IR: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; IR-LABEL: @global_atomic_fmax_double_div_address_div_value_agent_scope_unsafe_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fmax ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] syncscope("agent") monotonic, align 8
 ; IR-NEXT:    ret void
@@ -1621,6 +1495,7 @@ define amdgpu_ps void @global_atomic_fmax_double_div_address_div_value_agent_sco
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_system_scope_strictfp(ptr addrspace(1) %ptr, double inreg %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_double_div_address_uni_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret void
@@ -1630,6 +1505,7 @@ define amdgpu_ps void @global_atomic_fadd_double_div_address_uni_value_system_sc
 }
 
 define amdgpu_ps void @global_atomic_fadd_double_div_address_div_value_system_scope_strictfp(ptr addrspace(1) %ptr, double %val) #2 {
+; IR: Function Attrs: strictfp
 ; IR-LABEL: @global_atomic_fadd_double_div_address_div_value_system_scope_strictfp(
 ; IR-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VAL:%.*]] monotonic, align 4
 ; IR-NEXT:    ret void
diff --git a/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll b/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll
index 8284bb5d9e99b..3164f232167bd 100644
--- a/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll
+++ b/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll
@@ -1146,33 +1146,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_div_value_agent_scope_
 define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp(ptr addrspace(1) %ptr) #1 {
 ; GFX7LESS-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1192,33 +1182,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX9-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1234,31 +1214,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1064-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1274,30 +1246,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1032-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1313,86 +1277,63 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1164-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB2_2
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1164-NEXT:    s_load_b64 s[2:3], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v1, 0
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v0, 4.0, v0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    global_atomic_add_f32 v1, v0, s[0:1]
+; GFX1164-NEXT:    global_atomic_add_f32 v1, v0, s[2:3]
 ; GFX1164-NEXT:  .LBB2_2:
 ; GFX1164-NEXT:    s_endpgm
 ;
 ; GFX1132-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB2_2
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b64 s[2:3], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
+; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s0
 ; GFX1132-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mul_f32 v0, 4.0, v0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    global_atomic_add_f32 v1, v0, s[0:1]
+; GFX1132-NEXT:    global_atomic_add_f32 v1, v0, s[2:3]
 ; GFX1132-NEXT:  .LBB2_2:
 ; GFX1132-NEXT:    s_endpgm
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1412,33 +1353,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1454,31 +1385,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1494,30 +1417,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -1533,54 +1448,41 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_one_as_scope
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB2_2
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, 0
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v0, 4.0, v0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    global_atomic_add_f32 v1, v0, s[0:1]
+; GFX1164-DPP-NEXT:    global_atomic_add_f32 v1, v0, s[2:3]
 ; GFX1164-DPP-NEXT:  .LBB2_2:
 ; GFX1164-DPP-NEXT:    s_endpgm
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB2_2
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b64 s[2:3], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s0
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mul_f32 v0, 4.0, v0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    global_atomic_add_f32 v1, v0, s[0:1]
+; GFX1132-DPP-NEXT:    global_atomic_add_f32 v1, v0, s[2:3]
 ; GFX1132-DPP-NEXT:  .LBB2_2:
 ; GFX1132-DPP-NEXT:    s_endpgm
   %result = atomicrmw fadd ptr addrspace(1) %ptr, float 4.0 syncscope("one-as") monotonic, !amdgpu.no.fine.grained.memory !1, !amdgpu.ignore.denormal.mode !1
@@ -2373,33 +2275,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_div_value_one_as_scope
 define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr) #2{
 ; GFX7LESS-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -2419,33 +2311,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX9-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2461,31 +2343,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1064-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2501,30 +2375,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1032-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2540,32 +2406,25 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1164-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1164-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2583,29 +2442,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1132-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1132-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2623,33 +2476,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -2669,33 +2512,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2711,31 +2544,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2751,30 +2576,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -2790,32 +2607,25 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1164-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2833,29 +2643,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_agent_scope_
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1132-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4443,33 +4247,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_div_value_agent_scope_
 define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scope_strictfp(ptr addrspace(1) %ptr) #2 {
 ; GFX7LESS-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -4489,33 +4283,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX9-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4531,31 +4315,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1064-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4571,30 +4347,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1032-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4610,32 +4378,25 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1164-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1164-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4653,29 +4414,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1132-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1132-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4693,33 +4448,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -4739,33 +4484,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4781,31 +4516,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4821,30 +4548,22 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_add_f32_e32 v0, v1, v2
@@ -4860,32 +4579,25 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1164-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4903,29 +4615,23 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_uni_value_default_scop
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1132-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -7136,12 +6842,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_div_value_agent
 define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp(ptr addrspace(1) %ptr) #1 {
 ; GFX7LESS-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -7150,19 +6850,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB11_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -7185,30 +6882,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX9-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -7228,27 +6916,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1064-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -7269,27 +6950,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1032-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -7309,28 +6983,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1164-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -7353,27 +7021,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1132-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB11_2: ; %atomicrmw.start
@@ -7393,12 +7056,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -7407,19 +7064,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB11_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -7442,30 +7096,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -7485,27 +7130,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -7526,27 +7164,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -7566,28 +7197,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB11_3
-; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:  ; %bb.1:
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -7610,27 +7235,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_one_a
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB11_2: ; %atomicrmw.start
@@ -8577,12 +8197,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_div_value_one_a
 define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr) #2{
 ; GFX7LESS-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -8591,19 +8205,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB13_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -8626,30 +8237,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX9-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -8669,27 +8271,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1064-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -8710,27 +8305,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1032-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -8750,28 +8338,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1164-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -8794,27 +8376,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1132-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB13_2: ; %atomicrmw.start
@@ -8834,12 +8411,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -8848,19 +8419,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB13_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -8883,30 +8451,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -8926,27 +8485,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -8967,27 +8519,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -9007,28 +8552,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -9051,27 +8590,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_agent
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB13_2: ; %atomicrmw.start
@@ -10941,12 +10475,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_div_value_agent
 define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp(ptr addrspace(1) %ptr) #2 {
 ; GFX7LESS-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -10955,19 +10483,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB16_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -10990,30 +10515,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX9-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -11033,27 +10549,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1064-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -11074,27 +10583,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1032-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -11114,28 +10616,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1164-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -11158,27 +10654,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1132-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB16_2: ; %atomicrmw.start
@@ -11198,12 +10689,6 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -11212,19 +10697,16 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB16_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -11247,30 +10729,21 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX9-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -11290,27 +10763,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -11331,27 +10797,20 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -11371,28 +10830,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -11415,27 +10868,22 @@ define amdgpu_kernel void @global_atomic_fadd_double_uni_address_uni_value_defau
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fadd_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB16_2: ; %atomicrmw.start
diff --git a/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll b/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll
index 23515ffcfb139..1848f94bdfa24 100644
--- a/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll
+++ b/llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll
@@ -1258,33 +1258,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_div_value_agent_scope_
 define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp(ptr addrspace(1) %ptr) #1 {
 ; GFX7LESS-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1304,33 +1294,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX9-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1346,31 +1326,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1064-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1386,30 +1358,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1032-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1425,32 +1389,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1164-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1164-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -1468,29 +1425,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1132-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1132-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -1508,33 +1459,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -1554,33 +1495,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1596,31 +1527,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1636,30 +1559,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -1675,32 +1590,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1164-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -1718,29 +1626,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_one_as_scope
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB2_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-DPP-NEXT:  .LBB2_2: ; %atomicrmw.start
 ; GFX1132-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2597,33 +2499,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_div_value_one_as_scope
 define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr) #2{
 ; GFX7LESS-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -2643,33 +2535,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX9-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -2685,31 +2567,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1064-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -2725,30 +2599,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1032-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -2764,32 +2630,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1164-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1164-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2807,29 +2666,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1132-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1132-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -2847,33 +2700,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -2893,33 +2736,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -2935,31 +2768,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -2975,30 +2800,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -3014,32 +2831,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1164-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -3057,29 +2867,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_agent_scope_
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB4_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-DPP-NEXT:  .LBB4_2: ; %atomicrmw.start
 ; GFX1132-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4771,33 +4575,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_div_value_agent_scope_
 define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scope_strictfp(ptr addrspace(1) %ptr) #2 {
 ; GFX7LESS-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX7LESS-NEXT:  ; %bb.1:
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -4817,33 +4611,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX9-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX9-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -4859,31 +4643,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1064-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1064-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -4899,30 +4675,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1032-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1032-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -4938,32 +4706,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1164-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1164-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -4981,29 +4742,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1132-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1132-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -5021,33 +4776,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s0, 0
-; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s1, v0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
+; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
 ; GFX7LESS-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX7LESS-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[0:1]
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dword s6, s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX7LESS-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, s6
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -5067,33 +4812,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s5, s[2:3]
+; GFX9-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s5
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, s4
-; GFX9-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX9-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX9-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX9-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -5109,31 +4844,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1064-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    s_load_dword s2, s[0:1], 0x0
+; GFX1064-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s2
-; GFX1064-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
-; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1064-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1064-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1064-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1064-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -5149,30 +4876,22 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[0:1]
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1032-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
+; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    s_load_dword s3, s[0:1], 0x0
+; GFX1032-DPP-NEXT:    s_load_dword s4, s[0:1], 0x0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s3
-; GFX1032-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1032-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1032-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1032-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1032-DPP-NEXT:    v_sub_f32_e32 v0, v1, v2
@@ -5188,32 +4907,25 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, 0
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    s_load_b32 s2, s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
-; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s2
+; GFX1164-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s2
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX1164-DPP-NEXT:    v_mul_f32_e32 v2, 4.0, v0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
+; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s4
 ; GFX1164-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1164-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -5231,29 +4943,23 @@ define amdgpu_kernel void @global_atomic_fsub_uni_address_uni_value_default_scop
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB7_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
 ; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v3, 0
+; GFX1132-DPP-NEXT:    v_cvt_f32_ubyte0_e32 v0, s3
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    s_load_b32 s3, s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_1)
-; GFX1132-DPP-NEXT:    v_cvt_f32_f64_e32 v0, v[0:1]
+; GFX1132-DPP-NEXT:    s_load_b32 s4, s[0:1], 0x0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v1, s3 :: v_dual_mul_f32 v2, 4.0, v0
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_dual_mul_f32 v2, 4.0, v0 :: v_dual_mov_b32 v1, s4
 ; GFX1132-DPP-NEXT:  .LBB7_2: ; %atomicrmw.start
 ; GFX1132-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
@@ -7464,12 +7170,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_div_value_agent
 define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp(ptr addrspace(1) %ptr) #1 {
 ; GFX7LESS-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -7478,19 +7178,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB11_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -7513,30 +7210,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX9-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -7556,27 +7244,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1064-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -7597,27 +7278,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1032-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -7637,28 +7311,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1164-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -7681,27 +7349,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1132-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB11_2: ; %atomicrmw.start
@@ -7721,12 +7384,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -7735,19 +7392,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB11_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -7770,30 +7424,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -7813,27 +7458,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -7854,27 +7492,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -7894,28 +7525,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB11_3
-; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:  ; %bb.1:
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -7938,27 +7563,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_one_a
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_one_as_scope_unsafe_structfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB11_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB11_2: ; %atomicrmw.start
@@ -8904,12 +8524,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_div_value_one_a
 define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp(ptr addrspace(1) %ptr) #2{
 ; GFX7LESS-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -8918,19 +8532,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB13_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -8953,30 +8564,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX9-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -8996,27 +8598,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1064-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -9037,27 +8632,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1032-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -9077,28 +8665,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1164-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -9121,27 +8703,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1132-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB13_2: ; %atomicrmw.start
@@ -9161,12 +8738,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -9175,19 +8746,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB13_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -9210,30 +8778,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -9253,27 +8812,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -9294,27 +8846,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -9334,28 +8879,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -9378,27 +8917,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_agent
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_agent_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB13_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB13_2: ; %atomicrmw.start
@@ -11267,12 +10801,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_div_value_agent
 define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp(ptr addrspace(1) %ptr) #2 {
 ; GFX7LESS-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS:       ; %bb.0:
-; GFX7LESS-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -11281,19 +10809,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ; GFX7LESS-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX7LESS-NEXT:  ; %bb.1:
 ; GFX7LESS-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-NEXT:  .LBB16_2: ; %atomicrmw.start
 ; GFX7LESS-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -11316,30 +10841,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX9-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-NEXT:    s_mov_b32 s14, -1
-; GFX9-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX9-NEXT:  ; %bb.1:
-; GFX9-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-NEXT:    v_mov_b32_e32 v3, s5
@@ -11359,27 +10875,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1064-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064:       ; %bb.0:
-; GFX1064-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-NEXT:    s_mov_b32 s14, -1
-; GFX1064-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1064-NEXT:  ; %bb.1:
-; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-NEXT:    v_mov_b32_e32 v3, s3
@@ -11400,27 +10909,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1032-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032:       ; %bb.0:
-; GFX1032-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-NEXT:    s_mov_b32 s14, -1
-; GFX1032-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-NEXT:    s_mov_b32 s2, 0
-; GFX1032-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1032-NEXT:  ; %bb.1:
-; GFX1032-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-NEXT:    v_mov_b32_e32 v3, s5
@@ -11440,28 +10942,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1164-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164:       ; %bb.0:
-; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-NEXT:    s_clause 0x1
-; GFX1164-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1164-NEXT:  ; %bb.1:
-; GFX1164-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-NEXT:    v_mov_b32_e32 v3, s3
@@ -11484,27 +10980,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1132-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132:       ; %bb.0:
-; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-NEXT:    s_clause 0x1
-; GFX1132-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-NEXT:    s_mov_b32 s2, 0
+; GFX1132-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1132-NEXT:  ; %bb.1:
-; GFX1132-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-NEXT:  .LBB16_2: ; %atomicrmw.start
@@ -11524,12 +11015,6 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX7LESS-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX7LESS-DPP:       ; %bb.0:
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s15, 0xe8f000
-; GFX7LESS-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX7LESS-DPP-NEXT:    s_addc_u32 s13, s13, 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[2:3], exec
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_lo_u32_b32_e64 v0, s2, 0
 ; GFX7LESS-DPP-NEXT:    v_mbcnt_hi_u32_b32_e32 v0, s3, v0
@@ -11538,19 +11023,16 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ; GFX7LESS-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX7LESS-DPP-NEXT:  ; %bb.1:
 ; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x9
-; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s6, s[2:3]
-; GFX7LESS-DPP-NEXT:    s_mov_b32 s7, 0x43300000
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
+; GFX7LESS-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[8:9], s[0:1], 0x0
+; GFX7LESS-DPP-NEXT:    s_load_dwordx2 s[6:7], s[0:1], 0x0
 ; GFX7LESS-DPP-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s3, 0xf000
-; GFX7LESS-DPP-NEXT:    v_add_f64 v[0:1], s[6:7], v[0:1]
-; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX7LESS-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
+; GFX7LESS-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX7LESS-DPP-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s8
-; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s9
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v2, s6
+; GFX7LESS-DPP-NEXT:    v_mov_b32_e32 v3, s7
 ; GFX7LESS-DPP-NEXT:    s_mov_b32 s2, -1
 ; GFX7LESS-DPP-NEXT:  .LBB16_2: ; %atomicrmw.start
 ; GFX7LESS-DPP-NEXT:    ; =>This Inner Loop Header: Depth=1
@@ -11573,30 +11055,21 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX9-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX9-DPP:       ; %bb.0:
-; GFX9-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX9-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX9-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX9-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX9-DPP-NEXT:    s_mov_b32 s15, 0xe00000
-; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX9-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX9-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX9-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX9-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX9-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX9-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX9-DPP-NEXT:  ; %bb.1:
-; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-DPP-NEXT:    v_mov_b32_e32 v1, 0xc3300000
-; GFX9-DPP-NEXT:    s_mov_b32 s1, 0x43300000
-; GFX9-DPP-NEXT:    v_add_f64 v[0:1], s[0:1], v[0:1]
+; GFX9-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
+; GFX9-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX9-DPP-NEXT:    s_mov_b64 s[2:3], 0
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX9-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
 ; GFX9-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -11616,27 +11089,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1064-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1064-DPP:       ; %bb.0:
-; GFX1064-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1064-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1064-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1064-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1064-DPP-NEXT:    s_mov_b32 s15, 0x31e16000
-; GFX1064-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
-; GFX1064-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1064-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1064-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s2, 0
+; GFX1064-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s3, v0
 ; GFX1064-DPP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
-; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[2:3], vcc
+; GFX1064-DPP-NEXT:    s_and_saveexec_b64 s[0:1], vcc
 ; GFX1064-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1064-DPP-NEXT:  ; %bb.1:
-; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[0:1]
-; GFX1064-DPP-NEXT:    s_mov_b32 s3, 0x43300000
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1064-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[2:3]
+; GFX1064-DPP-NEXT:    s_bcnt1_i32_b64 s2, s[2:3]
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1064-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s2
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
-; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1064-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1064-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1064-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -11657,27 +11123,20 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1032-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1032-DPP:       ; %bb.0:
-; GFX1032-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1032-DPP-NEXT:    s_mov_b32 s12, SCRATCH_RSRC_DWORD0
-; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
-; GFX1032-DPP-NEXT:    s_mov_b32 s13, SCRATCH_RSRC_DWORD1
-; GFX1032-DPP-NEXT:    s_mov_b32 s14, -1
-; GFX1032-DPP-NEXT:    s_mov_b32 s15, 0x31c16000
-; GFX1032-DPP-NEXT:    s_add_u32 s12, s12, s11
-; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
-; GFX1032-DPP-NEXT:    s_addc_u32 s13, s13, 0
+; GFX1032-DPP-NEXT:    s_mov_b32 s3, exec_lo
 ; GFX1032-DPP-NEXT:    s_mov_b32 s2, 0
-; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s1, vcc_lo
+; GFX1032-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s3, 0
+; GFX1032-DPP-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX1032-DPP-NEXT:    s_and_saveexec_b32 s0, vcc_lo
 ; GFX1032-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1032-DPP-NEXT:  ; %bb.1:
-; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s6, s0
-; GFX1032-DPP-NEXT:    s_mov_b32 s7, 0x43300000
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
-; GFX1032-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, s[6:7]
+; GFX1032-DPP-NEXT:    s_bcnt1_i32_b32 s3, s3
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1032-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s3
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x0
-; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1032-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1032-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v2, s4
 ; GFX1032-DPP-NEXT:    v_mov_b32_e32 v3, s5
@@ -11697,28 +11156,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1164-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1164-DPP:       ; %bb.0:
-; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, exec
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v0, 0x43300000
-; GFX1164-DPP-NEXT:    v_mov_b32_e32 v1, s0
-; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
 ; GFX1164-DPP-NEXT:    s_mov_b64 s[0:1], exec
-; GFX1164-DPP-NEXT:    s_clause 0x1
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1164-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1164-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v2, exec_hi, v2
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1164-DPP-NEXT:    s_mov_b64 s[2:3], exec
+; GFX1164-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mbcnt_hi_u32_b32 v0, s1, v0
+; GFX1164-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1164-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1164-DPP-NEXT:  ; %bb.1:
-; GFX1164-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1164-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1164-DPP-NEXT:    s_bcnt1_i32_b64 s0, s[0:1]
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1164-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1164-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    s_load_b64 s[2:3], s[0:1], 0x0
-; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1164-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1164-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1164-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX1164-DPP-NEXT:    v_mov_b32_e32 v3, s3
@@ -11741,27 +11194,22 @@ define amdgpu_kernel void @global_atomic_fsub_double_uni_address_uni_value_defau
 ;
 ; GFX1132-DPP-LABEL: global_atomic_fsub_double_uni_address_uni_value_default_scope_strictfp:
 ; GFX1132-DPP:       ; %bb.0:
-; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX1132-DPP-NEXT:    v_dual_mov_b32 v0, 0x43300000 :: v_dual_mov_b32 v1, s0
-; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v2, exec_lo, 0
-; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
 ; GFX1132-DPP-NEXT:    s_mov_b32 s0, exec_lo
-; GFX1132-DPP-NEXT:    s_clause 0x1
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v0, off offset:4
-; GFX1132-DPP-NEXT:    scratch_store_b32 off, v1, off
-; GFX1132-DPP-NEXT:    scratch_load_b64 v[0:1], off, off
-; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v2
+; GFX1132-DPP-NEXT:    s_mov_b32 s2, 0
+; GFX1132-DPP-NEXT:    v_mbcnt_lo_u32_b32 v0, s0, 0
+; GFX1132-DPP-NEXT:    s_mov_b32 s1, exec_lo
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_cmpx_eq_u32_e32 0, v0
 ; GFX1132-DPP-NEXT:    s_cbranch_execz .LBB16_3
 ; GFX1132-DPP-NEXT:  ; %bb.1:
-; GFX1132-DPP-NEXT:    s_waitcnt vmcnt(0)
-; GFX1132-DPP-NEXT:    v_add_f64 v[0:1], 0xc3300000, v[0:1]
-; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
+; GFX1132-DPP-NEXT:    s_bcnt1_i32_b32 s0, s0
 ; GFX1132-DPP-NEXT:    v_mov_b32_e32 v6, 0
+; GFX1132-DPP-NEXT:    v_cvt_f64_u32_e32 v[0:1], s0
+; GFX1132-DPP-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    s_load_b64 s[4:5], s[0:1], 0x0
-; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_2)
-; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], 4.0, v[0:1]
+; GFX1132-DPP-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX1132-DPP-NEXT:    v_mul_f64 v[4:5], v[0:1], 4.0
 ; GFX1132-DPP-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX1132-DPP-NEXT:    v_dual_mov_b32 v2, s4 :: v_dual_mov_b32 v3, s5
 ; GFX1132-DPP-NEXT:  .LBB16_2: ; %atomicrmw.start
diff --git a/llvm/test/CodeGen/AMDGPU/strict_fpext.ll b/llvm/test/CodeGen/AMDGPU/strict_fpext.ll
index 80bf0b1336b01..7cbb1eb5057bf 100644
--- a/llvm/test/CodeGen/AMDGPU/strict_fpext.ll
+++ b/llvm/test/CodeGen/AMDGPU/strict_fpext.ll
@@ -10,7 +10,6 @@ define float @v_constrained_fpext_f16_to_f32_fpexcept_strict(half %arg) #0 {
 ; SI-LABEL: v_constrained_fpext_f16_to_f32_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v0
 ; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -45,10 +44,9 @@ define <2 x float> @v_constrained_fpext_v2f16_to_v2f32_fpexcept_strict(<2 x half
 ; SI-LABEL: v_constrained_fpext_v2f16_to_v2f32_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v1, 0xffff, v0
-; SI-NEXT:    v_lshrrev_b32_e32 v2, 16, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v0, v1
-; SI-NEXT:    v_cvt_f32_f16_e32 v1, v2
+; SI-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v1, v1
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX89-LABEL: v_constrained_fpext_v2f16_to_v2f32_fpexcept_strict:
@@ -90,12 +88,11 @@ define <3 x float> @v_constrained_fpext_v3f16_to_v3f32_fpexcept_strict(<3 x half
 ; SI-LABEL: v_constrained_fpext_v3f16_to_v3f32_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v2, 0xffff, v1
-; SI-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v0
+; SI-NEXT:    v_lshrrev_b32_e32 v2, 16, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v3, v2
 ; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v1, v1
-; SI-NEXT:    v_cvt_f32_f16_e32 v2, v2
+; SI-NEXT:    v_cvt_f32_f16_e32 v2, v1
+; SI-NEXT:    v_mov_b32_e32 v1, v3
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX89-LABEL: v_constrained_fpext_v3f16_to_v3f32_fpexcept_strict:
@@ -200,7 +197,6 @@ define double @v_constrained_fpext_f16_to_f64_fpexcept_strict(half %arg) #0 {
 ; SI-LABEL: v_constrained_fpext_f16_to_f64_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v0
 ; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[0:1], v0
 ; SI-NEXT:    s_setpc_b64 s[30:31]
@@ -240,9 +236,8 @@ define <2 x double> @v_constrained_fpext_v2f16_to_v2f64_fpexcept_strict(<2 x hal
 ; SI-LABEL: v_constrained_fpext_v2f16_to_v2f64_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v1, 0xffff, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v1, v0
 ; SI-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v1, v1
 ; SI-NEXT:    v_cvt_f32_f16_e32 v2, v0
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[0:1], v1
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[2:3], v2
@@ -292,12 +287,10 @@ define <3 x double> @v_constrained_fpext_v3f16_to_v2f64_fpexcept_strict(<3 x hal
 ; SI-LABEL: v_constrained_fpext_v3f16_to_v2f64_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v2, 0xffff, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v2, v0
 ; SI-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
 ; SI-NEXT:    v_cvt_f32_f16_e32 v3, v0
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v1
-; SI-NEXT:    v_cvt_f32_f16_e32 v2, v2
-; SI-NEXT:    v_cvt_f32_f16_e32 v4, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v4, v1
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[0:1], v2
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[2:3], v3
 ; SI-NEXT:    v_cvt_f64_f32_e32 v[4:5], v4
@@ -355,37 +348,31 @@ define float @v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict(half %arg) #0
 ; SI-LABEL: v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; SI-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; SI-NEXT:    v_cvt_f32_f16_e64 v0, -v0
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX89-LABEL: v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict:
 ; GFX89:       ; %bb.0:
 ; GFX89-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX89-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX89-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; GFX89-NEXT:    v_cvt_f32_f16_e64 v0, -v0
 ; GFX89-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10-LABEL: v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict:
 ; GFX10:       ; %bb.0:
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX10-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; GFX10-NEXT:    v_cvt_f32_f16_e64 v0, -v0
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-TRUE16-LABEL: v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict:
 ; GFX11-TRUE16:       ; %bb.0:
 ; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-TRUE16-NEXT:    v_cvt_f32_f16_e32 v0, v0.l
-; GFX11-TRUE16-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; GFX11-TRUE16-NEXT:    v_cvt_f32_f16_e64 v0, -v0.l
 ; GFX11-TRUE16-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-FAKE16-LABEL: v_constrained_fneg_fpext_f16_to_f32_fpexcept_strict:
 ; GFX11-FAKE16:       ; %bb.0:
 ; GFX11-FAKE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-FAKE16-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX11-FAKE16-NEXT:    v_xor_b32_e32 v0, 0x80000000, v0
+; GFX11-FAKE16-NEXT:    v_cvt_f32_f16_e64 v0, -v0
 ; GFX11-FAKE16-NEXT:    s_setpc_b64 s[30:31]
   %result = call float @llvm.experimental.constrained.fpext.f32.f16(half %arg, metadata !"fpexcept.strict")
   %neg.result = fneg float %result
@@ -396,9 +383,7 @@ define float @v_constrained_fpext_fneg_f16_to_f32_fpexcept_strict(half %arg) #0
 ; SI-LABEL: v_constrained_fpext_fneg_f16_to_f32_fpexcept_strict:
 ; SI:       ; %bb.0:
 ; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; SI-NEXT:    v_xor_b32_e32 v0, 0x8000, v0
-; SI-NEXT:    v_and_b32_e32 v0, 0xffff, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; SI-NEXT:    v_cvt_f32_f16_e64 v0, -v0
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX89-LABEL: v_constrained_fpext_fneg_f16_to_f32_fpexcept_strict:
@@ -444,8 +429,7 @@ define double @v_constrained_fneg_fpext_f32_to_f64_fpexcept_strict(float %arg) #
 ; GCN-LABEL: v_constrained_fneg_fpext_f32_to_f64_fpexcept_strict:
 ; GCN:       ; %bb.0:
 ; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_cvt_f64_f32_e32 v[0:1], v0
-; GCN-NEXT:    v_xor_b32_e32 v1, 0x80000000, v1
+; GCN-NEXT:    v_cvt_f64_f32_e64 v[0:1], -v0
 ; GCN-NEXT:    s_setpc_b64 s[30:31]
   %result = call double @llvm.experimental.constrained.fpext.f64.f32(float %arg, metadata !"fpexcept.strict")
   %neg.result = fneg double %result
@@ -519,10 +503,9 @@ define <2 x float> @v_constrained_fpext_v2f16_to_v2f32_noabi(ptr addrspace(1) %p
 ; SI-NEXT:    s_mov_b32 s5, s6
 ; SI-NEXT:    buffer_load_dword v0, v[0:1], s[4:7], 0 addr64
 ; SI-NEXT:    s_waitcnt vmcnt(0)
-; SI-NEXT:    v_and_b32_e32 v1, 0xffff, v0
-; SI-NEXT:    v_lshrrev_b32_e32 v2, 16, v0
-; SI-NEXT:    v_cvt_f32_f16_e32 v0, v1
-; SI-NEXT:    v_cvt_f32_f16_e32 v1, v2
+; SI-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v0, v0
+; SI-NEXT:    v_cvt_f32_f16_e32 v1, v1
 ; SI-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_constrained_fpext_v2f16_to_v2f32_noabi:
diff --git a/llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll b/llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll
index fd313a76fc675..4c92846075903 100644
--- a/llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/strict_ldexp.f16.ll
@@ -104,11 +104,11 @@ define <2 x half> @test_ldexp_v2f16_v2i32(ptr addrspace(1) %out, <2 x half> %a,
 ; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-SDAG-NEXT:    s_movk_i32 s4, 0x8000
 ; GFX8-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
-; GFX8-SDAG-NEXT:    v_med3_i32 v1, v3, s4, v0
-; GFX8-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v0
-; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v1, v2, v1
-; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v0, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v1, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v1, v4, s4, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v0, v3, s4, v0
+; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v1, v2, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
+; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v0, v1
 ; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-SDAG-LABEL: test_ldexp_v2f16_v2i32:
@@ -116,12 +116,11 @@ define <2 x half> @test_ldexp_v2f16_v2i32(ptr addrspace(1) %out, <2 x half> %a,
 ; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-SDAG-NEXT:    s_movk_i32 s4, 0x8000
 ; GFX9-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
-; GFX9-SDAG-NEXT:    v_med3_i32 v1, v3, s4, v0
-; GFX9-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v0
-; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v1, v2, v1
-; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v0, v2, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX9-SDAG-NEXT:    s_mov_b32 s4, 0x5040100
-; GFX9-SDAG-NEXT:    v_perm_b32 v0, v0, v1, s4
+; GFX9-SDAG-NEXT:    v_med3_i32 v1, v4, s4, v0
+; GFX9-SDAG-NEXT:    v_med3_i32 v0, v3, s4, v0
+; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v1, v2, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
+; GFX9-SDAG-NEXT:    v_pack_b32_f16 v0, v0, v1
 ; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-SDAG-TRUE16-LABEL: test_ldexp_v2f16_v2i32:
@@ -140,14 +139,14 @@ define <2 x half> @test_ldexp_v2f16_v2i32(ptr addrspace(1) %out, <2 x half> %a,
 ; GFX11-SDAG-FAKE16:       ; %bb.0:
 ; GFX11-SDAG-FAKE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-SDAG-FAKE16-NEXT:    s_movk_i32 s0, 0x8000
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v3, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v1, v4, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v2, v0
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v3, v1
-; GFX11-SDAG-FAKE16-NEXT:    v_perm_b32 v0, v1, v0, 0x5040100
+; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v4, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v3, v3, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v1, v0
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v2, v3
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX11-SDAG-FAKE16-NEXT:    v_pack_b32_f16 v0, v1, v0
 ; GFX11-SDAG-FAKE16-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-GISEL-LABEL: test_ldexp_v2f16_v2i32:
@@ -210,29 +209,28 @@ define <3 x half> @test_ldexp_v3f16_v3i32(ptr addrspace(1) %out, <3 x half> %a,
 ; GFX8-SDAG:       ; %bb.0:
 ; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-SDAG-NEXT:    s_movk_i32 s4, 0x8000
-; GFX8-SDAG-NEXT:    v_mov_b32_e32 v1, 0x7fff
-; GFX8-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v1
-; GFX8-SDAG-NEXT:    v_med3_i32 v4, v5, s4, v1
-; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
-; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v2, v2, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX8-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v1
-; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v0, v2
+; GFX8-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
+; GFX8-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v0
 ; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v1, v3, v1
+; GFX8-SDAG-NEXT:    v_med3_i32 v3, v4, s4, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v0, v5, s4, v0
+; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v3, v2, v3
+; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v0, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v3, v0
 ; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-SDAG-LABEL: test_ldexp_v3f16_v3i32:
 ; GFX9-SDAG:       ; %bb.0:
 ; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-SDAG-NEXT:    s_movk_i32 s4, 0x8000
-; GFX9-SDAG-NEXT:    v_mov_b32_e32 v1, 0x7fff
-; GFX9-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v1
-; GFX9-SDAG-NEXT:    v_med3_i32 v4, v5, s4, v1
-; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
-; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX9-SDAG-NEXT:    s_mov_b32 s5, 0x5040100
-; GFX9-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v1
-; GFX9-SDAG-NEXT:    v_perm_b32 v0, v2, v0, s5
+; GFX9-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
+; GFX9-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v0
 ; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v1, v3, v1
+; GFX9-SDAG-NEXT:    v_med3_i32 v3, v5, s4, v0
+; GFX9-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v0
+; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v3, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
+; GFX9-SDAG-NEXT:    v_pack_b32_f16 v0, v0, v3
 ; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-SDAG-TRUE16-LABEL: test_ldexp_v3f16_v3i32:
@@ -253,16 +251,15 @@ define <3 x half> @test_ldexp_v3f16_v3i32(ptr addrspace(1) %out, <3 x half> %a,
 ; GFX11-SDAG-FAKE16:       ; %bb.0:
 ; GFX11-SDAG-FAKE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-SDAG-FAKE16-NEXT:    s_movk_i32 s0, 0x8000
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_2) | instid1(VALU_DEP_3)
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v4, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v1, v5, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v4, 16, v2
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v2, v0
+; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v1, 16, v2
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v5, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v4, v4, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v1, v0
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v2, v4
 ; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v2, v6, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v4, v1
-; GFX11-SDAG-FAKE16-NEXT:    v_perm_b32 v0, v1, v0, 0x5040100
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_3)
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-FAKE16-NEXT:    v_pack_b32_f16 v0, v1, v0
 ; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v3, v2
 ; GFX11-SDAG-FAKE16-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -339,16 +336,16 @@ define <4 x half> @test_ldexp_v4f16_v4i32(ptr addrspace(1) %out, <4 x half> %a,
 ; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-SDAG-NEXT:    s_movk_i32 s4, 0x8000
 ; GFX8-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
-; GFX8-SDAG-NEXT:    v_med3_i32 v1, v7, s4, v0
-; GFX8-SDAG-NEXT:    v_med3_i32 v6, v6, s4, v0
-; GFX8-SDAG-NEXT:    v_med3_i32 v5, v5, s4, v0
-; GFX8-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v0
-; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v1, v3, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v3, v3, v6
-; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v5, v2, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
-; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v0, v5
-; GFX8-SDAG-NEXT:    v_or_b32_e32 v1, v3, v1
+; GFX8-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v6, v7, s4, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v4, v4, s4, v0
+; GFX8-SDAG-NEXT:    v_med3_i32 v0, v5, s4, v0
+; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v1, v3, v1
+; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v3, v3, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX8-SDAG-NEXT:    v_ldexp_f16_e32 v4, v2, v4
+; GFX8-SDAG-NEXT:    v_ldexp_f16_sdwa v0, v2, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX8-SDAG-NEXT:    v_or_b32_e32 v0, v4, v0
+; GFX8-SDAG-NEXT:    v_or_b32_e32 v1, v1, v3
 ; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX9-SDAG-LABEL: test_ldexp_v4f16_v4i32:
@@ -356,17 +353,16 @@ define <4 x half> @test_ldexp_v4f16_v4i32(ptr addrspace(1) %out, <4 x half> %a,
 ; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-SDAG-NEXT:    s_movk_i32 s4, 0x8000
 ; GFX9-SDAG-NEXT:    v_mov_b32_e32 v0, 0x7fff
-; GFX9-SDAG-NEXT:    v_med3_i32 v1, v6, s4, v0
-; GFX9-SDAG-NEXT:    v_med3_i32 v6, v7, s4, v0
-; GFX9-SDAG-NEXT:    v_med3_i32 v4, v4, s4, v0
-; GFX9-SDAG-NEXT:    v_med3_i32 v0, v5, s4, v0
-; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v1, v3, v1
-; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v3, v3, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v4, v2, v4
-; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v0, v2, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
-; GFX9-SDAG-NEXT:    s_mov_b32 s4, 0x5040100
-; GFX9-SDAG-NEXT:    v_perm_b32 v0, v0, v4, s4
-; GFX9-SDAG-NEXT:    v_perm_b32 v1, v3, v1, s4
+; GFX9-SDAG-NEXT:    v_med3_i32 v1, v7, s4, v0
+; GFX9-SDAG-NEXT:    v_med3_i32 v6, v6, s4, v0
+; GFX9-SDAG-NEXT:    v_med3_i32 v5, v5, s4, v0
+; GFX9-SDAG-NEXT:    v_med3_i32 v0, v4, s4, v0
+; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v1, v3, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v3, v3, v6
+; GFX9-SDAG-NEXT:    v_ldexp_f16_sdwa v5, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
+; GFX9-SDAG-NEXT:    v_ldexp_f16_e32 v0, v2, v0
+; GFX9-SDAG-NEXT:    v_pack_b32_f16 v0, v0, v5
+; GFX9-SDAG-NEXT:    v_pack_b32_f16 v1, v3, v1
 ; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-SDAG-TRUE16-LABEL: test_ldexp_v4f16_v4i32:
@@ -390,21 +386,21 @@ define <4 x half> @test_ldexp_v4f16_v4i32(ptr addrspace(1) %out, <4 x half> %a,
 ; GFX11-SDAG-FAKE16:       ; %bb.0:
 ; GFX11-SDAG-FAKE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-SDAG-FAKE16-NEXT:    s_movk_i32 s0, 0x8000
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v6, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v1, v7, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v4, v4, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v1, 16, v3
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v0, v7, s0, 0x7fff
 ; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v5, v5, s0, 0x7fff
-; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v6, 16, v2
-; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v7, 16, v3
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v3, v3, v0
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v2, v4
+; GFX11-SDAG-FAKE16-NEXT:    v_lshrrev_b32_e32 v7, 16, v2
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v4, v4, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    v_med3_i32 v6, v6, s0, 0x7fff
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v1, v0
 ; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v2, v6, v5
-; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v1, v7, v1
-; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX11-SDAG-FAKE16-NEXT:    v_perm_b32 v0, v2, v0, 0x5040100
-; GFX11-SDAG-FAKE16-NEXT:    v_perm_b32 v1, v1, v3, 0x5040100
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v0, v7, v5
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v2, v2, v4
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-SDAG-FAKE16-NEXT:    v_ldexp_f16_e32 v3, v3, v6
+; GFX11-SDAG-FAKE16-NEXT:    v_pack_b32_f16 v0, v2, v0
+; GFX11-SDAG-FAKE16-NEXT:    s_delay_alu instid0(VALU_DEP_2)
+; GFX11-SDAG-FAKE16-NEXT:    v_pack_b32_f16 v1, v3, v1
 ; GFX11-SDAG-FAKE16-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-GISEL-LABEL: test_ldexp_v4f16_v4i32:
diff --git a/llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll b/llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll
index 7bf8e8954bd1b..8a70c8a5c5ff7 100644
--- a/llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll
+++ b/llvm/test/CodeGen/AMDGPU/strict_ldexp.f32.ll
@@ -48,26 +48,26 @@ define float @test_ldexp_f32_i32(ptr addrspace(1) %out, float %a, i32 %b) #0 {
 ; }
 
 define <2 x float> @test_ldexp_v2f32_v2i32(ptr addrspace(1) %out, <2 x float> %a, <2 x i32> %b) #0 {
-; GFX6-SDAG-LABEL: test_ldexp_v2f32_v2i32:
-; GFX6-SDAG:       ; %bb.0:
-; GFX6-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v1, v3, v5
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v0, v2, v4
-; GFX6-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX6-LABEL: test_ldexp_v2f32_v2i32:
+; GFX6:       ; %bb.0:
+; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_ldexp_f32_e32 v0, v2, v4
+; GFX6-NEXT:    v_ldexp_f32_e32 v1, v3, v5
+; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX8-SDAG-LABEL: test_ldexp_v2f32_v2i32:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v1, v3, v5
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v0, v2, v4
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX8-LABEL: test_ldexp_v2f32_v2i32:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_ldexp_f32 v0, v2, v4
+; GFX8-NEXT:    v_ldexp_f32 v1, v3, v5
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-SDAG-LABEL: test_ldexp_v2f32_v2i32:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v1, v3, v5
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v0, v2, v4
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: test_ldexp_v2f32_v2i32:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_ldexp_f32 v0, v2, v4
+; GFX9-NEXT:    v_ldexp_f32 v1, v3, v5
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: test_ldexp_v2f32_v2i32:
 ; GFX11:       ; %bb.0:
@@ -75,58 +75,34 @@ define <2 x float> @test_ldexp_v2f32_v2i32(ptr addrspace(1) %out, <2 x float> %a
 ; GFX11-NEXT:    v_ldexp_f32 v0, v2, v4
 ; GFX11-NEXT:    v_ldexp_f32 v1, v3, v5
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX6-GISEL-LABEL: test_ldexp_v2f32_v2i32:
-; GFX6-GISEL:       ; %bb.0:
-; GFX6-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v0, v2, v4
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v1, v3, v5
-; GFX6-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: test_ldexp_v2f32_v2i32:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v0, v2, v4
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v1, v3, v5
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: test_ldexp_v2f32_v2i32:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v0, v2, v4
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v1, v3, v5
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %result = call <2 x float> @llvm.experimental.constrained.ldexp.v2f32.v2i32(<2 x float> %a, <2 x i32> %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <2 x float> %result
 }
 
 define <3 x float> @test_ldexp_v3f32_v3i32(ptr addrspace(1) %out, <3 x float> %a, <3 x i32> %b) #0 {
-; GFX6-SDAG-LABEL: test_ldexp_v3f32_v3i32:
-; GFX6-SDAG:       ; %bb.0:
-; GFX6-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v4, v4, v7
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v1, v3, v6
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v0, v2, v5
-; GFX6-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX6-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX6-LABEL: test_ldexp_v3f32_v3i32:
+; GFX6:       ; %bb.0:
+; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_ldexp_f32_e32 v0, v2, v5
+; GFX6-NEXT:    v_ldexp_f32_e32 v1, v3, v6
+; GFX6-NEXT:    v_ldexp_f32_e32 v2, v4, v7
+; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX8-SDAG-LABEL: test_ldexp_v3f32_v3i32:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v4, v4, v7
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v1, v3, v6
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v0, v2, v5
-; GFX8-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX8-LABEL: test_ldexp_v3f32_v3i32:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_ldexp_f32 v0, v2, v5
+; GFX8-NEXT:    v_ldexp_f32 v1, v3, v6
+; GFX8-NEXT:    v_ldexp_f32 v2, v4, v7
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-SDAG-LABEL: test_ldexp_v3f32_v3i32:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v4, v4, v7
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v1, v3, v6
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v0, v2, v5
-; GFX9-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: test_ldexp_v3f32_v3i32:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_ldexp_f32 v0, v2, v5
+; GFX9-NEXT:    v_ldexp_f32 v1, v3, v6
+; GFX9-NEXT:    v_ldexp_f32 v2, v4, v7
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: test_ldexp_v3f32_v3i32:
 ; GFX11:       ; %bb.0:
@@ -135,67 +111,37 @@ define <3 x float> @test_ldexp_v3f32_v3i32(ptr addrspace(1) %out, <3 x float> %a
 ; GFX11-NEXT:    v_ldexp_f32 v1, v3, v6
 ; GFX11-NEXT:    v_ldexp_f32 v2, v4, v7
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX6-GISEL-LABEL: test_ldexp_v3f32_v3i32:
-; GFX6-GISEL:       ; %bb.0:
-; GFX6-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v0, v2, v5
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v1, v3, v6
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v2, v4, v7
-; GFX6-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: test_ldexp_v3f32_v3i32:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v0, v2, v5
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v1, v3, v6
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v2, v4, v7
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: test_ldexp_v3f32_v3i32:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v0, v2, v5
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v1, v3, v6
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v2, v4, v7
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %result = call <3 x float> @llvm.experimental.constrained.ldexp.v3f32.v3i32(<3 x float> %a, <3 x i32> %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <3 x float> %result
 }
 
 define <4 x float> @test_ldexp_v4f32_v4i32(ptr addrspace(1) %out, <4 x float> %a, <4 x i32> %b) #0 {
-; GFX6-SDAG-LABEL: test_ldexp_v4f32_v4i32:
-; GFX6-SDAG:       ; %bb.0:
-; GFX6-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v5, v5, v9
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v4, v4, v8
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v1, v3, v7
-; GFX6-SDAG-NEXT:    v_ldexp_f32_e32 v0, v2, v6
-; GFX6-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX6-SDAG-NEXT:    v_mov_b32_e32 v3, v5
-; GFX6-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX6-LABEL: test_ldexp_v4f32_v4i32:
+; GFX6:       ; %bb.0:
+; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_ldexp_f32_e32 v0, v2, v6
+; GFX6-NEXT:    v_ldexp_f32_e32 v1, v3, v7
+; GFX6-NEXT:    v_ldexp_f32_e32 v2, v4, v8
+; GFX6-NEXT:    v_ldexp_f32_e32 v3, v5, v9
+; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX8-SDAG-LABEL: test_ldexp_v4f32_v4i32:
-; GFX8-SDAG:       ; %bb.0:
-; GFX8-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v5, v5, v9
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v4, v4, v8
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v1, v3, v7
-; GFX8-SDAG-NEXT:    v_ldexp_f32 v0, v2, v6
-; GFX8-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX8-SDAG-NEXT:    v_mov_b32_e32 v3, v5
-; GFX8-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX8-LABEL: test_ldexp_v4f32_v4i32:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_ldexp_f32 v0, v2, v6
+; GFX8-NEXT:    v_ldexp_f32 v1, v3, v7
+; GFX8-NEXT:    v_ldexp_f32 v2, v4, v8
+; GFX8-NEXT:    v_ldexp_f32 v3, v5, v9
+; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX9-SDAG-LABEL: test_ldexp_v4f32_v4i32:
-; GFX9-SDAG:       ; %bb.0:
-; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v5, v5, v9
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v4, v4, v8
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v1, v3, v7
-; GFX9-SDAG-NEXT:    v_ldexp_f32 v0, v2, v6
-; GFX9-SDAG-NEXT:    v_mov_b32_e32 v2, v4
-; GFX9-SDAG-NEXT:    v_mov_b32_e32 v3, v5
-; GFX9-SDAG-NEXT:    s_setpc_b64 s[30:31]
+; GFX9-LABEL: test_ldexp_v4f32_v4i32:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_ldexp_f32 v0, v2, v6
+; GFX9-NEXT:    v_ldexp_f32 v1, v3, v7
+; GFX9-NEXT:    v_ldexp_f32 v2, v4, v8
+; GFX9-NEXT:    v_ldexp_f32 v3, v5, v9
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: test_ldexp_v4f32_v4i32:
 ; GFX11:       ; %bb.0:
@@ -205,33 +151,6 @@ define <4 x float> @test_ldexp_v4f32_v4i32(ptr addrspace(1) %out, <4 x float> %a
 ; GFX11-NEXT:    v_ldexp_f32 v2, v4, v8
 ; GFX11-NEXT:    v_ldexp_f32 v3, v5, v9
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX6-GISEL-LABEL: test_ldexp_v4f32_v4i32:
-; GFX6-GISEL:       ; %bb.0:
-; GFX6-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v0, v2, v6
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v1, v3, v7
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v2, v4, v8
-; GFX6-GISEL-NEXT:    v_ldexp_f32_e32 v3, v5, v9
-; GFX6-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX8-GISEL-LABEL: test_ldexp_v4f32_v4i32:
-; GFX8-GISEL:       ; %bb.0:
-; GFX8-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v0, v2, v6
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v1, v3, v7
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v2, v4, v8
-; GFX8-GISEL-NEXT:    v_ldexp_f32 v3, v5, v9
-; GFX8-GISEL-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX9-GISEL-LABEL: test_ldexp_v4f32_v4i32:
-; GFX9-GISEL:       ; %bb.0:
-; GFX9-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v0, v2, v6
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v1, v3, v7
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v2, v4, v8
-; GFX9-GISEL-NEXT:    v_ldexp_f32 v3, v5, v9
-; GFX9-GISEL-NEXT:    s_setpc_b64 s[30:31]
   %result = call <4 x float> @llvm.experimental.constrained.ldexp.v4f32.v4i32(<4 x float> %a, <4 x i32> %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <4 x float> %result
 }
@@ -276,3 +195,9 @@ attributes #1 = { nocallback nofree nosync nounwind willreturn memory(inaccessib
 ; GCN: {{.*}}
 ; GFX11-GISEL: {{.*}}
 ; GFX11-SDAG: {{.*}}
+; GFX6-GISEL: {{.*}}
+; GFX6-SDAG: {{.*}}
+; GFX8-GISEL: {{.*}}
+; GFX8-SDAG: {{.*}}
+; GFX9-GISEL: {{.*}}
+; GFX9-SDAG: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/strictfp_f16_abi_promote.ll b/llvm/test/CodeGen/AMDGPU/strictfp_f16_abi_promote.ll
index ef2a06935f20a..e585b1a3d5a37 100644
--- a/llvm/test/CodeGen/AMDGPU/strictfp_f16_abi_promote.ll
+++ b/llvm/test/CodeGen/AMDGPU/strictfp_f16_abi_promote.ll
@@ -17,7 +17,6 @@ define void @f16_arg(half %arg, ptr %ptr) #0 {
 ; GFX7-LABEL: f16_arg:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_and_b32_e32 v0, 0xffff, v0
 ; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
 ; GFX7-NEXT:    flat_store_dword v[1:2], v0
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -31,14 +30,13 @@ define void @v2f16_arg(<2 x half> %arg, ptr %ptr) #0 {
 ; GFX7-LABEL: v2f16_arg:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_lshrrev_b32_e32 v3, 16, v0
-; GFX7-NEXT:    v_and_b32_e32 v0, 0xffff, v0
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v5, v3
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
-; GFX7-NEXT:    v_add_i32_e32 v3, vcc, 4, v1
-; GFX7-NEXT:    v_addc_u32_e32 v4, vcc, 0, v2, vcc
-; GFX7-NEXT:    flat_store_dword v[3:4], v5
-; GFX7-NEXT:    flat_store_dword v[1:2], v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v3, v0
+; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v4, v0
+; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 4, v1
+; GFX7-NEXT:    flat_store_dword v[1:2], v3
+; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v2, vcc
+; GFX7-NEXT:    flat_store_dword v[0:1], v4
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
   %fpext = call <2 x float> @llvm.experimental.constrained.fpext.v2f32.v2f16(<2 x half> %arg, metadata !"fpexcept.strict")
@@ -50,19 +48,17 @@ define void @v3f16_arg(<3 x half> %arg, ptr %ptr) #0 {
 ; GFX7-LABEL: v3f16_arg:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT:    v_and_b32_e32 v4, 0xffff, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v6, v1
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v4, v0
 ; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
 ; GFX7-NEXT:    v_cvt_f32_f16_e32 v5, v0
-; GFX7-NEXT:    v_and_b32_e32 v0, 0xffff, v1
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v6, v0
 ; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 8, v2
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v4, v4
 ; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX7-NEXT:    flat_store_dword v[0:1], v6
 ; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 4, v2
 ; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
-; GFX7-NEXT:    flat_store_dword v[0:1], v5
 ; GFX7-NEXT:    flat_store_dword v[2:3], v4
+; GFX7-NEXT:    flat_store_dword v[0:1], v5
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
   %fpext = call <3 x float> @llvm.experimental.constrained.fpext.v3f32.v3f16(<3 x half> %arg, metadata !"fpexcept.strict")
@@ -74,24 +70,22 @@ define void @v4f16_arg(<4 x half> %arg, ptr %ptr) #0 {
 ; GFX7-LABEL: v4f16_arg:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_lshrrev_b32_e32 v5, 16, v0
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v0, v0
 ; GFX7-NEXT:    v_lshrrev_b32_e32 v4, 16, v1
 ; GFX7-NEXT:    v_cvt_f32_f16_e32 v4, v4
-; GFX7-NEXT:    v_and_b32_e32 v1, 0xffff, v1
-; GFX7-NEXT:    v_lshrrev_b32_e32 v5, 16, v0
-; GFX7-NEXT:    v_and_b32_e32 v0, 0xffff, v0
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v6, v0
-; GFX7-NEXT:    v_cvt_f32_f16_e32 v7, v1
+; GFX7-NEXT:    v_cvt_f32_f16_e32 v6, v1
+; GFX7-NEXT:    flat_store_dword v[2:3], v0
 ; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 12, v2
 ; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX7-NEXT:    v_cvt_f32_f16_e32 v5, v5
 ; GFX7-NEXT:    flat_store_dword v[0:1], v4
 ; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 8, v2
 ; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
-; GFX7-NEXT:    flat_store_dword v[0:1], v7
+; GFX7-NEXT:    flat_store_dword v[0:1], v6
 ; GFX7-NEXT:    v_add_i32_e32 v0, vcc, 4, v2
 ; GFX7-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX7-NEXT:    flat_store_dword v[0:1], v5
-; GFX7-NEXT:    flat_store_dword v[2:3], v6
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
   %fpext = call <4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %arg, metadata !"fpexcept.strict")
diff --git a/llvm/test/CodeGen/ARM/fp-intrinsics-vector.ll b/llvm/test/CodeGen/ARM/fp-intrinsics-vector.ll
index d4b94b97acad8..073bdb75c2688 100644
--- a/llvm/test/CodeGen/ARM/fp-intrinsics-vector.ll
+++ b/llvm/test/CodeGen/ARM/fp-intrinsics-vector.ll
@@ -69,19 +69,7 @@ define <4 x float> @fma_v4f32(<4 x float> %x, <4 x float> %y, <4 x float> %z) #0
 define <4 x i32> @fptosi_v4i32_v4f32(<4 x float> %x) #0 {
 ; CHECK-LABEL: fptosi_v4i32_v4f32:
 ; CHECK:       @ %bb.0:
-; CHECK-NEXT:    vcvt.s32.f32 s4, s2
-; CHECK-NEXT:    vcvt.s32.f32 s6, s0
-; CHECK-NEXT:    vcvt.s32.f32 s0, s1
-; CHECK-NEXT:    vmov r0, s4
-; CHECK-NEXT:    vcvt.s32.f32 s4, s3
-; CHECK-NEXT:    vmov.32 d17[0], r0
-; CHECK-NEXT:    vmov r0, s6
-; CHECK-NEXT:    vmov.32 d16[0], r0
-; CHECK-NEXT:    vmov r0, s4
-; CHECK-NEXT:    vmov.32 d17[1], r0
-; CHECK-NEXT:    vmov r0, s0
-; CHECK-NEXT:    vmov.32 d16[1], r0
-; CHECK-NEXT:    vorr q0, q8, q8
+; CHECK-NEXT:    vcvt.s32.f32 q0, q0
 ; CHECK-NEXT:    bx lr
   %val = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(<4 x float> %x, metadata !"fpexcept.strict") #0
   ret <4 x i32> %val
@@ -90,19 +78,7 @@ define <4 x i32> @fptosi_v4i32_v4f32(<4 x float> %x) #0 {
 define <4 x i32> @fptoui_v4i32_v4f32(<4 x float> %x) #0 {
 ; CHECK-LABEL: fptoui_v4i32_v4f32:
 ; CHECK:       @ %bb.0:
-; CHECK-NEXT:    vcvt.u32.f32 s4, s2
-; CHECK-NEXT:    vcvt.u32.f32 s6, s0
-; CHECK-NEXT:    vcvt.u32.f32 s0, s1
-; CHECK-NEXT:    vmov r0, s4
-; CHECK-NEXT:    vcvt.u32.f32 s4, s3
-; CHECK-NEXT:    vmov.32 d17[0], r0
-; CHECK-NEXT:    vmov r0, s6
-; CHECK-NEXT:    vmov.32 d16[0], r0
-; CHECK-NEXT:    vmov r0, s4
-; CHECK-NEXT:    vmov.32 d17[1], r0
-; CHECK-NEXT:    vmov r0, s0
-; CHECK-NEXT:    vmov.32 d16[1], r0
-; CHECK-NEXT:    vorr q0, q8, q8
+; CHECK-NEXT:    vcvt.u32.f32 q0, q0
 ; CHECK-NEXT:    bx lr
   %val = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f32(<4 x float> %x, metadata !"fpexcept.strict") #0
   ret <4 x i32> %val
diff --git a/llvm/test/CodeGen/ARM/fp-intrinsics.ll b/llvm/test/CodeGen/ARM/fp-intrinsics.ll
index cb87508d53342..ab6c4b1d17b4a 100644
--- a/llvm/test/CodeGen/ARM/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/ARM/fp-intrinsics.ll
@@ -50,7 +50,7 @@ define float @div_f32(float %x, float %y) #0 {
 }
 
 ; CHECK-LABEL: frem_f32:
-; CHECK: bl fmodf
+; CHECK: {{b|bl}} fmodf
 define float @frem_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.frem.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
@@ -74,7 +74,6 @@ define i32 @fptosi_f32(float %x) #0 {
 
 ; CHECK-LABEL: fptosi_f32_twice:
 ; CHECK-NOSP: bl __aeabi_f2iz
-; CHECK-NOSP: bl __aeabi_f2iz
 ; CHECK-SP: vcvt.s32.f32
 define void @fptosi_f32_twice(float %arg, ptr %ptr) #0 {
 entry:
@@ -96,7 +95,6 @@ define i32 @fptoui_f32(float %x) #0 {
 
 ; CHECK-LABEL: fptoui_f32_twice:
 ; CHECK-NOSP: bl __aeabi_f2uiz
-; CHECK-NOSP: bl __aeabi_f2uiz
 ; FIXME-CHECK-SP: vcvt.u32.f32
 ; FIXME-CHECK-SP: vcvt.u32.f32
 define void @fptoui_f32_twice(float %arg, ptr %ptr) #0 {
@@ -118,70 +116,70 @@ define float @sqrt_f32(float %x) #0 {
 }
 
 ; CHECK-LABEL: powi_f32:
-; CHECK: bl __powisf2
+; CHECK: {{b|bl}} __powisf2
 define float @powi_f32(float %x, i32 %y) #0 {
   %val = call float @llvm.experimental.constrained.powi.f32(float %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: sin_f32:
-; CHECK: bl sinf
+; CHECK: {{b|bl}} sinf
 define float @sin_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.sin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: cos_f32:
-; CHECK: bl cosf
+; CHECK: {{b|bl}} cosf
 define float @cos_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.cos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: tan_f32:
-; CHECK: bl tanf
+; CHECK: {{b|bl}} tanf
 define float @tan_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.tan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: acos_f32:
-; CHECK: bl acosf
+; CHECK: {{b|bl}} acosf
 define float @acos_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.acos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: asin_f32:
-; CHECK: bl asinf
+; CHECK: {{b|bl}} asinf
 define float @asin_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.asin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: atan_f32:
-; CHECK: bl atanf
+; CHECK: {{b|bl}} atanf
 define float @atan_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.atan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: cosh_f32:
-; CHECK: bl coshf
+; CHECK: {{b|bl}} coshf
 define float @cosh_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.cosh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: sinh_f32:
-; CHECK: bl sinhf
+; CHECK: {{b|bl}} sinhf
 define float @sinh_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.sinh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: tanh_f32:
-; CHECK: bl tanhf
+; CHECK: {{b|bl}} tanhf
 define float @tanh_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.tanh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
@@ -197,7 +195,7 @@ define float @fmuladd_f32(float %x, float %y, float %z) #0 {
 }
 
 ; CHECK-LABEL: ldexp_f32:
-; CHECK: bl ldexpf
+; CHECK: {{b|bl}} ldexpf
 define float @ldexp_f32(float %x, i32 %y) #0 {
   %val = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
@@ -220,49 +218,49 @@ define float @uitofp_f32_i32(i32 %x) #0 {
 }
 
 ; CHECK-LABEL: atan2_f32:
-; CHECK: bl atan2f
+; CHECK: {{b|bl}} atan2f
 define float @atan2_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.atan2.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: pow_f32:
-; CHECK: bl powf
+; CHECK: {{b|bl}} powf
 define float @pow_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.pow.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: log_f32:
-; CHECK: bl logf
+; CHECK: {{b|bl}} logf
 define float @log_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.log.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: log10_f32:
-; CHECK: bl log10f
+; CHECK: {{b|bl}} log10f
 define float @log10_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.log10.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: log2_f32:
-; CHECK: bl log2f
+; CHECK: {{b|bl}} log2f
 define float @log2_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.log2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: exp_f32:
-; CHECK: bl expf
+; CHECK: {{b|bl}} expf
 define float @exp_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.exp.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 ; CHECK-LABEL: exp2_f32:
-; CHECK: bl exp2f
+; CHECK: {{b|bl}} exp2f
 define float @exp2_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.exp2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
@@ -270,7 +268,7 @@ define float @exp2_f32(float %x) #0 {
 
 ; CHECK-LABEL: rint_f32:
 ; CHECK-NOSP: bl rintf
-; CHECK-SP-NOV8: bl rintf
+; CHECK-SP-NOV8: {{b|bl}} rintf
 ; CHECK-SP-V8: vrintx.f32
 define float @rint_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.rint.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -279,7 +277,7 @@ define float @rint_f32(float %x) #0 {
 
 ; CHECK-LABEL: nearbyint_f32:
 ; CHECK-NOSP: bl nearbyintf
-; CHECK-SP-NOV8: bl nearbyintf
+; CHECK-SP-NOV8: {{b|bl}} nearbyintf
 ; CHECK-SP-V8: vrintr.f32
 define float @nearbyint_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.nearbyint.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -287,14 +285,14 @@ define float @nearbyint_f32(float %x) #0 {
 }
 
 ; CHECK-LABEL: lrint_f32:
-; CHECK: bl lrintf
+; CHECK: {{b|bl}} lrintf
 define i32 @lrint_f32(float %x) #0 {
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: llrint_f32:
-; CHECK: bl llrintf
+; CHECK: {{b|bl}} llrintf
 define i32 @llrint_f32(float %x) #0 {
   %val = call i32 @llvm.experimental.constrained.llrint.i32.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
@@ -302,7 +300,7 @@ define i32 @llrint_f32(float %x) #0 {
 
 ; CHECK-LABEL: maxnum_f32:
 ; CHECK-NOSP: bl fmaxf
-; CHECK-SP-NOV8: bl fmaxf
+; CHECK-SP-NOV8: {{b|bl}} fmaxf
 ; CHECK-SP-V8: vmaxnm.f32
 define float @maxnum_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.maxnum.f32(float %x, float %y, metadata !"fpexcept.strict") #0
@@ -311,7 +309,7 @@ define float @maxnum_f32(float %x, float %y) #0 {
 
 ; CHECK-LABEL: minnum_f32:
 ; CHECK-NOSP: bl fminf
-; CHECK-SP-NOV8: bl fminf
+; CHECK-SP-NOV8: {{b|bl}} fminf
 ; CHECK-SP-V8: vminnm.f32
 define float @minnum_f32(float %x, float %y) #0 {
   %val = call float @llvm.experimental.constrained.minnum.f32(float %x, float %y, metadata !"fpexcept.strict") #0
@@ -320,7 +318,7 @@ define float @minnum_f32(float %x, float %y) #0 {
 
 ; CHECK-LABEL: ceil_f32:
 ; CHECK-NOSP: bl ceilf
-; CHECK-SP-NOV8: bl ceilf
+; CHECK-SP-NOV8: {{b|bl}} ceilf
 ; CHECK-SP-V8: vrintp.f32
 define float @ceil_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.ceil.f32(float %x, metadata !"fpexcept.strict") #0
@@ -329,7 +327,7 @@ define float @ceil_f32(float %x) #0 {
 
 ; CHECK-LABEL: floor_f32:
 ; CHECK-NOSP: bl floorf
-; CHECK-SP-NOV8: bl floorf
+; CHECK-SP-NOV8: {{b|bl}} floorf
 ; CHECK-SP-V8: vrintm.f32
 define float @floor_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.floor.f32(float %x, metadata !"fpexcept.strict") #0
@@ -337,14 +335,14 @@ define float @floor_f32(float %x) #0 {
 }
 
 ; CHECK-LABEL: lround_f32:
-; CHECK: bl lroundf
+; CHECK: {{b|bl}} lroundf
 define i32 @lround_f32(float %x) #0 {
   %val = call i32 @llvm.experimental.constrained.lround.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: llround_f32:
-; CHECK: bl llroundf
+; CHECK: {{b|bl}} llroundf
 define i32 @llround_f32(float %x) #0 {
   %val = call i32 @llvm.experimental.constrained.llround.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
@@ -352,7 +350,7 @@ define i32 @llround_f32(float %x) #0 {
 
 ; CHECK-LABEL: round_f32:
 ; CHECK-NOSP: bl roundf
-; CHECK-SP-NOV8: bl roundf
+; CHECK-SP-NOV8: {{b|bl}} roundf
 ; CHECK-SP-V8: vrinta.f32
 define float @round_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.round.f32(float %x, metadata !"fpexcept.strict") #0
@@ -361,7 +359,7 @@ define float @round_f32(float %x) #0 {
 
 ; CHECK-LABEL: trunc_f32:
 ; CHECK-NOSP: bl truncf
-; CHECK-SP-NOV8: bl truncf
+; CHECK-SP-NOV8: {{b|bl}} truncf
 ; CHECK-SP-V8: vrintz.f32
 define float @trunc_f32(float %x) #0 {
   %val = call float @llvm.experimental.constrained.trunc.f32(float %x, metadata !"fpexcept.strict") #0
@@ -592,7 +590,7 @@ define i32 @fcmps_une_f32(float %a, float %b) #0 {
 ; Double-precision intrinsics
 
 ; CHECK-LABEL: add_f64:
-; CHECK-NODP: bl __aeabi_dadd
+; CHECK-NODP: {{b|bl}} __aeabi_dadd
 ; CHECK-DP: vadd.f64
 define double @add_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.fadd.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -600,7 +598,7 @@ define double @add_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: sub_f64:
-; CHECK-NODP: bl __aeabi_dsub
+; CHECK-NODP: {{b|bl}} __aeabi_dsub
 ; CHECK-DP: vsub.f64
 define double @sub_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.fsub.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -608,7 +606,7 @@ define double @sub_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: mul_f64:
-; CHECK-NODP: bl __aeabi_dmul
+; CHECK-NODP: {{b|bl}} __aeabi_dmul
 ; CHECK-DP: vmul.f64
 define double @mul_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.fmul.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -616,7 +614,7 @@ define double @mul_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: div_f64:
-; CHECK-NODP: bl __aeabi_ddiv
+; CHECK-NODP: {{b|bl}} __aeabi_ddiv
 ; CHECK-DP: vdiv.f64
 define double @div_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.fdiv.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -624,14 +622,14 @@ define double @div_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: frem_f64:
-; CHECK: bl fmod
+; CHECK: {{b|bl}} fmod
 define double @frem_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.frem.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: fma_f64:
-; CHECK-NODP: bl fma
+; CHECK-NODP: {{b|bl}} fma
 ; CHECK-DP: vfma.f64
 define double @fma_f64(double %x, double %y, double %z) #0 {
   %val = call double @llvm.experimental.constrained.fma.f64(double %x, double %y, double %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -639,7 +637,7 @@ define double @fma_f64(double %x, double %y, double %z) #0 {
 }
 
 ; CHECK-LABEL: fptosi_f64:
-; CHECK-NODP: bl __aeabi_d2iz
+; CHECK-NODP: {{b|bl}} __aeabi_d2iz
 ; CHECK-DP: vcvt.s32.f64
 define i32 @fptosi_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %x, metadata !"fpexcept.strict") #0
@@ -647,7 +645,7 @@ define i32 @fptosi_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: fptoui_f64:
-; CHECK-NODP: bl __aeabi_d2uiz
+; CHECK-NODP: {{b|bl}} __aeabi_d2uiz
 ; FIXME-CHECK-DP: vcvt.u32.f64
 define i32 @fptoui_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %x, metadata !"fpexcept.strict") #0
@@ -655,7 +653,7 @@ define i32 @fptoui_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: sqrt_f64:
-; CHECK-NODP: bl sqrt
+; CHECK-NODP: {{b|bl}} sqrt
 ; CHECK-DP: vsqrt.f64
 define double @sqrt_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.sqrt.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -663,70 +661,70 @@ define double @sqrt_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: powi_f64:
-; CHECK: bl __powidf2
+; CHECK: {{b|bl}} __powidf2
 define double @powi_f64(double %x, i32 %y) #0 {
   %val = call double @llvm.experimental.constrained.powi.f64(double %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: sin_f64:
-; CHECK: bl sin
+; CHECK: {{b|bl}} sin
 define double @sin_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.sin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: cos_f64:
-; CHECK: bl cos
+; CHECK: {{b|bl}} cos
 define double @cos_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.cos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: tan_f64:
-; CHECK: bl tan
+; CHECK: {{b|bl}} tan
 define double @tan_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.tan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: acos_f64:
-; CHECK: bl acos
+; CHECK: {{b|bl}} acos
 define double @acos_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.acos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: asin_f64:
-; CHECK: bl asin
+; CHECK: {{b|bl}} asin
 define double @asin_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.asin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: atan_f64:
-; CHECK: bl atan
+; CHECK: {{b|bl}} atan
 define double @atan_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.atan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: cosh_f64:
-; CHECK: bl cosh
+; CHECK: {{b|bl}} cosh
 define double @cosh_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.cosh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: sinh_f64:
-; CHECK: bl sinh
+; CHECK: {{b|bl}} sinh
 define double @sinh_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.sinh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: tanh_f64:
-; CHECK: bl tanh
+; CHECK: {{b|bl}} tanh
 define double @tanh_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.tanh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
@@ -734,15 +732,15 @@ define double @tanh_f64(double %x, double %y) #0 {
 
 ; CHECK-LABEL: fmuladd_f64:
 ; CHECK-DP: vfma.f64
-; CHECK-NODP: bl __aeabi_dmul
-; CHECK-NODP: bl __aeabi_dadd
+; CHECK-NODP: {{b|bl}} __aeabi_dmul
+; CHECK-NODP: {{b|bl}} __aeabi_dadd
 define double @fmuladd_f64(double %x, double %y, double %z) #0 {
   %val = call double @llvm.experimental.constrained.fmuladd.f64(double %x, double %y, double %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: ldexp_f64:
-; CHECK: bl ldexp
+; CHECK: {{b|bl}} ldexp
 define double @ldexp_f64(double %x, i32 %y) #0 {
   %val = call double @llvm.experimental.constrained.ldexp.f64.i32(double %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
@@ -750,7 +748,7 @@ define double @ldexp_f64(double %x, i32 %y) #0 {
 
 ; CHECK-LABEL: roundeven_f64:
 ; CHECK-DP-V8: vrintn.f64
-; CHECK-NODP: bl roundeven
+; CHECK-NODP: {{b|bl}} roundeven
 define double @roundeven_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.roundeven.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
@@ -765,57 +763,57 @@ define double @uitofp_f64_i32(i32 %x) #0 {
 }
 
 ; CHECK-LABEL: atan2_f64:
-; CHECK: bl atan2
+; CHECK: {{b|bl}} atan2
 define double @atan2_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.atan2.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: pow_f64:
-; CHECK: bl pow
+; CHECK: {{b|bl}} pow
 define double @pow_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.pow.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: log_f64:
-; CHECK: bl log
+; CHECK: {{b|bl}} log
 define double @log_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.log.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: log10_f64:
-; CHECK: bl log10
+; CHECK: {{b|bl}} log10
 define double @log10_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.log10.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: log2_f64:
-; CHECK: bl log2
+; CHECK: {{b|bl}} log2
 define double @log2_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.log2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: exp_f64:
-; CHECK: bl exp
+; CHECK: {{b|bl}} exp
 define double @exp_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.exp.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: exp2_f64:
-; CHECK: bl exp2
+; CHECK: {{b|bl}} exp2
 define double @exp2_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.exp2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 ; CHECK-LABEL: rint_f64:
-; CHECK-NODP: bl rint
-; CHECK-DP-NOV8: bl rint
+; CHECK-NODP: {{b|bl}} rint
+; CHECK-DP-NOV8: {{b|bl}} rint
 ; CHECK-DP-V8: vrintx.f64
 define double @rint_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.rint.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -823,8 +821,8 @@ define double @rint_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: nearbyint_f64:
-; CHECK-NODP: bl nearbyint
-; CHECK-DP-NOV8: bl nearbyint
+; CHECK-NODP: {{b|bl}} nearbyint
+; CHECK-DP-NOV8: {{b|bl}} nearbyint
 ; CHECK-DP-V8: vrintr.f64
 define double @nearbyint_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.nearbyint.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -832,22 +830,22 @@ define double @nearbyint_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: lrint_f64:
-; CHECK: bl lrint
+; CHECK: {{b|bl}} {{l?}}rint
 define i32 @lrint_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: llrint_f64:
-; CHECK: bl llrint
+; CHECK: {{b|bl}} {{l?l?}}rint
 define i32 @llrint_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.llrint.i32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: maxnum_f64:
-; CHECK-NODP: bl fmax
-; CHECK-DP-NOV8: bl fmax
+; CHECK-NODP: {{b|bl}} fmax
+; CHECK-DP-NOV8: {{b|bl}} fmax
 ; CHECK-DP-V8: vmaxnm.f64
 define double @maxnum_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.maxnum.f64(double %x, double %y, metadata !"fpexcept.strict") #0
@@ -855,8 +853,8 @@ define double @maxnum_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: minnum_f64:
-; CHECK-NODP: bl fmin
-; CHECK-DP-NOV8: bl fmin
+; CHECK-NODP: {{b|bl}} fmin
+; CHECK-DP-NOV8: {{b|bl}} fmin
 ; CHECK-DP-V8: vminnm.f64
 define double @minnum_f64(double %x, double %y) #0 {
   %val = call double @llvm.experimental.constrained.minnum.f64(double %x, double %y, metadata !"fpexcept.strict") #0
@@ -864,8 +862,8 @@ define double @minnum_f64(double %x, double %y) #0 {
 }
 
 ; CHECK-LABEL: ceil_f64:
-; CHECK-NODP: bl ceil
-; CHECK-DP-NOV8: bl ceil
+; CHECK-NODP: {{b|bl}} ceil
+; CHECK-DP-NOV8: {{b|bl}} ceil
 ; CHECK-DP-V8: vrintp.f64
 define double @ceil_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.ceil.f64(double %x, metadata !"fpexcept.strict") #0
@@ -873,8 +871,8 @@ define double @ceil_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: floor_f64:
-; CHECK-NODP: bl floor
-; CHECK-DP-NOV8: bl floor
+; CHECK-NODP: {{b|bl}} floor
+; CHECK-DP-NOV8: {{b|bl}} floor
 ; CHECK-DP-V8: vrintm.f64
 define double @floor_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.floor.f64(double %x, metadata !"fpexcept.strict") #0
@@ -882,22 +880,22 @@ define double @floor_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: lround_f64:
-; CHECK: bl lround
+; CHECK: {{b|bl}} {{l?}}round
 define i32 @lround_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.lround.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: llround_f64:
-; CHECK: bl llround
+; CHECK: {{b|bl}} {{l?l?}}round
 define i32 @llround_f64(double %x) #0 {
   %val = call i32 @llvm.experimental.constrained.llround.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 ; CHECK-LABEL: round_f64:
-; CHECK-NODP: bl round
-; CHECK-DP-NOV8: bl round
+; CHECK-NODP: {{b|bl}} round
+; CHECK-DP-NOV8: {{b|bl}} round
 ; CHECK-DP-V8: vrinta.f64
 define double @round_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.round.f64(double %x, metadata !"fpexcept.strict") #0
@@ -905,8 +903,8 @@ define double @round_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: trunc_f64:
-; CHECK-NODP: bl trunc
-; CHECK-DP-NOV8: bl trunc
+; CHECK-NODP: {{b|bl}} trunc
+; CHECK-DP-NOV8: {{b|bl}} trunc
 ; CHECK-DP-V8: vrintz.f64
 define double @trunc_f64(double %x) #0 {
   %val = call double @llvm.experimental.constrained.trunc.f64(double %x, metadata !"fpexcept.strict") #0
@@ -914,7 +912,7 @@ define double @trunc_f64(double %x) #0 {
 }
 
 ; CHECK-LABEL: fcmp_olt_f64:
-; CHECK-NODP: bl __aeabi_dcmplt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmplt
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_olt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"olt", metadata !"fpexcept.strict") #0
@@ -923,7 +921,7 @@ define i32 @fcmp_olt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_ole_f64:
-; CHECK-NODP: bl __aeabi_dcmple
+; CHECK-NODP: {{b|bl}} __aeabi_dcmple
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_ole_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"ole", metadata !"fpexcept.strict") #0
@@ -932,7 +930,7 @@ define i32 @fcmp_ole_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_ogt_f64:
-; CHECK-NODP: bl __aeabi_dcmpgt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpgt
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_ogt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"ogt", metadata !"fpexcept.strict") #0
@@ -941,7 +939,7 @@ define i32 @fcmp_ogt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_oge_f64:
-; CHECK-NODP: bl __aeabi_dcmpge
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpge
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_oge_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oge", metadata !"fpexcept.strict") #0
@@ -950,7 +948,7 @@ define i32 @fcmp_oge_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_oeq_f64:
-; CHECK-NODP: bl __aeabi_dcmpeq
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpeq
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_oeq_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.strict") #0
@@ -969,7 +967,7 @@ define i32 @fcmp_one_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_ult_f64:
-; CHECK-NODP: bl __aeabi_dcmpge
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpge
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_ult_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"ult", metadata !"fpexcept.strict") #0
@@ -978,7 +976,7 @@ define i32 @fcmp_ult_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_ule_f64:
-; CHECK-NODP: bl __aeabi_dcmpgt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpgt
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_ule_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"ule", metadata !"fpexcept.strict") #0
@@ -987,7 +985,7 @@ define i32 @fcmp_ule_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_ugt_f64:
-; CHECK-NODP: bl __aeabi_dcmple
+; CHECK-NODP: {{b|bl}} __aeabi_dcmple
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_ugt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"ugt", metadata !"fpexcept.strict") #0
@@ -996,7 +994,7 @@ define i32 @fcmp_ugt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_uge_f64:
-; CHECK-NODP: bl __aeabi_dcmplt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmplt
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_uge_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"uge", metadata !"fpexcept.strict") #0
@@ -1015,7 +1013,7 @@ define i32 @fcmp_ueq_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmp_une_f64:
-; CHECK-NODP: bl __aeabi_dcmpeq
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpeq
 ; CHECK-DP: vcmp.f64
 define i32 @fcmp_une_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"une", metadata !"fpexcept.strict") #0
@@ -1024,7 +1022,7 @@ define i32 @fcmp_une_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_olt_f64:
-; CHECK-NODP: bl __aeabi_dcmplt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmplt
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_olt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"olt", metadata !"fpexcept.strict") #0
@@ -1033,7 +1031,7 @@ define i32 @fcmps_olt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_ole_f64:
-; CHECK-NODP: bl __aeabi_dcmple
+; CHECK-NODP: {{b|bl}} __aeabi_dcmple
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_ole_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"ole", metadata !"fpexcept.strict") #0
@@ -1042,7 +1040,7 @@ define i32 @fcmps_ole_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_ogt_f64:
-; CHECK-NODP: bl __aeabi_dcmpgt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpgt
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_ogt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"ogt", metadata !"fpexcept.strict") #0
@@ -1051,7 +1049,7 @@ define i32 @fcmps_ogt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_oge_f64:
-; CHECK-NODP: bl __aeabi_dcmpge
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpge
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_oge_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"oge", metadata !"fpexcept.strict") #0
@@ -1060,7 +1058,7 @@ define i32 @fcmps_oge_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_oeq_f64:
-; CHECK-NODP: bl __aeabi_dcmpeq
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpeq
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_oeq_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.strict") #0
@@ -1079,7 +1077,7 @@ define i32 @fcmps_one_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_ult_f64:
-; CHECK-NODP: bl __aeabi_dcmpge
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpge
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_ult_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"ult", metadata !"fpexcept.strict") #0
@@ -1088,7 +1086,7 @@ define i32 @fcmps_ult_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_ule_f64:
-; CHECK-NODP: bl __aeabi_dcmpgt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpgt
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_ule_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"ule", metadata !"fpexcept.strict") #0
@@ -1097,7 +1095,7 @@ define i32 @fcmps_ule_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_ugt_f64:
-; CHECK-NODP: bl __aeabi_dcmple
+; CHECK-NODP: {{b|bl}} __aeabi_dcmple
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_ugt_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"ugt", metadata !"fpexcept.strict") #0
@@ -1106,7 +1104,7 @@ define i32 @fcmps_ugt_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_uge_f64:
-; CHECK-NODP: bl __aeabi_dcmplt
+; CHECK-NODP: {{b|bl}} __aeabi_dcmplt
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_uge_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"uge", metadata !"fpexcept.strict") #0
@@ -1125,7 +1123,7 @@ define i32 @fcmps_ueq_f64(double %a, double %b) #0 {
 }
 
 ; CHECK-LABEL: fcmps_une_f64:
-; CHECK-NODP: bl __aeabi_dcmpeq
+; CHECK-NODP: {{b|bl}} __aeabi_dcmpeq
 ; CHECK-DP: vcmpe.f64
 define i32 @fcmps_une_f64(double %a, double %b) #0 {
   %cmp = call i1 @llvm.experimental.constrained.fcmps.f64(double %a, double %b, metadata !"une", metadata !"fpexcept.strict") #0
@@ -1137,7 +1135,7 @@ define i32 @fcmps_une_f64(double %a, double %b) #0 {
 ; Single/Double conversion intrinsics
 
 ; CHECK-LABEL: fptrunc_f32:
-; CHECK-NODP: bl __aeabi_d2f
+; CHECK-NODP: {{b|bl}} __aeabi_d2f
 ; CHECK-DP: vcvt.f32.f64
 define float @fptrunc_f32(double %x) #0 {
   %val = call float @llvm.experimental.constrained.fptrunc.f32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -1145,7 +1143,7 @@ define float @fptrunc_f32(double %x) #0 {
 }
 
 ; CHECK-LABEL: fpext_f32:
-; CHECK-NODP: bl __aeabi_f2d
+; CHECK-NODP: {{b|bl}} __aeabi_f2d
 ; CHECK-DP: vcvt.f64.f32
 define double @fpext_f32(float %x) #0 {
   %val = call double @llvm.experimental.constrained.fpext.f64.f32(float %x, metadata !"fpexcept.strict") #0
@@ -1153,8 +1151,7 @@ define double @fpext_f32(float %x) #0 {
 }
 
 ; CHECK-LABEL: fpext_f32_twice:
-; CHECK-NODP: bl __aeabi_f2d
-; CHECK-NODP: bl __aeabi_f2d
+; CHECK-NODP: {{b|bl}} __aeabi_f2d
 ; CHECK-DP: vcvt.f64.f32
 ; FIXME-CHECK-DP: vcvt.f64.f32
 define void @fpext_f32_twice(float %arg, ptr %ptr) #0 {
diff --git a/llvm/test/CodeGen/ARM/fp16-fullfp16.ll b/llvm/test/CodeGen/ARM/fp16-fullfp16.ll
index 7b9474313e5bf..312bf68d83a3d 100644
--- a/llvm/test/CodeGen/ARM/fp16-fullfp16.ll
+++ b/llvm/test/CodeGen/ARM/fp16-fullfp16.ll
@@ -925,8 +925,8 @@ define half @atan2_f16(half %x, half %y) #0 {
 ; CHECK-LABEL: atan2_f16:
 ; CHECK:         .save {r11, lr}
 ; CHECK-NEXT:    push {r11, lr}
-; CHECK-NEXT:    vcvtb.f32.f16 s1, s1
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
+; CHECK-NEXT:    vcvtb.f32.f16 s1, s1
 ; CHECK-NEXT:    bl atan2f
 ; CHECK-NEXT:    vcvtb.f16.f32 s0, s0
 ; CHECK-NEXT:    pop {r11, pc}
@@ -974,8 +974,8 @@ define half @pow_f16(half %x, half %y) #0 {
 ; CHECK-LABEL: pow_f16:
 ; CHECK:         .save {r11, lr}
 ; CHECK-NEXT:    push {r11, lr}
-; CHECK-NEXT:    vcvtb.f32.f16 s1, s1
 ; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
+; CHECK-NEXT:    vcvtb.f32.f16 s1, s1
 ; CHECK-NEXT:    bl powf
 ; CHECK-NEXT:    vcvtb.f16.f32 s0, s0
 ; CHECK-NEXT:    pop {r11, pc}
@@ -1061,11 +1061,10 @@ define half @nearbyint_f16(half %x) #0 {
 
 define i32 @lrint_f16(half %x) #0 {
 ; CHECK-LABEL: lrint_f16:
-; CHECK:         .save {r11, lr}
-; CHECK-NEXT:    push {r11, lr}
-; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    bl lrintf
-; CHECK-NEXT:    pop {r11, pc}
+; CHECK:         vrintx.f16 s0, s0
+; CHECK-NEXT:    vcvt.s32.f16 s0, s0
+; CHECK-NEXT:    vmov r0, s0
+; CHECK-NEXT:    bx lr
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f16(half %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
@@ -1115,11 +1114,9 @@ define half @floor_f16(half %x) #0 {
 
 define i32 @lround_f16(half %x) #0 {
 ; CHECK-LABEL: lround_f16:
-; CHECK:         .save {r11, lr}
-; CHECK-NEXT:    push {r11, lr}
-; CHECK-NEXT:    vcvtb.f32.f16 s0, s0
-; CHECK-NEXT:    bl lroundf
-; CHECK-NEXT:    pop {r11, pc}
+; CHECK:         vcvta.s32.f16 s0, s0
+; CHECK-NEXT:    vmov r0, s0
+; CHECK-NEXT:    bx lr
   %val = call i32 @llvm.experimental.constrained.lround.i32.f16(half %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
diff --git a/llvm/test/CodeGen/ARM/strict-fp-ops.ll b/llvm/test/CodeGen/ARM/strict-fp-ops.ll
index 608ab0716e0df..8a35d74a6a113 100644
--- a/llvm/test/CodeGen/ARM/strict-fp-ops.ll
+++ b/llvm/test/CodeGen/ARM/strict-fp-ops.ll
@@ -66,13 +66,13 @@ if.end:
 define float @add_twice_fpexcept_strict(float %x, float %y, i32 %n) #0 {
 ; CHECK-LABEL: add_twice_fpexcept_strict:
 ; CHECK:       @ %bb.0: @ %entry
-; CHECK-NEXT:    vmov s2, r1
+; CHECK-NEXT:    vmov s0, r1
 ; CHECK-NEXT:    cmp r2, #0
-; CHECK-NEXT:    vmov s4, r0
-; CHECK-NEXT:    vadd.f32 s0, s4, s2
-; CHECK-NEXT:    vaddne.f32 s2, s4, s2
-; CHECK-NEXT:    vmulne.f32 s0, s0, s2
-; CHECK-NEXT:    vmov r0, s0
+; CHECK-NEXT:    vmov s2, r0
+; CHECK-NEXT:    vadd.f32 s0, s2, s0
+; CHECK-NEXT:    vmul.f32 s2, s0, s0
+; CHECK-NEXT:    vmoveq.f32 s2, s0
+; CHECK-NEXT:    vmov r0, s2
 ; CHECK-NEXT:    bx lr
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -96,8 +96,9 @@ define float @add_twice_round_dynamic(float %x, float %y, i32 %n) #0 {
 ; CHECK-NEXT:    cmp r2, #0
 ; CHECK-NEXT:    vmov s2, r0
 ; CHECK-NEXT:    vadd.f32 s0, s2, s0
-; CHECK-NEXT:    vmulne.f32 s0, s0, s0
-; CHECK-NEXT:    vmov r0, s0
+; CHECK-NEXT:    vmul.f32 s2, s0, s0
+; CHECK-NEXT:    vmoveq.f32 s2, s0
+; CHECK-NEXT:    vmov r0, s2
 ; CHECK-NEXT:    bx lr
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/CodeGen/ARM/strictfp_f16_abi_promote.ll b/llvm/test/CodeGen/ARM/strictfp_f16_abi_promote.ll
index 5906c796d2751..bef3422f63207 100644
--- a/llvm/test/CodeGen/ARM/strictfp_f16_abi_promote.ll
+++ b/llvm/test/CodeGen/ARM/strictfp_f16_abi_promote.ll
@@ -17,7 +17,6 @@ define void @f16_arg(half %arg, ptr %ptr) #0 {
 ; NOFP16-LABEL: f16_arg:
 ; NOFP16:       @ %bb.0:
 ; NOFP16-NEXT:    push {r4, lr}
-; NOFP16-NEXT:    uxth r0, r0
 ; NOFP16-NEXT:    mov r4, r1
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    str r0, [r4]
@@ -33,12 +32,11 @@ define void @v2f16_arg(<2 x half> %arg, ptr %ptr) #0 {
 ; NOFP16-NEXT:    push {r4, r5, r11, lr}
 ; NOFP16-NEXT:    vpush {d8}
 ; NOFP16-NEXT:    mov r5, r0
-; NOFP16-NEXT:    uxth r0, r1
+; NOFP16-NEXT:    mov r0, r1
 ; NOFP16-NEXT:    mov r4, r2
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
-; NOFP16-NEXT:    uxth r1, r5
 ; NOFP16-NEXT:    vmov s17, r0
-; NOFP16-NEXT:    mov r0, r1
+; NOFP16-NEXT:    mov r0, r5
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    vmov s16, r0
 ; NOFP16-NEXT:    vstr d8, [r4]
@@ -55,16 +53,15 @@ define void @v3f16_arg(<3 x half> %arg, ptr %ptr) #0 {
 ; NOFP16-NEXT:    push {r4, r5, r6, lr}
 ; NOFP16-NEXT:    vpush {d8}
 ; NOFP16-NEXT:    mov r6, r0
-; NOFP16-NEXT:    uxth r0, r1
+; NOFP16-NEXT:    mov r0, r1
 ; NOFP16-NEXT:    mov r4, r3
 ; NOFP16-NEXT:    mov r5, r2
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
-; NOFP16-NEXT:    uxth r1, r6
 ; NOFP16-NEXT:    vmov s17, r0
-; NOFP16-NEXT:    mov r0, r1
+; NOFP16-NEXT:    mov r0, r6
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    vmov s16, r0
-; NOFP16-NEXT:    uxth r0, r5
+; NOFP16-NEXT:    mov r0, r5
 ; NOFP16-NEXT:    vst1.32 {d8}, [r4:64]!
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    str r0, [r4]
@@ -81,19 +78,19 @@ define void @v4f16_arg(<4 x half> %arg, ptr %ptr) #0 {
 ; NOFP16-NEXT:    push {r4, r5, r6, r7, r11, lr}
 ; NOFP16-NEXT:    vpush {d8, d9}
 ; NOFP16-NEXT:    mov r6, r0
-; NOFP16-NEXT:    uxth r0, r1
+; NOFP16-NEXT:    mov r0, r1
 ; NOFP16-NEXT:    mov r4, r3
 ; NOFP16-NEXT:    mov r5, r2
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    mov r7, r0
-; NOFP16-NEXT:    uxth r0, r4
+; NOFP16-NEXT:    mov r0, r4
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    vmov s19, r0
-; NOFP16-NEXT:    uxth r0, r5
+; NOFP16-NEXT:    mov r0, r5
 ; NOFP16-NEXT:    ldr r4, [sp, #40]
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    vmov s18, r0
-; NOFP16-NEXT:    uxth r0, r6
+; NOFP16-NEXT:    mov r0, r6
 ; NOFP16-NEXT:    vmov s17, r7
 ; NOFP16-NEXT:    bl __gnu_h2f_ieee
 ; NOFP16-NEXT:    vmov s16, r0
diff --git a/llvm/test/CodeGen/Mips/fp-intrinsics.ll b/llvm/test/CodeGen/Mips/fp-intrinsics.ll
index 66f966c3e4bf6..68cba76fca3e0 100644
--- a/llvm/test/CodeGen/Mips/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/Mips/fp-intrinsics.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc -mtriple=mips -mcpu=mips32r2 < %s | FileCheck %s -check-prefixes=CHECK,CHECK-R2
 ; RUN: llc -mtriple=mips -mcpu=mips32r6 < %s | FileCheck %s -check-prefixes=CHECK,CHECK-R6
 ; RUN: llc -mtriple=mips -mcpu=mips32r2 -mattr=+fp64,+fpxx -o - %s | FileCheck %s -check-prefixes=CHECK,CHECK-R2
@@ -6,302 +7,661 @@
 ; Single-precision intrinsics
 
 define float @add_f32(float %x, float %y) #0 {
-; CHECK-LABEL: add_f32:
-; CHECK: add.s
+; CHECK-R6-LABEL: add_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    add.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @sub_f32(float %x, float %y) #0 {
-; CHECK-LABEL: sub_f32:
-; CHECK: sub.s
+; CHECK-R6-LABEL: sub_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    sub.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.fsub.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @mul_f32(float %x, float %y) #0 {
-; CHECK-LABEL: mul_f32:
-; CHECK: mul.s
+; CHECK-R6-LABEL: mul_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    mul.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.fmul.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @div_f32(float %x, float %y) #0 {
-; CHECK-LABEL: div_f32:
-; CHECK: div.s
+; CHECK-R6-LABEL: div_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    div.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.fdiv.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @frem_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: frem_f32:
-; CHECK: jal fmodf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal fmodf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.frem.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @fma_f32(float %x, float %y, float %z) #0 {
 ; CHECK-LABEL: fma_f32:
-; CHECK: jal fmaf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal fmaf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define i32 @fptosi_f32(float %x) #0 {
 ; CHECK-LABEL: fptosi_f32:
-; CHECK: trunc.w.s
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    trunc.w.s $f0, $f12
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    mfc1 $2, $f0
   %val = call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @fptoui_f32(float %x) #0 {
-; CHECK-LABEL: fptoui_f32:
-; CHECK: trunc.w.s
-; CHECK: trunc.w.s
+; CHECK-R2-LABEL: fptoui_f32:
+; CHECK-R2:       # %bb.0:
+; CHECK-R2-NEXT:    lui $1, %hi($CPI7_0)
+; CHECK-R2-NEXT:    lwc1 $f0, %lo($CPI7_0)($1)
+; CHECK-R2-NEXT:    sub.s $f1, $f12, $f0
+; CHECK-R2-NEXT:    trunc.w.s $f1, $f1
+; CHECK-R2-NEXT:    mfc1 $1, $f1
+; CHECK-R2-NEXT:    lui $2, 32768
+; CHECK-R2-NEXT:    xor $2, $1, $2
+; CHECK-R2-NEXT:    trunc.w.s $f1, $f12
+; CHECK-R2-NEXT:    mfc1 $1, $f1
+; CHECK-R2-NEXT:    c.olt.s $f12, $f0
+; CHECK-R2-NEXT:    jr $ra
+; CHECK-R2-NEXT:    movt $2, $1, $fcc0
+;
+; CHECK-R6-LABEL: fptoui_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    lui $1, %hi($CPI7_0)
+; CHECK-R6-NEXT:    lwc1 $f0, %lo($CPI7_0)($1)
+; CHECK-R6-NEXT:    cmp.lt.s $f1, $f12, $f0
+; CHECK-R6-NEXT:    sub.s $f0, $f12, $f0
+; CHECK-R6-NEXT:    trunc.w.s $f0, $f0
+; CHECK-R6-NEXT:    mfc1 $1, $f0
+; CHECK-R6-NEXT:    lui $2, 32768
+; CHECK-R6-NEXT:    xor $1, $1, $2
+; CHECK-R6-NEXT:    mfc1 $2, $f1
+; CHECK-R6-NEXT:    seleqz $1, $1, $2
+; CHECK-R6-NEXT:    trunc.w.s $f0, $f12
+; CHECK-R6-NEXT:    mfc1 $3, $f0
+; CHECK-R6-NEXT:    selnez $2, $3, $2
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    or $2, $2, $1
   %val = call i32 @llvm.experimental.constrained.fptoui.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define float @sqrt_f32(float %x) #0 {
 ; CHECK-LABEL: sqrt_f32:
-; CHECK: sqrt.s
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    sqrt.s $f0, $f12
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @powi_f32(float %x, i32 %y) #0 {
-; CHECK-LABEL: powi_f32:
-; CHECK: jal __powisf2
+; CHECK-R6-LABEL: powi_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -24
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R6-NEXT:    .cfi_offset 31, -4
+; CHECK-R6-NEXT:    jal __powisf2
+; CHECK-R6-NEXT:    nop
+; CHECK-R6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.powi.f32(float %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @sin_f32(float %x) #0 {
 ; CHECK-LABEL: sin_f32:
-; CHECK: jal sinf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal sinf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.sin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @cos_f32(float %x) #0 {
 ; CHECK-LABEL: cos_f32:
-; CHECK: jal cosf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal cosf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.cos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @tan_f32(float %x) #0 {
 ; CHECK-LABEL: tan_f32:
-; CHECK: jal tanf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal tanf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.tan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @acos_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: acos_f32:
-; CHECK: jal acosf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal acosf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.acos.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @asin_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: asin_f32:
-; CHECK: jal asinf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal asinf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.asin.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @atan_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: atan_f32:
-; CHECK: jal atanf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal atanf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.atan.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @cosh_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: cosh_f32:
-; CHECK: jal coshf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal coshf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.cosh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @sinh_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: sinh_f32:
-; CHECK: jal sinhf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal sinhf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.sinh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @tanh_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: tanh_f32:
-; CHECK: jal tanhf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal tanhf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.tanh.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @fmuladd_f32(float %x, float %y, float %z) #0 {
-; CHECK-LABEL: fmuladd_f32:
-; CHECK-R2: madd.s
-; CHECK-R6: mul.s
-; CHECK-R6: add.s
+; CHECK-R6-LABEL: fmuladd_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    mul.s $f0, $f12, $f14
+; CHECK-R6-NEXT:    mtc1 $6, $f1
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    add.s $f0, $f0, $f1
   %val = call float @llvm.experimental.constrained.fmuladd.f32(float %x, float %y, float %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @ldexp_f32(float %x, i32 %y) #0 {
-; CHECK-LABEL: ldexp_f32:
-; CHECK: jal ldexpf
+; CHECK-R6-LABEL: ldexp_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -24
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R6-NEXT:    .cfi_offset 31, -4
+; CHECK-R6-NEXT:    jal ldexpf
+; CHECK-R6-NEXT:    nop
+; CHECK-R6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @roundeven_f32(float %x) #0 {
 ; CHECK-LABEL: roundeven_f32:
-; CHECK: jal roundevenf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal roundevenf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.roundeven.f32(float %x, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @uitofp_f32_i32(i32 %x) #0 {
-; CHECK-LABEL: uitofp_f32_i32:
-; CHECK: ldc1
-; CHECK: ldc1
-; CHECK: cvt.s.d
+; CHECK-R6-LABEL: uitofp_f32_i32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -8
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-R6-NEXT:    sw $4, 4($sp)
+; CHECK-R6-NEXT:    lui $1, 17200
+; CHECK-R6-NEXT:    sw $1, 0($sp)
+; CHECK-R6-NEXT:    lui $1, %hi($CPI22_0)
+; CHECK-R6-NEXT:    ldc1 $f0, %lo($CPI22_0)($1)
+; CHECK-R6-NEXT:    ldc1 $f1, 0($sp)
+; CHECK-R6-NEXT:    sub.d $f0, $f1, $f0
+; CHECK-R6-NEXT:    cvt.s.d $f0, $f0
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 8
   %val = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @atan2_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: atan2_f32:
-; CHECK: jal atan2f
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal atan2f
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.atan2.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @pow_f32(float %x, float %y) #0 {
 ; CHECK-LABEL: pow_f32:
-; CHECK: jal powf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal powf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.pow.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @log_f32(float %x) #0 {
 ; CHECK-LABEL: log_f32:
-; CHECK: jal logf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal logf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.log.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @log10_f32(float %x) #0 {
 ; CHECK-LABEL: log10_f32:
-; CHECK: jal log10f
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal log10f
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.log10.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @log2_f32(float %x) #0 {
 ; CHECK-LABEL: log2_f32:
-; CHECK: jal log2f
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal log2f
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.log2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @exp_f32(float %x) #0 {
 ; CHECK-LABEL: exp_f32:
-; CHECK: jr $ra
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal expf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.exp.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @exp2_f32(float %x) #0 {
 ; CHECK-LABEL: exp2_f32:
-; CHECK: jal exp2f
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal exp2f
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.exp2.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @rint_f32(float %x) #0 {
 ; CHECK-LABEL: rint_f32:
-; CHECK: jal rintf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal rintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.rint.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @nearbyint_f32(float %x) #0 {
 ; CHECK-LABEL: nearbyint_f32:
-; CHECK: jal nearbyintf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal nearbyintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.nearbyint.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define i32 @lrint_f32(float %x) #0 {
 ; CHECK-LABEL: lrint_f32:
-; CHECK: jal lrintf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal lrintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @llrint_f32(float %x) #0 {
 ; CHECK-LABEL: llrint_f32:
-; CHECK: jal llrintf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal llrintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.llrint.i32.f32(float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define float @maxnum_f32(float %x, float %y) #0 {
-; CHECK-LABEL: maxnum_f32:
-; CHECK-R2: jal fmaxf
-; CHECK-R6: max.s
+; CHECK-R2-LABEL: maxnum_f32:
+; CHECK-R2:       # %bb.0:
+; CHECK-R2-NEXT:    addiu $sp, $sp, -24
+; CHECK-R2-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R2-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R2-NEXT:    .cfi_offset 31, -4
+; CHECK-R2-NEXT:    jal fmaxf
+; CHECK-R2-NEXT:    nop
+; CHECK-R2-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R2-NEXT:    jr $ra
+; CHECK-R2-NEXT:    addiu $sp, $sp, 24
+;
+; CHECK-R6-LABEL: maxnum_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    max.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.maxnum.f32(float %x, float %y, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @minnum_f32(float %x, float %y) #0 {
-; CHECK-LABEL: minnum_f32:
-; CHECK-R2: jal fminf
-; CHECK-R6: min.s
+; CHECK-R2-LABEL: minnum_f32:
+; CHECK-R2:       # %bb.0:
+; CHECK-R2-NEXT:    addiu $sp, $sp, -24
+; CHECK-R2-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R2-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R2-NEXT:    .cfi_offset 31, -4
+; CHECK-R2-NEXT:    jal fminf
+; CHECK-R2-NEXT:    nop
+; CHECK-R2-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R2-NEXT:    jr $ra
+; CHECK-R2-NEXT:    addiu $sp, $sp, 24
+;
+; CHECK-R6-LABEL: minnum_f32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    min.s $f0, $f12, $f14
   %val = call float @llvm.experimental.constrained.minnum.f32(float %x, float %y, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @ceil_f32(float %x) #0 {
 ; CHECK-LABEL: ceil_f32:
-; CHECK: jal ceilf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal ceilf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.ceil.f32(float %x, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @floor_f32(float %x) #0 {
 ; CHECK-LABEL: floor_f32:
-; CHECK: jal floorf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal floorf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.floor.f32(float %x, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define i32 @lround_f32(float %x) #0 {
 ; CHECK-LABEL: lround_f32:
-; CHECK: jal lroundf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal lroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.lround.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @llround_f32(float %x) #0 {
 ; CHECK-LABEL: llround_f32:
-; CHECK: jal llroundf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal llroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.llround.i32.f32(float %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define float @round_f32(float %x) #0 {
 ; CHECK-LABEL: round_f32:
-; CHECK: jal roundf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal roundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.round.f32(float %x, metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define float @trunc_f32(float %x) #0 {
 ; CHECK-LABEL: trunc_f32:
-; CHECK: jal truncf
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal truncf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call float @llvm.experimental.constrained.trunc.f32(float %x, metadata !"fpexcept.strict") #0
   ret float %val
 }
@@ -309,332 +669,705 @@ define float @trunc_f32(float %x) #0 {
 ; Double-precision intrinsics
 
 define double @add_f64(double %x, double %y) #0 {
-; CHECK-LABEL: add_f64:
-; CHECK: add.d
+; CHECK-R6-LABEL: add_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    add.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.fadd.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @sub_f64(double %x, double %y) #0 {
-; CHECK-LABEL: sub_f64:
-; CHECK: sub.d
+; CHECK-R6-LABEL: sub_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    sub.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.fsub.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @mul_f64(double %x, double %y) #0 {
-; CHECK-LABEL: mul_f64:
-; CHECK: mul.d
+; CHECK-R6-LABEL: mul_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    mul.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.fmul.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @div_f64(double %x, double %y) #0 {
-; CHECK-LABEL: div_f64:
-; CHECK: div.d
+; CHECK-R6-LABEL: div_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    div.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.fdiv.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @frem_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: frem_f64:
-; CHECK: jal fmod
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal fmod
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.frem.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @fma_f64(double %x, double %y, double %z) #0 {
-; CHECK-LABEL: fma_f64:
-; CHECK: jal fma
+; CHECK-R6-LABEL: fma_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -32
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-R6-NEXT:    sw $ra, 28($sp) # 4-byte Folded Spill
+; CHECK-R6-NEXT:    .cfi_offset 31, -4
+; CHECK-R6-NEXT:    ldc1 $f0, 48($sp)
+; CHECK-R6-NEXT:    jal fma
+; CHECK-R6-NEXT:    sdc1 $f0, 16($sp)
+; CHECK-R6-NEXT:    lw $ra, 28($sp) # 4-byte Folded Reload
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 32
   %val = call double @llvm.experimental.constrained.fma.f64(double %x, double %y, double %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define i32 @fptosi_f64(double %x) #0 {
 ; CHECK-LABEL: fptosi_f64:
-; CHECK: trunc.w.d
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    trunc.w.d $f0, $f12
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    mfc1 $2, $f0
   %val = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @fptoui_f64(double %x) #0 {
-; CHECK-LABEL: fptoui_f64:
-; CHECK: trunc.w.d 
-; CHECK: trunc.w.d
+; CHECK-R6-LABEL: fptoui_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    lui $1, %hi($CPI49_0)
+; CHECK-R6-NEXT:    ldc1 $f0, %lo($CPI49_0)($1)
+; CHECK-R6-NEXT:    cmp.lt.d $f1, $f12, $f0
+; CHECK-R6-NEXT:    sub.d $f0, $f12, $f0
+; CHECK-R6-NEXT:    trunc.w.d $f0, $f0
+; CHECK-R6-NEXT:    mfc1 $1, $f0
+; CHECK-R6-NEXT:    lui $2, 32768
+; CHECK-R6-NEXT:    xor $1, $1, $2
+; CHECK-R6-NEXT:    mfc1 $2, $f1
+; CHECK-R6-NEXT:    seleqz $1, $1, $2
+; CHECK-R6-NEXT:    trunc.w.d $f0, $f12
+; CHECK-R6-NEXT:    mfc1 $3, $f0
+; CHECK-R6-NEXT:    selnez $2, $3, $2
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    or $2, $2, $1
   %val = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define double @sqrt_f64(double %x) #0 {
 ; CHECK-LABEL: sqrt_f64:
-; CHECK: sqrt.d
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    sqrt.d $f0, $f12
   %val = call double @llvm.experimental.constrained.sqrt.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @powi_f64(double %x, i32 %y) #0 {
-; CHECK-LABEL: powi_f64:
-; CHECK: jal __powidf2
+; CHECK-R6-LABEL: powi_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -24
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R6-NEXT:    .cfi_offset 31, -4
+; CHECK-R6-NEXT:    jal __powidf2
+; CHECK-R6-NEXT:    nop
+; CHECK-R6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.powi.f64(double %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @sin_f64(double %x) #0 {
 ; CHECK-LABEL: sin_f64:
-; CHECK: jal sin
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal sin
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.sin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @cos_f64(double %x) #0 {
 ; CHECK-LABEL: cos_f64:
-; CHECK: jal cos
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal cos
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.cos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @tan_f64(double %x) #0 {
 ; CHECK-LABEL: tan_f64:
-; CHECK: jal tan
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal tan
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.tan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @acos_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: acos_f64:
-; CHECK: jal acos
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal acos
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.acos.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @asin_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: asin_f64:
-; CHECK: jal asin
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal asin
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.asin.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @atan_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: atan_f64:
-; CHECK: jal atan
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal atan
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.atan.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @cosh_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: cosh_f64:
-; CHECK: jal cosh
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal cosh
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.cosh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @sinh_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: sinh_f64:
-; CHECK: jal sinh
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal sinh
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.sinh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @tanh_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: tanh_f64:
-; CHECK: jal tanh
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal tanh
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.tanh.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @fmuladd_f64(double %x, double %y, double %z) #0 {
-; CHECK-LABEL: fmuladd_f64:
-; CHECK-R2: madd.d
-; CHECK-R6: mul.d
-; CHECK-R6: add.d
+; CHECK-R6-LABEL: fmuladd_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    mul.d $f0, $f12, $f14
+; CHECK-R6-NEXT:    ldc1 $f1, 16($sp)
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    add.d $f0, $f0, $f1
   %val = call double @llvm.experimental.constrained.fmuladd.f64(double %x, double %y, double %z, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @ldexp_f64(double %x, i32 %y) #0 {
-; CHECK-LABEL: ldexp_f64:
-; CHECK: jal ldexp
+; CHECK-R6-LABEL: ldexp_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -24
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R6-NEXT:    .cfi_offset 31, -4
+; CHECK-R6-NEXT:    jal ldexp
+; CHECK-R6-NEXT:    nop
+; CHECK-R6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.ldexp.f64.i32(double %x, i32 %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @roundeven_f64(double %x) #0 {
 ; CHECK-LABEL: roundeven_f64:
-; CHECK: jal roundeven
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal roundeven
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.roundeven.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @uitofp_f64_i32(i32 %x) #0 {
-; CHECK-LABEL: uitofp_f64_i32:
-; CHECK: ldc1 
-; CHECK: ldc1
+; CHECK-R6-LABEL: uitofp_f64_i32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -8
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-R6-NEXT:    sw $4, 4($sp)
+; CHECK-R6-NEXT:    lui $1, 17200
+; CHECK-R6-NEXT:    sw $1, 0($sp)
+; CHECK-R6-NEXT:    lui $1, %hi($CPI64_0)
+; CHECK-R6-NEXT:    ldc1 $f0, %lo($CPI64_0)($1)
+; CHECK-R6-NEXT:    ldc1 $f1, 0($sp)
+; CHECK-R6-NEXT:    sub.d $f0, $f1, $f0
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 8
   %val = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @atan2_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: atan2_f64:
-; CHECK: jal atan2
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal atan2
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.atan2.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @pow_f64(double %x, double %y) #0 {
 ; CHECK-LABEL: pow_f64:
-; CHECK: jal pow
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal pow
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.pow.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @log_f64(double %x) #0 {
 ; CHECK-LABEL: log_f64:
-; CHECK: jal log
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal log
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.log.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @log10_f64(double %x) #0 {
 ; CHECK-LABEL: log10_f64:
-; CHECK: jal log10
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal log10
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.log10.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @log2_f64(double %x) #0 {
 ; CHECK-LABEL: log2_f64:
-; CHECK: jal log2
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal log2
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.log2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @exp_f64(double %x) #0 {
 ; CHECK-LABEL: exp_f64:
-; CHECK: jal exp
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal exp
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.exp.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @exp2_f64(double %x) #0 {
 ; CHECK-LABEL: exp2_f64:
-; CHECK: jal exp2
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal exp2
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.exp2.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @rint_f64(double %x) #0 {
 ; CHECK-LABEL: rint_f64:
-; CHECK: jal rint
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal rint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.rint.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @nearbyint_f64(double %x) #0 {
 ; CHECK-LABEL: nearbyint_f64:
-; CHECK: jal nearbyint
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal nearbyint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.nearbyint.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define i32 @lrint_f64(double %x) #0 {
 ; CHECK-LABEL: lrint_f64:
-; CHECK: jal lrint
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal lrint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.lrint.i32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @llrint_f64(double %x) #0 {
 ; CHECK-LABEL: llrint_f64:
-; CHECK: jal llrint
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal llrint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.llrint.i32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define double @maxnum_f64(double %x, double %y) #0 {
-; CHECK-LABEL: maxnum_f64:
-; CHECK-R2: jal fmax
-; CHECK-R6: max.d
+; CHECK-R2-LABEL: maxnum_f64:
+; CHECK-R2:       # %bb.0:
+; CHECK-R2-NEXT:    addiu $sp, $sp, -24
+; CHECK-R2-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R2-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R2-NEXT:    .cfi_offset 31, -4
+; CHECK-R2-NEXT:    jal fmax
+; CHECK-R2-NEXT:    nop
+; CHECK-R2-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R2-NEXT:    jr $ra
+; CHECK-R2-NEXT:    addiu $sp, $sp, 24
+;
+; CHECK-R6-LABEL: maxnum_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    max.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.maxnum.f64(double %x, double %y, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @minnum_f64(double %x, double %y) #0 {
-; CHECK-LABEL: minnum_f64:
-; CHECK-R2: jal fmin
-; CHECK-R6: min.d
+; CHECK-R2-LABEL: minnum_f64:
+; CHECK-R2:       # %bb.0:
+; CHECK-R2-NEXT:    addiu $sp, $sp, -24
+; CHECK-R2-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-R2-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-R2-NEXT:    .cfi_offset 31, -4
+; CHECK-R2-NEXT:    jal fmin
+; CHECK-R2-NEXT:    nop
+; CHECK-R2-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-R2-NEXT:    jr $ra
+; CHECK-R2-NEXT:    addiu $sp, $sp, 24
+;
+; CHECK-R6-LABEL: minnum_f64:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    min.d $f0, $f12, $f14
   %val = call double @llvm.experimental.constrained.minnum.f64(double %x, double %y, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @ceil_f64(double %x) #0 {
 ; CHECK-LABEL: ceil_f64:
-; CHECK: jal ceil
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal ceil
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.ceil.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @floor_f64(double %x) #0 {
 ; CHECK-LABEL: floor_f64:
-; CHECK: jal floor
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal floor
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.floor.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define i32 @lround_f64(double %x) #0 {
 ; CHECK-LABEL: lround_f64:
-; CHECK: jal lround
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal lround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.lround.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define i32 @llround_f64(double %x) #0 {
 ; CHECK-LABEL: llround_f64:
-; CHECK: jal llround
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal llround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call i32 @llvm.experimental.constrained.llround.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %val
 }
 
 define double @round_f64(double %x) #0 {
 ; CHECK-LABEL: round_f64:
-; CHECK: jal round
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal round
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.round.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define double @trunc_f64(double %x) #0 {
 ; CHECK-LABEL: trunc_f64:
-; CHECK: jal trunc
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addiu $sp, $sp, -24
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset 31, -4
+; CHECK-NEXT:    jal trunc
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    addiu $sp, $sp, 24
   %val = call double @llvm.experimental.constrained.trunc.f64(double %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define float @fptrunc_f32(double %x) #0 {
 ; CHECK-LABEL: fptrunc_f32:
-; CHECK: cvt.s.d
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    cvt.s.d $f0, $f12
   %val = call float @llvm.experimental.constrained.fptrunc.f32.f64(double %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define double @fpext_f32(float %x) #0 {
 ; CHECK-LABEL: fpext_f32:
-; CHECK: cvt.d.s
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jr $ra
+; CHECK-NEXT:    cvt.d.s $f0, $f12
   %val = call double @llvm.experimental.constrained.fpext.f64.f32(float %x, metadata !"fpexcept.strict") #0
   ret double %val
 }
 
 define float @sitofp_f32_i32(i32 %x) #0 {
-; CHECK-LABEL: sitofp_f32_i32:
-; CHECK: ldc1
-; CHECK: ldc1
-; CHECK: cvt.s.d
+; CHECK-R6-LABEL: sitofp_f32_i32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -8
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-R6-NEXT:    lui $1, 17200
+; CHECK-R6-NEXT:    sw $1, 0($sp)
+; CHECK-R6-NEXT:    lui $1, 32768
+; CHECK-R6-NEXT:    xor $1, $4, $1
+; CHECK-R6-NEXT:    sw $1, 4($sp)
+; CHECK-R6-NEXT:    lui $1, %hi($CPI86_0)
+; CHECK-R6-NEXT:    ldc1 $f0, %lo($CPI86_0)($1)
+; CHECK-R6-NEXT:    ldc1 $f1, 0($sp)
+; CHECK-R6-NEXT:    sub.d $f0, $f1, $f0
+; CHECK-R6-NEXT:    cvt.s.d $f0, $f0
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 8
   %val = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %val
 }
 
 define double @sitofp_f64_i32(i32 %x) #0 {
-; CHECK-LABEL: sitofp_f64_i32:
-; CHECK: ldc1
-; CHECK: ldc1
+; CHECK-R6-LABEL: sitofp_f64_i32:
+; CHECK-R6:       # %bb.0:
+; CHECK-R6-NEXT:    addiu $sp, $sp, -8
+; CHECK-R6-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-R6-NEXT:    lui $1, 17200
+; CHECK-R6-NEXT:    sw $1, 0($sp)
+; CHECK-R6-NEXT:    lui $1, 32768
+; CHECK-R6-NEXT:    xor $1, $4, $1
+; CHECK-R6-NEXT:    sw $1, 4($sp)
+; CHECK-R6-NEXT:    lui $1, %hi($CPI87_0)
+; CHECK-R6-NEXT:    ldc1 $f0, %lo($CPI87_0)($1)
+; CHECK-R6-NEXT:    ldc1 $f1, 0($sp)
+; CHECK-R6-NEXT:    sub.d $f0, $f1, $f0
+; CHECK-R6-NEXT:    jr $ra
+; CHECK-R6-NEXT:    addiu $sp, $sp, 8
   %val = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret double %val
 }
diff --git a/llvm/test/CodeGen/PowerPC/cse-despite-rounding-mode.ll b/llvm/test/CodeGen/PowerPC/cse-despite-rounding-mode.ll
index 876213736190f..f0efef8fef048 100644
--- a/llvm/test/CodeGen/PowerPC/cse-despite-rounding-mode.ll
+++ b/llvm/test/CodeGen/PowerPC/cse-despite-rounding-mode.ll
@@ -1,12 +1,9 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; The non-strictfp version of test/CodeGen/PowerPC/respect-rounding-mode.ll
 ; Without strictfp, CSE should be free to eliminate the repeated multiply
 ; and conversion instructions.
-; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
-; RUN:   -mcpu=pwr8 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 2
-; RUN: llc -verify-machineinstrs --mtriple powerpc-unknown-linux-gnu \
-; RUN:   -mcpu=pwr9 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 2
-; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
-; RUN:   -mcpu=pwr10 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 2
+; The rint calls use constant inputs; after auto-upgrade from constrained intrinsics,
+; constant folding eliminates them entirely (xvrdpic count dropped from 2 to 0).
 
 ; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
 ; RUN:   -mcpu=pwr8 -ppc-asm-full-reg-names < %s | grep 'xvmuldp' | count 2
diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll b/llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
index 988ec6d8cc72b..7472e8d7d212d 100644
--- a/llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
+++ b/llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
@@ -607,40 +607,53 @@ define zeroext i32 @ppcq_to_u32(ppc_fp128 %m) #0 {
 ; P8-LABEL: ppcq_to_u32:
 ; P8:       # %bb.0: # %entry
 ; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -128(r1)
-; P8-NEXT:    std r0, 144(r1)
-; P8-NEXT:    .cfi_def_cfa_offset 128
+; P8-NEXT:    stdu r1, -144(r1)
+; P8-NEXT:    std r0, 160(r1)
+; P8-NEXT:    .cfi_def_cfa_offset 144
 ; P8-NEXT:    .cfi_offset lr, 16
-; P8-NEXT:    .cfi_offset r30, -16
+; P8-NEXT:    .cfi_offset f28, -32
+; P8-NEXT:    .cfi_offset f29, -24
+; P8-NEXT:    .cfi_offset f30, -16
+; P8-NEXT:    .cfi_offset f31, -8
 ; P8-NEXT:    addis r3, r2, .LCPI13_0@toc@ha
-; P8-NEXT:    xxlxor f3, f3, f3
-; P8-NEXT:    std r30, 112(r1) # 8-byte Folded Spill
-; P8-NEXT:    lfs f0, .LCPI13_0@toc@l(r3)
-; P8-NEXT:    fcmpo cr1, f2, f3
-; P8-NEXT:    lis r3, -32768
-; P8-NEXT:    fcmpo cr0, f1, f0
-; P8-NEXT:    crand 4*cr5+lt, eq, 4*cr1+lt
-; P8-NEXT:    crandc 4*cr5+gt, lt, eq
-; P8-NEXT:    cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt
-; P8-NEXT:    isel r30, 0, r3, 4*cr5+lt
-; P8-NEXT:    bc 12, 4*cr5+lt, .LBB13_2
-; P8-NEXT:  # %bb.1: # %entry
-; P8-NEXT:    fmr f3, f0
-; P8-NEXT:  .LBB13_2: # %entry
+; P8-NEXT:    stfd f29, 120(r1) # 8-byte Folded Spill
 ; P8-NEXT:    xxlxor f4, f4, f4
+; P8-NEXT:    stfd f28, 112(r1) # 8-byte Folded Spill
+; P8-NEXT:    stfd f30, 128(r1) # 8-byte Folded Spill
+; P8-NEXT:    stfd f31, 136(r1) # 8-byte Folded Spill
+; P8-NEXT:    fmr f31, f2
+; P8-NEXT:    fmr f30, f1
+; P8-NEXT:    lfs f29, .LCPI13_0@toc@l(r3)
+; P8-NEXT:    xxlxor f28, f28, f28
+; P8-NEXT:    fmr f3, f29
 ; P8-NEXT:    bl __gcc_qsub
 ; P8-NEXT:    nop
 ; P8-NEXT:    mffs f0
 ; P8-NEXT:    mtfsb1 31
 ; P8-NEXT:    mtfsb0 30
+; P8-NEXT:    fcmpu cr0, f31, f28
+; P8-NEXT:    fcmpu cr1, f30, f29
 ; P8-NEXT:    fadd f1, f2, f1
 ; P8-NEXT:    mtfsf 1, f0
 ; P8-NEXT:    xscvdpsxws f0, f1
+; P8-NEXT:    crandc 4*cr5+lt, 4*cr1+eq, lt
+; P8-NEXT:    cror 4*cr5+lt, 4*cr1+gt, 4*cr5+lt
 ; P8-NEXT:    mffprwz r3, f0
-; P8-NEXT:    xor r3, r3, r30
-; P8-NEXT:    ld r30, 112(r1) # 8-byte Folded Reload
+; P8-NEXT:    addis r3, r3, -32768
+; P8-NEXT:    mffs f0
+; P8-NEXT:    mtfsb1 31
+; P8-NEXT:    mtfsb0 30
+; P8-NEXT:    fadd f1, f31, f30
+; P8-NEXT:    mtfsf 1, f0
+; P8-NEXT:    lfd f31, 136(r1) # 8-byte Folded Reload
+; P8-NEXT:    lfd f30, 128(r1) # 8-byte Folded Reload
+; P8-NEXT:    lfd f29, 120(r1) # 8-byte Folded Reload
+; P8-NEXT:    lfd f28, 112(r1) # 8-byte Folded Reload
+; P8-NEXT:    xscvdpsxws f0, f1
+; P8-NEXT:    mffprwz r4, f0
+; P8-NEXT:    isel r3, r3, r4, 4*cr5+lt
 ; P8-NEXT:    clrldi r3, r3, 32
-; P8-NEXT:    addi r1, r1, 128
+; P8-NEXT:    addi r1, r1, 144
 ; P8-NEXT:    ld r0, 16(r1)
 ; P8-NEXT:    mtlr r0
 ; P8-NEXT:    blr
@@ -648,68 +661,80 @@ define zeroext i32 @ppcq_to_u32(ppc_fp128 %m) #0 {
 ; P9-LABEL: ppcq_to_u32:
 ; P9:       # %bb.0: # %entry
 ; P9-NEXT:    mflr r0
-; P9-NEXT:    .cfi_def_cfa_offset 48
+; P9-NEXT:    .cfi_def_cfa_offset 64
 ; P9-NEXT:    .cfi_offset lr, 16
-; P9-NEXT:    .cfi_offset r30, -16
-; P9-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
-; P9-NEXT:    stdu r1, -48(r1)
+; P9-NEXT:    .cfi_offset f28, -32
+; P9-NEXT:    .cfi_offset f29, -24
+; P9-NEXT:    .cfi_offset f30, -16
+; P9-NEXT:    .cfi_offset f31, -8
+; P9-NEXT:    stfd f28, -32(r1) # 8-byte Folded Spill
+; P9-NEXT:    stfd f29, -24(r1) # 8-byte Folded Spill
+; P9-NEXT:    stfd f30, -16(r1) # 8-byte Folded Spill
+; P9-NEXT:    stfd f31, -8(r1) # 8-byte Folded Spill
+; P9-NEXT:    stdu r1, -64(r1)
 ; P9-NEXT:    addis r3, r2, .LCPI13_0@toc@ha
-; P9-NEXT:    xxlxor f3, f3, f3
-; P9-NEXT:    std r0, 64(r1)
-; P9-NEXT:    lfs f0, .LCPI13_0@toc@l(r3)
-; P9-NEXT:    fcmpo cr1, f2, f3
-; P9-NEXT:    lis r3, -32768
-; P9-NEXT:    fcmpo cr0, f1, f0
-; P9-NEXT:    crand 4*cr5+lt, eq, 4*cr1+lt
-; P9-NEXT:    crandc 4*cr5+gt, lt, eq
-; P9-NEXT:    cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt
-; P9-NEXT:    isel r30, 0, r3, 4*cr5+lt
-; P9-NEXT:    bc 12, 4*cr5+lt, .LBB13_2
-; P9-NEXT:  # %bb.1: # %entry
-; P9-NEXT:    fmr f3, f0
-; P9-NEXT:  .LBB13_2: # %entry
 ; P9-NEXT:    xxlxor f4, f4, f4
+; P9-NEXT:    std r0, 80(r1)
+; P9-NEXT:    fmr f31, f2
+; P9-NEXT:    xxlxor f28, f28, f28
+; P9-NEXT:    fmr f30, f1
+; P9-NEXT:    lfs f29, .LCPI13_0@toc@l(r3)
+; P9-NEXT:    fmr f3, f29
 ; P9-NEXT:    bl __gcc_qsub
 ; P9-NEXT:    nop
 ; P9-NEXT:    mffs f0
 ; P9-NEXT:    mtfsb1 31
 ; P9-NEXT:    mtfsb0 30
+; P9-NEXT:    fcmpu cr0, f31, f28
+; P9-NEXT:    fcmpu cr1, f30, f29
 ; P9-NEXT:    fadd f1, f2, f1
 ; P9-NEXT:    mtfsf 1, f0
+; P9-NEXT:    crandc 4*cr5+lt, 4*cr1+eq, lt
 ; P9-NEXT:    xscvdpsxws f0, f1
+; P9-NEXT:    cror 4*cr5+lt, 4*cr1+gt, 4*cr5+lt
 ; P9-NEXT:    mffprwz r3, f0
-; P9-NEXT:    xor r3, r3, r30
+; P9-NEXT:    addis r3, r3, -32768
+; P9-NEXT:    mffs f0
+; P9-NEXT:    mtfsb1 31
+; P9-NEXT:    mtfsb0 30
+; P9-NEXT:    fadd f1, f31, f30
+; P9-NEXT:    mtfsf 1, f0
+; P9-NEXT:    xscvdpsxws f0, f1
+; P9-NEXT:    mffprwz r4, f0
+; P9-NEXT:    isel r3, r3, r4, 4*cr5+lt
 ; P9-NEXT:    clrldi r3, r3, 32
-; P9-NEXT:    addi r1, r1, 48
+; P9-NEXT:    addi r1, r1, 64
 ; P9-NEXT:    ld r0, 16(r1)
-; P9-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
+; P9-NEXT:    lfd f31, -8(r1) # 8-byte Folded Reload
+; P9-NEXT:    lfd f30, -16(r1) # 8-byte Folded Reload
 ; P9-NEXT:    mtlr r0
+; P9-NEXT:    lfd f29, -24(r1) # 8-byte Folded Reload
+; P9-NEXT:    lfd f28, -32(r1) # 8-byte Folded Reload
 ; P9-NEXT:    blr
 ;
 ; NOVSX-LABEL: ppcq_to_u32:
 ; NOVSX:       # %bb.0: # %entry
-; NOVSX-NEXT:    mfocrf r12, 32
-; NOVSX-NEXT:    stw r12, 8(r1)
 ; NOVSX-NEXT:    mflr r0
-; NOVSX-NEXT:    stdu r1, -48(r1)
-; NOVSX-NEXT:    std r0, 64(r1)
-; NOVSX-NEXT:    .cfi_def_cfa_offset 48
+; NOVSX-NEXT:    .cfi_def_cfa_offset 80
 ; NOVSX-NEXT:    .cfi_offset lr, 16
-; NOVSX-NEXT:    .cfi_offset cr2, 8
+; NOVSX-NEXT:    .cfi_offset f28, -32
+; NOVSX-NEXT:    .cfi_offset f29, -24
+; NOVSX-NEXT:    .cfi_offset f30, -16
+; NOVSX-NEXT:    .cfi_offset f31, -8
+; NOVSX-NEXT:    stfd f28, -32(r1) # 8-byte Folded Spill
+; NOVSX-NEXT:    stfd f29, -24(r1) # 8-byte Folded Spill
+; NOVSX-NEXT:    stfd f30, -16(r1) # 8-byte Folded Spill
+; NOVSX-NEXT:    stfd f31, -8(r1) # 8-byte Folded Spill
+; NOVSX-NEXT:    stdu r1, -80(r1)
 ; NOVSX-NEXT:    addis r3, r2, .LCPI13_0@toc@ha
-; NOVSX-NEXT:    lfs f0, .LCPI13_0@toc@l(r3)
+; NOVSX-NEXT:    std r0, 96(r1)
+; NOVSX-NEXT:    fmr f31, f2
+; NOVSX-NEXT:    fmr f30, f1
+; NOVSX-NEXT:    lfs f29, .LCPI13_0@toc@l(r3)
 ; NOVSX-NEXT:    addis r3, r2, .LCPI13_1@toc@ha
-; NOVSX-NEXT:    lfs f4, .LCPI13_1@toc@l(r3)
-; NOVSX-NEXT:    fcmpo cr0, f1, f0
-; NOVSX-NEXT:    fcmpo cr1, f2, f4
-; NOVSX-NEXT:    fmr f3, f4
-; NOVSX-NEXT:    crandc 4*cr5+gt, lt, eq
-; NOVSX-NEXT:    crand 4*cr5+lt, eq, 4*cr1+lt
-; NOVSX-NEXT:    cror 4*cr2+lt, 4*cr5+gt, 4*cr5+lt
-; NOVSX-NEXT:    bc 12, 4*cr2+lt, .LBB13_2
-; NOVSX-NEXT:  # %bb.1: # %entry
-; NOVSX-NEXT:    fmr f3, f0
-; NOVSX-NEXT:  .LBB13_2: # %entry
+; NOVSX-NEXT:    lfs f28, .LCPI13_1@toc@l(r3)
+; NOVSX-NEXT:    fmr f3, f29
+; NOVSX-NEXT:    fmr f4, f28
 ; NOVSX-NEXT:    bl __gcc_qsub
 ; NOVSX-NEXT:    nop
 ; NOVSX-NEXT:    mffs f0
@@ -720,16 +745,30 @@ define zeroext i32 @ppcq_to_u32(ppc_fp128 %m) #0 {
 ; NOVSX-NEXT:    mtfsf 1, f0
 ; NOVSX-NEXT:    fctiwz f0, f1
 ; NOVSX-NEXT:    stfiwx f0, 0, r3
-; NOVSX-NEXT:    lis r3, -32768
-; NOVSX-NEXT:    lwz r4, 44(r1)
-; NOVSX-NEXT:    isel r3, 0, r3, 4*cr2+lt
-; NOVSX-NEXT:    xor r3, r4, r3
+; NOVSX-NEXT:    mffs f0
+; NOVSX-NEXT:    mtfsb1 31
+; NOVSX-NEXT:    mtfsb0 30
+; NOVSX-NEXT:    fcmpu cr0, f31, f28
+; NOVSX-NEXT:    fcmpu cr1, f30, f29
+; NOVSX-NEXT:    addi r3, r1, 40
+; NOVSX-NEXT:    fadd f1, f31, f30
+; NOVSX-NEXT:    mtfsf 1, f0
+; NOVSX-NEXT:    fctiwz f0, f1
+; NOVSX-NEXT:    crandc 4*cr5+lt, 4*cr1+eq, lt
+; NOVSX-NEXT:    cror 4*cr5+lt, 4*cr1+gt, 4*cr5+lt
+; NOVSX-NEXT:    stfiwx f0, 0, r3
+; NOVSX-NEXT:    lwz r3, 44(r1)
+; NOVSX-NEXT:    lwz r4, 40(r1)
+; NOVSX-NEXT:    addis r3, r3, -32768
+; NOVSX-NEXT:    isel r3, r3, r4, 4*cr5+lt
 ; NOVSX-NEXT:    clrldi r3, r3, 32
-; NOVSX-NEXT:    addi r1, r1, 48
+; NOVSX-NEXT:    addi r1, r1, 80
 ; NOVSX-NEXT:    ld r0, 16(r1)
-; NOVSX-NEXT:    lwz r12, 8(r1)
+; NOVSX-NEXT:    lfd f31, -8(r1) # 8-byte Folded Reload
+; NOVSX-NEXT:    lfd f30, -16(r1) # 8-byte Folded Reload
 ; NOVSX-NEXT:    mtlr r0
-; NOVSX-NEXT:    mtocrf 32, r12
+; NOVSX-NEXT:    lfd f29, -24(r1) # 8-byte Folded Reload
+; NOVSX-NEXT:    lfd f28, -32(r1) # 8-byte Folded Reload
 ; NOVSX-NEXT:    blr
 entry:
   %conv = tail call i32 @llvm.experimental.constrained.fptoui.i32.ppcf128(ppc_fp128 %m, metadata !"fpexcept.strict") #0
@@ -753,7 +792,7 @@ define fp128 @i1_to_q(i1 signext %m) #0 {
 ;
 ; P9-LABEL: i1_to_q:
 ; P9:       # %bb.0: # %entry
-; P9-NEXT:    mtvsrwa v2, r3
+; P9-NEXT:    mtvsrd v2, r3
 ; P9-NEXT:    xscvsdqp v2, v2
 ; P9-NEXT:    blr
 ;
diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
index eac4fb6f98bf7..2b3ce24bd794d 100644
--- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
+++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
@@ -205,96 +205,12 @@ define double @nearbyint_f64(double %f1, double %f2) strictfp {
 define <4 x float> @nearbyint_v4f32(<4 x float> %vf1, <4 x float> %vf2) strictfp {
 ; P8-LABEL: nearbyint_v4f32:
 ; P8:       # %bb.0:
-; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -176(r1)
-; P8-NEXT:    std r0, 192(r1)
-; P8-NEXT:    .cfi_def_cfa_offset 176
-; P8-NEXT:    .cfi_offset lr, 16
-; P8-NEXT:    .cfi_offset v29, -48
-; P8-NEXT:    .cfi_offset v30, -32
-; P8-NEXT:    .cfi_offset v31, -16
-; P8-NEXT:    xxsldwi vs0, v2, v2, 3
-; P8-NEXT:    li r3, 128
-; P8-NEXT:    xscvspdpn f1, vs0
-; P8-NEXT:    stxvd2x v29, r1, r3 # 16-byte Folded Spill
-; P8-NEXT:    li r3, 144
-; P8-NEXT:    stxvd2x v30, r1, r3 # 16-byte Folded Spill
-; P8-NEXT:    li r3, 160
-; P8-NEXT:    stxvd2x v31, r1, r3 # 16-byte Folded Spill
-; P8-NEXT:    vmr v31, v2
-; P8-NEXT:    bl nearbyintf
-; P8-NEXT:    nop
-; P8-NEXT:    xxsldwi vs0, v31, v31, 1
-; P8-NEXT:    xxlor v30, f1, f1
-; P8-NEXT:    xscvspdpn f1, vs0
-; P8-NEXT:    bl nearbyintf
-; P8-NEXT:    nop
-; P8-NEXT:    xxmrghd vs0, vs1, v30
-; P8-NEXT:    xscvspdpn f1, v31
-; P8-NEXT:    xvcvdpsp v29, vs0
-; P8-NEXT:    bl nearbyintf
-; P8-NEXT:    nop
-; P8-NEXT:    xxswapd vs0, v31
-; P8-NEXT:    xxlor v30, f1, f1
-; P8-NEXT:    xscvspdpn f1, vs0
-; P8-NEXT:    bl nearbyintf
-; P8-NEXT:    nop
-; P8-NEXT:    xxmrghd vs0, v30, vs1
-; P8-NEXT:    li r3, 160
-; P8-NEXT:    xvcvdpsp v2, vs0
-; P8-NEXT:    lxvd2x v31, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    li r3, 144
-; P8-NEXT:    lxvd2x v30, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    li r3, 128
-; P8-NEXT:    vmrgew v2, v2, v29
-; P8-NEXT:    lxvd2x v29, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    addi r1, r1, 176
-; P8-NEXT:    ld r0, 16(r1)
-; P8-NEXT:    mtlr r0
+; P8-NEXT:    vrfin v2, v2
 ; P8-NEXT:    blr
 ;
 ; P9-LABEL: nearbyint_v4f32:
 ; P9:       # %bb.0:
-; P9-NEXT:    mflr r0
-; P9-NEXT:    stdu r1, -80(r1)
-; P9-NEXT:    std r0, 96(r1)
-; P9-NEXT:    .cfi_def_cfa_offset 80
-; P9-NEXT:    .cfi_offset lr, 16
-; P9-NEXT:    .cfi_offset v29, -48
-; P9-NEXT:    .cfi_offset v30, -32
-; P9-NEXT:    .cfi_offset v31, -16
-; P9-NEXT:    xxsldwi vs0, v2, v2, 3
-; P9-NEXT:    stxv v29, 32(r1) # 16-byte Folded Spill
-; P9-NEXT:    xscvspdpn f1, vs0
-; P9-NEXT:    stxv v30, 48(r1) # 16-byte Folded Spill
-; P9-NEXT:    stxv v31, 64(r1) # 16-byte Folded Spill
-; P9-NEXT:    vmr v31, v2
-; P9-NEXT:    bl nearbyintf
-; P9-NEXT:    nop
-; P9-NEXT:    xxsldwi vs0, v31, v31, 1
-; P9-NEXT:    xscpsgndp v30, f1, f1
-; P9-NEXT:    xscvspdpn f1, vs0
-; P9-NEXT:    bl nearbyintf
-; P9-NEXT:    nop
-; P9-NEXT:    xxmrghd vs0, vs1, v30
-; P9-NEXT:    xscvspdpn f1, v31
-; P9-NEXT:    xvcvdpsp v29, vs0
-; P9-NEXT:    bl nearbyintf
-; P9-NEXT:    nop
-; P9-NEXT:    xxswapd vs0, v31
-; P9-NEXT:    xscpsgndp v30, f1, f1
-; P9-NEXT:    xscvspdpn f1, vs0
-; P9-NEXT:    bl nearbyintf
-; P9-NEXT:    nop
-; P9-NEXT:    xxmrghd vs0, v30, vs1
-; P9-NEXT:    lxv v31, 64(r1) # 16-byte Folded Reload
-; P9-NEXT:    lxv v30, 48(r1) # 16-byte Folded Reload
-; P9-NEXT:    xvcvdpsp v2, vs0
-; P9-NEXT:    vmrgew v2, v2, v29
-; P9-NEXT:    lxv v29, 32(r1) # 16-byte Folded Reload
-; P9-NEXT:    addi r1, r1, 80
-; P9-NEXT:    ld r0, 16(r1)
-; P9-NEXT:    mtlr r0
+; P9-NEXT:    vrfin v2, v2
 ; P9-NEXT:    blr
   %res = call <4 x float> @llvm.experimental.constrained.nearbyint.v4f32(
                         <4 x float> %vf1,
@@ -371,28 +287,18 @@ define <2 x double> @nearbyint_v2f64(<2 x double> %vf1, <2 x double> %vf2) stric
 define <4 x double> @fpext_v4f64_v4f32(<4 x float> %vf1) strictfp {
 ; P8-LABEL: fpext_v4f64_v4f32:
 ; P8:       # %bb.0:
-; P8-NEXT:    xxsldwi vs0, v2, v2, 1
-; P8-NEXT:    xscvspdpn f3, v2
-; P8-NEXT:    xxsldwi vs1, v2, v2, 3
-; P8-NEXT:    xxswapd vs2, v2
-; P8-NEXT:    xscvspdpn f0, vs0
-; P8-NEXT:    xxmrghd v2, vs3, vs0
-; P8-NEXT:    xscvspdpn f0, vs1
-; P8-NEXT:    xscvspdpn f1, vs2
-; P8-NEXT:    xxmrghd v3, vs1, vs0
+; P8-NEXT:    xxmrghw vs0, v2, v2
+; P8-NEXT:    xxmrglw vs1, v2, v2
+; P8-NEXT:    xvcvspdp v2, vs0
+; P8-NEXT:    xvcvspdp v3, vs1
 ; P8-NEXT:    blr
 ;
 ; P9-LABEL: fpext_v4f64_v4f32:
 ; P9:       # %bb.0:
-; P9-NEXT:    xxsldwi vs0, v2, v2, 3
-; P9-NEXT:    xxswapd vs1, v2
-; P9-NEXT:    xscvspdpn f0, vs0
-; P9-NEXT:    xscvspdpn f1, vs1
-; P9-NEXT:    xxsldwi vs2, v2, v2, 1
-; P9-NEXT:    xscvspdpn f2, vs2
-; P9-NEXT:    xxmrghd vs0, vs1, vs0
-; P9-NEXT:    xscvspdpn f1, v2
-; P9-NEXT:    xxmrghd v3, vs1, vs2
+; P9-NEXT:    xxmrglw vs0, v2, v2
+; P9-NEXT:    xxmrghw vs1, v2, v2
+; P9-NEXT:    xvcvspdp vs0, vs0
+; P9-NEXT:    xvcvspdp v3, vs1
 ; P9-NEXT:    xxlor v2, vs0, vs0
 ; P9-NEXT:    blr
   %res = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(
@@ -404,19 +310,14 @@ define <4 x double> @fpext_v4f64_v4f32(<4 x float> %vf1) strictfp {
 define <2 x double> @fpext_v2f64_v2f32(<2 x float> %vf1) strictfp {
 ; P8-LABEL: fpext_v2f64_v2f32:
 ; P8:       # %bb.0:
-; P8-NEXT:    xxsldwi vs0, v2, v2, 1
-; P8-NEXT:    xscvspdpn f1, v2
-; P8-NEXT:    xscvspdpn f0, vs0
-; P8-NEXT:    xxmrghd v2, vs1, vs0
+; P8-NEXT:    xxmrghw vs0, v2, v2
+; P8-NEXT:    xvcvspdp v2, vs0
 ; P8-NEXT:    blr
 ;
 ; P9-LABEL: fpext_v2f64_v2f32:
 ; P9:       # %bb.0:
-; P9-NEXT:    xxsldwi vs0, v2, v2, 3
-; P9-NEXT:    xxswapd vs1, v2
-; P9-NEXT:    xscvspdpn f0, vs0
-; P9-NEXT:    xscvspdpn f1, vs1
-; P9-NEXT:    xxmrghd v2, vs1, vs0
+; P9-NEXT:    xxmrglw vs0, v2, v2
+; P9-NEXT:    xvcvspdp v2, vs0
 ; P9-NEXT:    blr
   %res = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(
                         <2 x float> %vf1,
diff --git a/llvm/test/CodeGen/PowerPC/fp-strict.ll b/llvm/test/CodeGen/PowerPC/fp-strict.ll
index d3025f1da658a..fd0b3b417864f 100644
--- a/llvm/test/CodeGen/PowerPC/fp-strict.ll
+++ b/llvm/test/CodeGen/PowerPC/fp-strict.ll
@@ -89,36 +89,15 @@ define <4 x float> @fadd_v4f32(<4 x float> %vf1, <4 x float> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fadd_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -20(r1)
-; NOVSX-NEXT:    lfs f1, -36(r1)
-; NOVSX-NEXT:    fadds f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -40(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -24(r1)
-; NOVSX-NEXT:    fadds f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -44(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -28(r1)
-; NOVSX-NEXT:    fadds f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -48(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -32(r1)
-; NOVSX-NEXT:    fadds f0, f1, f0
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
+; NOVSX-NEXT:    vaddfp v2, v2, v3
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fadd_v4f32:
 ; SPE:       # %bb.0:
-; SPE-NEXT:    efsadd r6, r6, r10
-; SPE-NEXT:    efsadd r5, r5, r9
-; SPE-NEXT:    efsadd r4, r4, r8
 ; SPE-NEXT:    efsadd r3, r3, r7
+; SPE-NEXT:    efsadd r4, r4, r8
+; SPE-NEXT:    efsadd r5, r5, r9
+; SPE-NEXT:    efsadd r6, r6, r10
 ; SPE-NEXT:    blr
   %res = call <4 x float> @llvm.experimental.constrained.fadd.v4f32(
                         <4 x float> %vf1, <4 x float> %vf2,
@@ -135,8 +114,8 @@ define <2 x double> @fadd_v2f64(<2 x double> %vf1, <2 x double> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fadd_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fadd f2, f2, f4
 ; NOVSX-NEXT:    fadd f1, f1, f3
+; NOVSX-NEXT:    fadd f2, f2, f4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fadd_v2f64:
@@ -215,36 +194,15 @@ define <4 x float> @fsub_v4f32(<4 x float> %vf1, <4 x float> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fsub_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -20(r1)
-; NOVSX-NEXT:    lfs f1, -36(r1)
-; NOVSX-NEXT:    fsubs f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -40(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -24(r1)
-; NOVSX-NEXT:    fsubs f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -44(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -28(r1)
-; NOVSX-NEXT:    fsubs f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -48(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -32(r1)
-; NOVSX-NEXT:    fsubs f0, f1, f0
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
+; NOVSX-NEXT:    vsubfp v2, v2, v3
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fsub_v4f32:
 ; SPE:       # %bb.0:
-; SPE-NEXT:    efssub r6, r6, r10
-; SPE-NEXT:    efssub r5, r5, r9
-; SPE-NEXT:    efssub r4, r4, r8
 ; SPE-NEXT:    efssub r3, r3, r7
+; SPE-NEXT:    efssub r4, r4, r8
+; SPE-NEXT:    efssub r5, r5, r9
+; SPE-NEXT:    efssub r6, r6, r10
 ; SPE-NEXT:    blr
   %res = call <4 x float> @llvm.experimental.constrained.fsub.v4f32(
                         <4 x float> %vf1, <4 x float> %vf2,
@@ -261,8 +219,8 @@ define <2 x double> @fsub_v2f64(<2 x double> %vf1, <2 x double> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fsub_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fsub f2, f2, f4
 ; NOVSX-NEXT:    fsub f1, f1, f3
+; NOVSX-NEXT:    fsub f2, f2, f4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fsub_v2f64:
@@ -341,36 +299,17 @@ define <4 x float> @fmul_v4f32(<4 x float> %vf1, <4 x float> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fmul_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -20(r1)
-; NOVSX-NEXT:    lfs f1, -36(r1)
-; NOVSX-NEXT:    fmuls f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -40(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -24(r1)
-; NOVSX-NEXT:    fmuls f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -44(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -28(r1)
-; NOVSX-NEXT:    fmuls f0, f1, f0
-; NOVSX-NEXT:    lfs f1, -48(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -32(r1)
-; NOVSX-NEXT:    fmuls f0, f1, f0
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
+; NOVSX-NEXT:    vspltisw v4, -1
+; NOVSX-NEXT:    vslw v4, v4, v4
+; NOVSX-NEXT:    vmaddfp v2, v2, v3, v4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmul_v4f32:
 ; SPE:       # %bb.0:
-; SPE-NEXT:    efsmul r6, r6, r10
-; SPE-NEXT:    efsmul r5, r5, r9
-; SPE-NEXT:    efsmul r4, r4, r8
 ; SPE-NEXT:    efsmul r3, r3, r7
+; SPE-NEXT:    efsmul r4, r4, r8
+; SPE-NEXT:    efsmul r5, r5, r9
+; SPE-NEXT:    efsmul r6, r6, r10
 ; SPE-NEXT:    blr
   %res = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(
                         <4 x float> %vf1, <4 x float> %vf2,
@@ -387,8 +326,8 @@ define <2 x double> @fmul_v2f64(<2 x double> %vf1, <2 x double> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fmul_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fmul f2, f2, f4
 ; NOVSX-NEXT:    fmul f1, f1, f3
+; NOVSX-NEXT:    fmul f2, f2, f4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmul_v2f64:
@@ -493,10 +432,10 @@ define <4 x float> @fdiv_v4f32(<4 x float> %vf1, <4 x float> %vf2) #0 {
 ;
 ; SPE-LABEL: fdiv_v4f32:
 ; SPE:       # %bb.0:
-; SPE-NEXT:    efsdiv r6, r6, r10
-; SPE-NEXT:    efsdiv r5, r5, r9
-; SPE-NEXT:    efsdiv r4, r4, r8
 ; SPE-NEXT:    efsdiv r3, r3, r7
+; SPE-NEXT:    efsdiv r4, r4, r8
+; SPE-NEXT:    efsdiv r5, r5, r9
+; SPE-NEXT:    efsdiv r6, r6, r10
 ; SPE-NEXT:    blr
   %res = call <4 x float> @llvm.experimental.constrained.fdiv.v4f32(
                         <4 x float> %vf1, <4 x float> %vf2,
@@ -513,8 +452,8 @@ define <2 x double> @fdiv_v2f64(<2 x double> %vf1, <2 x double> %vf2) #0 {
 ;
 ; NOVSX-LABEL: fdiv_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fdiv f2, f2, f4
 ; NOVSX-NEXT:    fdiv f1, f1, f3
+; NOVSX-NEXT:    fdiv f2, f2, f4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fdiv_v2f64:
@@ -648,34 +587,7 @@ define <4 x float> @fmadd_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float>
 ;
 ; NOVSX-LABEL: fmadd_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v4, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -64
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -20(r1)
-; NOVSX-NEXT:    lfs f1, -36(r1)
-; NOVSX-NEXT:    lfs f2, -52(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -40(r1)
-; NOVSX-NEXT:    lfs f2, -56(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -24(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -44(r1)
-; NOVSX-NEXT:    lfs f2, -60(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -28(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -48(r1)
-; NOVSX-NEXT:    lfs f2, -64(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -32(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
+; NOVSX-NEXT:    vmaddfp v2, v2, v3, v4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmadd_v4f32:
@@ -695,47 +607,47 @@ define <4 x float> @fmadd_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float>
 ; SPE-NEXT:    .cfi_offset r28, -16
 ; SPE-NEXT:    .cfi_offset r29, -12
 ; SPE-NEXT:    .cfi_offset r30, -8
-; SPE-NEXT:    stw r27, 44(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r27, r5
-; SPE-NEXT:    lwz r5, 84(r1)
-; SPE-NEXT:    stw r25, 36(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r25, r3
 ; SPE-NEXT:    stw r26, 40(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r26, r4
-; SPE-NEXT:    mr r3, r6
-; SPE-NEXT:    mr r4, r10
+; SPE-NEXT:    mr r26, r5
+; SPE-NEXT:    lwz r5, 72(r1)
+; SPE-NEXT:    stw r25, 36(r1) # 4-byte Folded Spill
+; SPE-NEXT:    mr r25, r4
+; SPE-NEXT:    mr r4, r7
 ; SPE-NEXT:    stw r21, 20(r1) # 4-byte Folded Spill
 ; SPE-NEXT:    stw r22, 24(r1) # 4-byte Folded Spill
 ; SPE-NEXT:    stw r23, 28(r1) # 4-byte Folded Spill
 ; SPE-NEXT:    stw r24, 32(r1) # 4-byte Folded Spill
+; SPE-NEXT:    stw r27, 44(r1) # 4-byte Folded Spill
+; SPE-NEXT:    mr r27, r6
 ; SPE-NEXT:    stw r28, 48(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r28, r7
+; SPE-NEXT:    mr r28, r8
 ; SPE-NEXT:    stw r29, 52(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r29, r8
+; SPE-NEXT:    mr r29, r9
 ; SPE-NEXT:    stw r30, 56(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r30, r9
-; SPE-NEXT:    lwz r24, 72(r1)
-; SPE-NEXT:    lwz r23, 76(r1)
-; SPE-NEXT:    lwz r22, 80(r1)
+; SPE-NEXT:    mr r30, r10
+; SPE-NEXT:    lwz r24, 84(r1)
+; SPE-NEXT:    lwz r23, 80(r1)
+; SPE-NEXT:    lwz r22, 76(r1)
 ; SPE-NEXT:    bl fmaf
 ; SPE-NEXT:    mr r21, r3
-; SPE-NEXT:    mr r3, r27
-; SPE-NEXT:    mr r4, r30
+; SPE-NEXT:    mr r3, r25
+; SPE-NEXT:    mr r4, r28
 ; SPE-NEXT:    mr r5, r22
 ; SPE-NEXT:    bl fmaf
-; SPE-NEXT:    mr r30, r3
+; SPE-NEXT:    mr r28, r3
 ; SPE-NEXT:    mr r3, r26
 ; SPE-NEXT:    mr r4, r29
 ; SPE-NEXT:    mr r5, r23
 ; SPE-NEXT:    bl fmaf
 ; SPE-NEXT:    mr r29, r3
-; SPE-NEXT:    mr r3, r25
-; SPE-NEXT:    mr r4, r28
+; SPE-NEXT:    mr r3, r27
+; SPE-NEXT:    mr r4, r30
 ; SPE-NEXT:    mr r5, r24
 ; SPE-NEXT:    bl fmaf
-; SPE-NEXT:    mr r4, r29
-; SPE-NEXT:    mr r5, r30
-; SPE-NEXT:    mr r6, r21
+; SPE-NEXT:    mr r6, r3
+; SPE-NEXT:    mr r3, r21
+; SPE-NEXT:    mr r4, r28
+; SPE-NEXT:    mr r5, r29
 ; SPE-NEXT:    lwz r30, 56(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r29, 52(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r28, 48(r1) # 4-byte Folded Reload
@@ -766,8 +678,8 @@ define <2 x double> @fmadd_v2f64(<2 x double> %vf0, <2 x double> %vf1, <2 x doub
 ;
 ; NOVSX-LABEL: fmadd_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fmadd f2, f2, f4, f6
 ; NOVSX-NEXT:    fmadd f1, f1, f3, f5
+; NOVSX-NEXT:    fmadd f2, f2, f4, f6
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmadd_v2f64:
@@ -911,36 +823,9 @@ define <4 x float> @fmsub_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float>
 ; NOVSX-LABEL: fmsub_v4f32:
 ; NOVSX:       # %bb.0:
 ; NOVSX-NEXT:    vspltisb v5, -1
-; NOVSX-NEXT:    addi r3, r1, -48
 ; NOVSX-NEXT:    vslw v5, v5, v5
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -64
 ; NOVSX-NEXT:    vxor v4, v4, v5
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v4, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -36(r1)
-; NOVSX-NEXT:    lfs f1, -52(r1)
-; NOVSX-NEXT:    lfs f2, -20(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -56(r1)
-; NOVSX-NEXT:    lfs f2, -24(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -40(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -60(r1)
-; NOVSX-NEXT:    lfs f2, -28(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -44(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -64(r1)
-; NOVSX-NEXT:    lfs f2, -32(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -48(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
+; NOVSX-NEXT:    vmaddfp v2, v2, v3, v4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmsub_v4f32:
@@ -961,50 +846,50 @@ define <4 x float> @fmsub_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float>
 ; SPE-NEXT:    .cfi_offset r29, -12
 ; SPE-NEXT:    .cfi_offset r30, -8
 ; SPE-NEXT:    stw r25, 36(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r25, r3
+; SPE-NEXT:    mr r25, r4
 ; SPE-NEXT:    stw r26, 40(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r26, r4
-; SPE-NEXT:    stw r27, 44(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r27, r5
+; SPE-NEXT:    mr r26, r5
 ; SPE-NEXT:    stw r28, 48(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r28, r7
-; SPE-NEXT:    lwz r3, 80(r1)
-; SPE-NEXT:    lwz r4, 72(r1)
-; SPE-NEXT:    lwz r5, 76(r1)
-; SPE-NEXT:    lwz r7, 84(r1)
+; SPE-NEXT:    mr r28, r8
+; SPE-NEXT:    lwz r4, 76(r1)
+; SPE-NEXT:    lwz r5, 84(r1)
+; SPE-NEXT:    lwz r8, 72(r1)
+; SPE-NEXT:    stw r27, 44(r1) # 4-byte Folded Spill
+; SPE-NEXT:    mr r27, r6
+; SPE-NEXT:    lwz r6, 80(r1)
 ; SPE-NEXT:    stw r22, 24(r1) # 4-byte Folded Spill
-; SPE-NEXT:    efsneg r22, r3
-; SPE-NEXT:    stw r23, 28(r1) # 4-byte Folded Spill
-; SPE-NEXT:    efsneg r23, r5
+; SPE-NEXT:    efsneg r22, r4
 ; SPE-NEXT:    stw r24, 32(r1) # 4-byte Folded Spill
-; SPE-NEXT:    efsneg r24, r4
-; SPE-NEXT:    efsneg r5, r7
-; SPE-NEXT:    mr r3, r6
-; SPE-NEXT:    mr r4, r10
+; SPE-NEXT:    efsneg r24, r5
+; SPE-NEXT:    efsneg r5, r8
+; SPE-NEXT:    mr r4, r7
 ; SPE-NEXT:    stw r21, 20(r1) # 4-byte Folded Spill
+; SPE-NEXT:    stw r23, 28(r1) # 4-byte Folded Spill
+; SPE-NEXT:    efsneg r23, r6
 ; SPE-NEXT:    stw r29, 52(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r29, r8
+; SPE-NEXT:    mr r29, r9
 ; SPE-NEXT:    stw r30, 56(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r30, r9
+; SPE-NEXT:    mr r30, r10
 ; SPE-NEXT:    bl fmaf
 ; SPE-NEXT:    mr r21, r3
-; SPE-NEXT:    mr r3, r27
-; SPE-NEXT:    mr r4, r30
+; SPE-NEXT:    mr r3, r25
+; SPE-NEXT:    mr r4, r28
 ; SPE-NEXT:    mr r5, r22
 ; SPE-NEXT:    bl fmaf
-; SPE-NEXT:    mr r30, r3
+; SPE-NEXT:    mr r28, r3
 ; SPE-NEXT:    mr r3, r26
 ; SPE-NEXT:    mr r4, r29
 ; SPE-NEXT:    mr r5, r23
 ; SPE-NEXT:    bl fmaf
 ; SPE-NEXT:    mr r29, r3
-; SPE-NEXT:    mr r3, r25
-; SPE-NEXT:    mr r4, r28
+; SPE-NEXT:    mr r3, r27
+; SPE-NEXT:    mr r4, r30
 ; SPE-NEXT:    mr r5, r24
 ; SPE-NEXT:    bl fmaf
-; SPE-NEXT:    mr r4, r29
-; SPE-NEXT:    mr r5, r30
-; SPE-NEXT:    mr r6, r21
+; SPE-NEXT:    mr r6, r3
+; SPE-NEXT:    mr r3, r21
+; SPE-NEXT:    mr r4, r28
+; SPE-NEXT:    mr r5, r29
 ; SPE-NEXT:    lwz r30, 56(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r29, 52(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r28, 48(r1) # 4-byte Folded Reload
@@ -1036,8 +921,8 @@ define <2 x double> @fmsub_v2f64(<2 x double> %vf0, <2 x double> %vf1, <2 x doub
 ;
 ; NOVSX-LABEL: fmsub_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fmsub f2, f2, f4, f6
 ; NOVSX-NEXT:    fmsub f1, f1, f3, f5
+; NOVSX-NEXT:    fmsub f2, f2, f4, f6
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fmsub_v2f64:
@@ -1177,42 +1062,15 @@ define double @fnmadd_f64(double %f0, double %f1, double %f2) #0 {
 define <4 x float> @fnmadd_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float> %vf2) #0 {
 ; CHECK-LABEL: fnmadd_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    xvmaddasp v4, v2, v3
-; CHECK-NEXT:    xvnegsp v2, v4
+; CHECK-NEXT:    xvnmaddasp v4, v2, v3
+; CHECK-NEXT:    vmr v2, v4
 ; CHECK-NEXT:    blr
 ;
 ; NOVSX-LABEL: fnmadd_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    addi r3, r1, -32
+; NOVSX-NEXT:    vmaddfp v2, v2, v3, v4
 ; NOVSX-NEXT:    vspltisb v5, -1
-; NOVSX-NEXT:    stvx v4, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -64
 ; NOVSX-NEXT:    vslw v3, v5, v5
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -20(r1)
-; NOVSX-NEXT:    lfs f1, -36(r1)
-; NOVSX-NEXT:    lfs f2, -52(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -40(r1)
-; NOVSX-NEXT:    lfs f2, -56(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -24(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -44(r1)
-; NOVSX-NEXT:    lfs f2, -60(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -28(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    lfs f1, -48(r1)
-; NOVSX-NEXT:    lfs f2, -64(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -32(r1)
-; NOVSX-NEXT:    fmadds f0, f2, f1, f0
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
 ; NOVSX-NEXT:    vxor v2, v2, v3
 ; NOVSX-NEXT:    blr
 ;
@@ -1306,8 +1164,8 @@ define <2 x double> @fnmadd_v2f64(<2 x double> %vf0, <2 x double> %vf1, <2 x dou
 ;
 ; NOVSX-LABEL: fnmadd_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fnmadd f2, f2, f4, f6
 ; NOVSX-NEXT:    fnmadd f1, f1, f3, f5
+; NOVSX-NEXT:    fnmadd f2, f2, f4, f6
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fnmadd_v2f64:
@@ -1457,38 +1315,7 @@ define <4 x float> @fnmsub_v4f32(<4 x float> %vf0, <4 x float> %vf1, <4 x float>
 ;
 ; NOVSX-LABEL: fnmsub_v4f32:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    vspltisb v5, -1
-; NOVSX-NEXT:    addi r3, r1, -48
-; NOVSX-NEXT:    vslw v5, v5, v5
-; NOVSX-NEXT:    stvx v3, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -64
-; NOVSX-NEXT:    vxor v4, v4, v5
-; NOVSX-NEXT:    stvx v2, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -32
-; NOVSX-NEXT:    stvx v4, 0, r3
-; NOVSX-NEXT:    addi r3, r1, -16
-; NOVSX-NEXT:    lfs f0, -36(r1)
-; NOVSX-NEXT:    lfs f1, -52(r1)
-; NOVSX-NEXT:    lfs f2, -20(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -56(r1)
-; NOVSX-NEXT:    lfs f2, -24(r1)
-; NOVSX-NEXT:    stfs f0, -4(r1)
-; NOVSX-NEXT:    lfs f0, -40(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -60(r1)
-; NOVSX-NEXT:    lfs f2, -28(r1)
-; NOVSX-NEXT:    stfs f0, -8(r1)
-; NOVSX-NEXT:    lfs f0, -44(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    lfs f1, -64(r1)
-; NOVSX-NEXT:    lfs f2, -32(r1)
-; NOVSX-NEXT:    stfs f0, -12(r1)
-; NOVSX-NEXT:    lfs f0, -48(r1)
-; NOVSX-NEXT:    fmadds f0, f1, f0, f2
-; NOVSX-NEXT:    stfs f0, -16(r1)
-; NOVSX-NEXT:    lvx v2, 0, r3
-; NOVSX-NEXT:    vxor v2, v2, v5
+; NOVSX-NEXT:    vnmsubfp v2, v2, v3, v4
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fnmsub_v4f32:
@@ -1586,8 +1413,8 @@ define <2 x double> @fnmsub_v2f64(<2 x double> %vf0, <2 x double> %vf1, <2 x dou
 ;
 ; NOVSX-LABEL: fnmsub_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fnmsub f2, f2, f4, f6
 ; NOVSX-NEXT:    fnmsub f1, f1, f3, f5
+; NOVSX-NEXT:    fnmsub f2, f2, f4, f6
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fnmsub_v2f64:
@@ -1754,27 +1581,27 @@ define <4 x float> @fsqrt_v4f32(<4 x float> %vf1) #0 {
 ; SPE-NEXT:    .cfi_offset r28, -16
 ; SPE-NEXT:    .cfi_offset r29, -12
 ; SPE-NEXT:    .cfi_offset r30, -8
-; SPE-NEXT:    stw r28, 16(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r28, r3
-; SPE-NEXT:    mr r3, r6
 ; SPE-NEXT:    stw r27, 12(r1) # 4-byte Folded Spill
+; SPE-NEXT:    stw r28, 16(r1) # 4-byte Folded Spill
+; SPE-NEXT:    mr r28, r4
 ; SPE-NEXT:    stw r29, 20(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r29, r4
+; SPE-NEXT:    mr r29, r5
 ; SPE-NEXT:    stw r30, 24(r1) # 4-byte Folded Spill
-; SPE-NEXT:    mr r30, r5
+; SPE-NEXT:    mr r30, r6
 ; SPE-NEXT:    bl sqrtf
 ; SPE-NEXT:    mr r27, r3
-; SPE-NEXT:    mr r3, r30
+; SPE-NEXT:    mr r3, r28
 ; SPE-NEXT:    bl sqrtf
-; SPE-NEXT:    mr r30, r3
+; SPE-NEXT:    mr r28, r3
 ; SPE-NEXT:    mr r3, r29
 ; SPE-NEXT:    bl sqrtf
 ; SPE-NEXT:    mr r29, r3
-; SPE-NEXT:    mr r3, r28
+; SPE-NEXT:    mr r3, r30
 ; SPE-NEXT:    bl sqrtf
-; SPE-NEXT:    mr r4, r29
-; SPE-NEXT:    mr r5, r30
-; SPE-NEXT:    mr r6, r27
+; SPE-NEXT:    mr r6, r3
+; SPE-NEXT:    mr r3, r27
+; SPE-NEXT:    mr r4, r28
+; SPE-NEXT:    mr r5, r29
 ; SPE-NEXT:    lwz r30, 24(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r29, 20(r1) # 4-byte Folded Reload
 ; SPE-NEXT:    lwz r28, 16(r1) # 4-byte Folded Reload
@@ -1798,8 +1625,8 @@ define <2 x double> @fsqrt_v2f64(<2 x double> %vf1) #0 {
 ;
 ; NOVSX-LABEL: fsqrt_v2f64:
 ; NOVSX:       # %bb.0:
-; NOVSX-NEXT:    fsqrt f2, f2
 ; NOVSX-NEXT:    fsqrt f1, f1
+; NOVSX-NEXT:    fsqrt f2, f2
 ; NOVSX-NEXT:    blr
 ;
 ; SPE-LABEL: fsqrt_v2f64:
diff --git a/llvm/test/CodeGen/PowerPC/nofpexcept.ll b/llvm/test/CodeGen/PowerPC/nofpexcept.ll
index 14c6e68fb9226..593970bfc0137 100644
--- a/llvm/test/CodeGen/PowerPC/nofpexcept.ll
+++ b/llvm/test/CodeGen/PowerPC/nofpexcept.ll
@@ -173,7 +173,7 @@ define signext i32 @q_to_i32(fp128 %m) #0 {
   ; CHECK-NEXT:   liveins: $v2
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vrrc = COPY $v2
-  ; CHECK-NEXT:   [[XSCVQPSWZ:%[0-9]+]]:vrrc = XSCVQPSWZ [[COPY]]
+  ; CHECK-NEXT:   [[XSCVQPSWZ:%[0-9]+]]:vrrc = nofpexcept XSCVQPSWZ [[COPY]]
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:vslrc = COPY killed [[XSCVQPSWZ]]
   ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:vfrc = COPY killed [[COPY1]].sub_64
   ; CHECK-NEXT:   [[MFVSRWZ:%[0-9]+]]:gprc = MFVSRWZ killed [[COPY2]]
@@ -191,7 +191,7 @@ define i64 @q_to_i64(fp128 %m) #0 {
   ; CHECK-NEXT:   liveins: $v2
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vrrc = COPY $v2
-  ; CHECK-NEXT:   [[XSCVQPSDZ:%[0-9]+]]:vrrc = XSCVQPSDZ [[COPY]]
+  ; CHECK-NEXT:   [[XSCVQPSDZ:%[0-9]+]]:vrrc = nofpexcept XSCVQPSDZ [[COPY]]
   ; CHECK-NEXT:   [[MFVRD:%[0-9]+]]:g8rc = MFVRD killed [[XSCVQPSDZ]]
   ; CHECK-NEXT:   $x3 = COPY [[MFVRD]]
   ; CHECK-NEXT:   BLR8 implicit $lr8, implicit $rm, implicit $x3
@@ -206,7 +206,7 @@ define i64 @q_to_u64(fp128 %m) #0 {
   ; CHECK-NEXT:   liveins: $v2
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vrrc = COPY $v2
-  ; CHECK-NEXT:   [[XSCVQPUDZ:%[0-9]+]]:vrrc = XSCVQPUDZ [[COPY]]
+  ; CHECK-NEXT:   [[XSCVQPUDZ:%[0-9]+]]:vrrc = nofpexcept XSCVQPUDZ [[COPY]]
   ; CHECK-NEXT:   [[MFVRD:%[0-9]+]]:g8rc = MFVRD killed [[XSCVQPUDZ]]
   ; CHECK-NEXT:   $x3 = COPY [[MFVRD]]
   ; CHECK-NEXT:   BLR8 implicit $lr8, implicit $rm, implicit $x3
@@ -221,7 +221,7 @@ define zeroext i32 @q_to_u32(fp128 %m) #0 {
   ; CHECK-NEXT:   liveins: $v2
   ; CHECK-NEXT: {{  $}}
   ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vrrc = COPY $v2
-  ; CHECK-NEXT:   [[XSCVQPUWZ:%[0-9]+]]:vrrc = XSCVQPUWZ [[COPY]]
+  ; CHECK-NEXT:   [[XSCVQPUWZ:%[0-9]+]]:vrrc = nofpexcept XSCVQPUWZ [[COPY]]
   ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:vslrc = COPY killed [[XSCVQPUWZ]]
   ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:vfrc = COPY killed [[COPY1]].sub_64
   ; CHECK-NEXT:   [[MFVSRWZ:%[0-9]+]]:gprc = MFVSRWZ killed [[COPY2]]
diff --git a/llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
index c1ee436a40c55..f4c125a138e46 100644
--- a/llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
+++ b/llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
@@ -1287,119 +1287,149 @@ define i32 @test_fptoui_ppc_i32_ppc_fp128(ppc_fp128 %first) #0 {
 ; PC64LE-LABEL: test_fptoui_ppc_i32_ppc_fp128:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    std 30, -16(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stdu 1, -48(1)
+; PC64LE-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -64(1)
 ; PC64LE-NEXT:    addis 3, 2, .LCPI31_0@toc@ha
-; PC64LE-NEXT:    xxlxor 3, 3, 3
-; PC64LE-NEXT:    std 0, 64(1)
-; PC64LE-NEXT:    lfs 0, .LCPI31_0@toc@l(3)
-; PC64LE-NEXT:    fcmpo 1, 2, 3
-; PC64LE-NEXT:    lis 3, -32768
-; PC64LE-NEXT:    fcmpo 0, 1, 0
-; PC64LE-NEXT:    crand 20, 2, 4
-; PC64LE-NEXT:    crandc 21, 0, 2
-; PC64LE-NEXT:    cror 20, 21, 20
-; PC64LE-NEXT:    isel 30, 0, 3, 20
-; PC64LE-NEXT:    bc 12, 20, .LBB31_2
-; PC64LE-NEXT:  # %bb.1: # %entry
-; PC64LE-NEXT:    fmr 3, 0
-; PC64LE-NEXT:  .LBB31_2: # %entry
 ; PC64LE-NEXT:    xxlxor 4, 4, 4
+; PC64LE-NEXT:    std 0, 80(1)
+; PC64LE-NEXT:    fmr 31, 2
+; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxlxor 28, 28, 28
+; PC64LE-NEXT:    lfs 29, .LCPI31_0@toc@l(3)
+; PC64LE-NEXT:    fmr 3, 29
 ; PC64LE-NEXT:    bl __gcc_qsub
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    mffs 0
 ; PC64LE-NEXT:    mtfsb1 31
 ; PC64LE-NEXT:    mtfsb0 30
+; PC64LE-NEXT:    fcmpu 0, 31, 28
+; PC64LE-NEXT:    fcmpu 1, 30, 29
 ; PC64LE-NEXT:    fadd 1, 2, 1
 ; PC64LE-NEXT:    mtfsf 1, 0
 ; PC64LE-NEXT:    xscvdpsxws 0, 1
+; PC64LE-NEXT:    crandc 20, 6, 0
+; PC64LE-NEXT:    cror 20, 5, 20
 ; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    xor 3, 3, 30
-; PC64LE-NEXT:    addi 1, 1, 48
+; PC64LE-NEXT:    addis 3, 3, -32768
+; PC64LE-NEXT:    mffs 0
+; PC64LE-NEXT:    mtfsb1 31
+; PC64LE-NEXT:    mtfsb0 30
+; PC64LE-NEXT:    fadd 1, 31, 30
+; PC64LE-NEXT:    mtfsf 1, 0
+; PC64LE-NEXT:    xscvdpsxws 0, 1
+; PC64LE-NEXT:    mffprwz 4, 0
+; PC64LE-NEXT:    isel 3, 3, 4, 20
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    ld 30, -16(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: test_fptoui_ppc_i32_ppc_fp128:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    std 30, -16(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stdu 1, -48(1)
+; PC64LE9-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stdu 1, -64(1)
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI31_0@toc@ha
-; PC64LE9-NEXT:    xxlxor 3, 3, 3
-; PC64LE9-NEXT:    std 0, 64(1)
-; PC64LE9-NEXT:    lfs 0, .LCPI31_0@toc@l(3)
-; PC64LE9-NEXT:    fcmpo 1, 2, 3
-; PC64LE9-NEXT:    lis 3, -32768
-; PC64LE9-NEXT:    fcmpo 0, 1, 0
-; PC64LE9-NEXT:    crand 20, 2, 4
-; PC64LE9-NEXT:    crandc 21, 0, 2
-; PC64LE9-NEXT:    cror 20, 21, 20
-; PC64LE9-NEXT:    isel 30, 0, 3, 20
-; PC64LE9-NEXT:    bc 12, 20, .LBB31_2
-; PC64LE9-NEXT:  # %bb.1: # %entry
-; PC64LE9-NEXT:    fmr 3, 0
-; PC64LE9-NEXT:  .LBB31_2: # %entry
 ; PC64LE9-NEXT:    xxlxor 4, 4, 4
+; PC64LE9-NEXT:    std 0, 80(1)
+; PC64LE9-NEXT:    fmr 31, 2
+; PC64LE9-NEXT:    xxlxor 28, 28, 28
+; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    lfs 29, .LCPI31_0@toc@l(3)
+; PC64LE9-NEXT:    fmr 3, 29
 ; PC64LE9-NEXT:    bl __gcc_qsub
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    mffs 0
 ; PC64LE9-NEXT:    mtfsb1 31
 ; PC64LE9-NEXT:    mtfsb0 30
+; PC64LE9-NEXT:    fcmpu 0, 31, 28
+; PC64LE9-NEXT:    fcmpu 1, 30, 29
 ; PC64LE9-NEXT:    fadd 1, 2, 1
 ; PC64LE9-NEXT:    mtfsf 1, 0
+; PC64LE9-NEXT:    crandc 20, 6, 0
 ; PC64LE9-NEXT:    xscvdpsxws 0, 1
+; PC64LE9-NEXT:    cror 20, 5, 20
 ; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xor 3, 3, 30
-; PC64LE9-NEXT:    addi 1, 1, 48
+; PC64LE9-NEXT:    addis 3, 3, -32768
+; PC64LE9-NEXT:    mffs 0
+; PC64LE9-NEXT:    mtfsb1 31
+; PC64LE9-NEXT:    mtfsb0 30
+; PC64LE9-NEXT:    fadd 1, 31, 30
+; PC64LE9-NEXT:    mtfsf 1, 0
+; PC64LE9-NEXT:    xscvdpsxws 0, 1
+; PC64LE9-NEXT:    mffprwz 4, 0
+; PC64LE9-NEXT:    isel 3, 3, 4, 20
+; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    ld 30, -16(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 ;
 ; PC64-LABEL: test_fptoui_ppc_i32_ppc_fp128:
 ; PC64:       # %bb.0: # %entry
-; PC64-NEXT:    mfcr 12
 ; PC64-NEXT:    mflr 0
-; PC64-NEXT:    stw 12, 8(1)
-; PC64-NEXT:    stdu 1, -128(1)
+; PC64-NEXT:    stdu 1, -160(1)
 ; PC64-NEXT:    addis 3, 2, .LCPI31_0@toc@ha
-; PC64-NEXT:    std 0, 144(1)
-; PC64-NEXT:    lfs 0, .LCPI31_0@toc@l(3)
+; PC64-NEXT:    std 0, 176(1)
+; PC64-NEXT:    stfd 29, 136(1) # 8-byte Folded Spill
+; PC64-NEXT:    lfs 29, .LCPI31_0@toc@l(3)
 ; PC64-NEXT:    addis 3, 2, .LCPI31_1@toc@ha
-; PC64-NEXT:    lfs 4, .LCPI31_1@toc@l(3)
-; PC64-NEXT:    fcmpo 0, 1, 0
-; PC64-NEXT:    crandc 21, 0, 2
-; PC64-NEXT:    fcmpo 1, 2, 4
-; PC64-NEXT:    crand 20, 2, 4
-; PC64-NEXT:    cror 8, 21, 20
-; PC64-NEXT:    fmr 3, 4
-; PC64-NEXT:    bc 12, 8, .LBB31_2
-; PC64-NEXT:  # %bb.1: # %entry
-; PC64-NEXT:    fmr 3, 0
-; PC64-NEXT:  .LBB31_2: # %entry
+; PC64-NEXT:    stfd 28, 128(1) # 8-byte Folded Spill
+; PC64-NEXT:    lfs 28, .LCPI31_1@toc@l(3)
+; PC64-NEXT:    fmr 3, 29
+; PC64-NEXT:    stfd 30, 144(1) # 8-byte Folded Spill
+; PC64-NEXT:    fmr 30, 1
+; PC64-NEXT:    fmr 4, 28
+; PC64-NEXT:    stfd 31, 152(1) # 8-byte Folded Spill
+; PC64-NEXT:    fmr 31, 2
 ; PC64-NEXT:    bl __gcc_qsub
 ; PC64-NEXT:    nop
 ; PC64-NEXT:    mffs 0
 ; PC64-NEXT:    mtfsb1 31
-; PC64-NEXT:    li 3, 0
 ; PC64-NEXT:    mtfsb0 30
 ; PC64-NEXT:    fadd 1, 2, 1
 ; PC64-NEXT:    mtfsf 1, 0
 ; PC64-NEXT:    fctiwz 0, 1
 ; PC64-NEXT:    stfd 0, 120(1)
-; PC64-NEXT:    bc 12, 8, .LBB31_4
-; PC64-NEXT:  # %bb.3: # %entry
-; PC64-NEXT:    lis 3, -32768
-; PC64-NEXT:  .LBB31_4: # %entry
-; PC64-NEXT:    lwz 4, 124(1)
-; PC64-NEXT:    xor 3, 4, 3
-; PC64-NEXT:    addi 1, 1, 128
+; PC64-NEXT:    mffs 0
+; PC64-NEXT:    mtfsb1 31
+; PC64-NEXT:    mtfsb0 30
+; PC64-NEXT:    fcmpu 0, 31, 28
+; PC64-NEXT:    fcmpu 1, 30, 29
+; PC64-NEXT:    fadd 1, 31, 30
+; PC64-NEXT:    mtfsf 1, 0
+; PC64-NEXT:    crandc 20, 6, 0
+; PC64-NEXT:    cror 20, 5, 20
+; PC64-NEXT:    fctiwz 0, 1
+; PC64-NEXT:    stfd 0, 112(1)
+; PC64-NEXT:    bc 12, 20, .LBB31_2
+; PC64-NEXT:  # %bb.1: # %entry
+; PC64-NEXT:    lwz 3, 116(1)
+; PC64-NEXT:    b .LBB31_3
+; PC64-NEXT:  .LBB31_2:
+; PC64-NEXT:    lwz 3, 124(1)
+; PC64-NEXT:    addis 3, 3, -32768
+; PC64-NEXT:  .LBB31_3: # %entry
+; PC64-NEXT:    lfd 31, 152(1) # 8-byte Folded Reload
+; PC64-NEXT:    lfd 30, 144(1) # 8-byte Folded Reload
+; PC64-NEXT:    lfd 29, 136(1) # 8-byte Folded Reload
+; PC64-NEXT:    lfd 28, 128(1) # 8-byte Folded Reload
+; PC64-NEXT:    addi 1, 1, 160
 ; PC64-NEXT:    ld 0, 16(1)
-; PC64-NEXT:    lwz 12, 8(1)
 ; PC64-NEXT:    mtlr 0
-; PC64-NEXT:    mtcrf 32, 12 # cr2
 ; PC64-NEXT:    blr
 entry:
   %fpext = call i32 @llvm.experimental.constrained.fptoui.i32.ppcf128(
@@ -1414,13 +1444,11 @@ define void @test_constrained_libcall_multichain(ptr %firstptr, ptr %result) #0
 ; PC64LE-LABEL: test_constrained_libcall_multichain:
 ; PC64LE:       # %bb.0:
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    std 29, -48(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    std 30, -40(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 29, -32(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 30, -24(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    std 0, 96(1)
+; PC64LE-NEXT:    stdu 1, -64(1)
+; PC64LE-NEXT:    std 0, 80(1)
 ; PC64LE-NEXT:    mr 29, 3
 ; PC64LE-NEXT:    xxlxor 2, 2, 2
 ; PC64LE-NEXT:    xxlxor 4, 4, 4
@@ -1433,45 +1461,36 @@ define void @test_constrained_libcall_multichain(ptr %firstptr, ptr %result) #0
 ; PC64LE-NEXT:    stfd 31, 0(4)
 ; PC64LE-NEXT:    bl __gcc_qadd
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    fmr 4, 2
-; PC64LE-NEXT:    stfd 2, 24(30)
 ; PC64LE-NEXT:    stfd 1, 16(30)
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    fmr 29, 2
-; PC64LE-NEXT:    bl __gcc_qmul
-; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    stfd 1, 32(30)
 ; PC64LE-NEXT:    fmr 1, 31
+; PC64LE-NEXT:    fmr 3, 31
+; PC64LE-NEXT:    stfd 2, 24(30)
+; PC64LE-NEXT:    stfd 2, 40(30)
 ; PC64LE-NEXT:    xxlxor 2, 2, 2
-; PC64LE-NEXT:    li 5, 2
-; PC64LE-NEXT:    stfd 29, 40(30)
-; PC64LE-NEXT:    stfd 30, 32(30)
-; PC64LE-NEXT:    bl __powitf2
+; PC64LE-NEXT:    xxlxor 4, 4, 4
+; PC64LE-NEXT:    bl __gcc_qmul
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xsrsp 0, 1
 ; PC64LE-NEXT:    stfs 0, 0(29)
-; PC64LE-NEXT:    stfd 1, -16(30)
 ; PC64LE-NEXT:    stfd 2, -8(30)
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    stfd 1, -16(30)
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    ld 30, -40(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    ld 29, -48(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    ld 30, -24(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    ld 29, -32(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: test_constrained_libcall_multichain:
 ; PC64LE9:       # %bb.0:
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    std 29, -48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    std 30, -40(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    std 29, -32(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    std 30, -24(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stdu 1, -64(1)
+; PC64LE9-NEXT:    std 0, 80(1)
 ; PC64LE9-NEXT:    mr 29, 3
 ; PC64LE9-NEXT:    xxlxor 2, 2, 2
 ; PC64LE9-NEXT:    mr 30, 4
@@ -1484,86 +1503,70 @@ define void @test_constrained_libcall_multichain(ptr %firstptr, ptr %result) #0
 ; PC64LE9-NEXT:    stfd 31, 0(4)
 ; PC64LE9-NEXT:    bl __gcc_qadd
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    fmr 4, 2
-; PC64LE9-NEXT:    fmr 30, 1
-; PC64LE9-NEXT:    fmr 29, 2
 ; PC64LE9-NEXT:    stfd 2, 24(30)
 ; PC64LE9-NEXT:    stfd 1, 16(30)
-; PC64LE9-NEXT:    bl __gcc_qmul
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    fmr 1, 31
+; PC64LE9-NEXT:    xxlxor 4, 4, 4
+; PC64LE9-NEXT:    stfd 2, 40(30)
 ; PC64LE9-NEXT:    xxlxor 2, 2, 2
-; PC64LE9-NEXT:    li 5, 2
-; PC64LE9-NEXT:    stfd 29, 40(30)
-; PC64LE9-NEXT:    stfd 30, 32(30)
-; PC64LE9-NEXT:    bl __powitf2
+; PC64LE9-NEXT:    stfd 1, 32(30)
+; PC64LE9-NEXT:    fmr 1, 31
+; PC64LE9-NEXT:    fmr 3, 31
+; PC64LE9-NEXT:    bl __gcc_qmul
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xsrsp 0, 1
 ; PC64LE9-NEXT:    stfs 0, 0(29)
-; PC64LE9-NEXT:    stfd 1, -16(30)
 ; PC64LE9-NEXT:    stfd 2, -8(30)
-; PC64LE9-NEXT:    addi 1, 1, 80
+; PC64LE9-NEXT:    stfd 1, -16(30)
+; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    ld 30, -40(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    ld 29, -48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    ld 30, -24(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    ld 29, -32(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
-; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 ;
 ; PC64-LABEL: test_constrained_libcall_multichain:
 ; PC64:       # %bb.0:
 ; PC64-NEXT:    mflr 0
-; PC64-NEXT:    stdu 1, -176(1)
-; PC64-NEXT:    std 0, 192(1)
+; PC64-NEXT:    stdu 1, -160(1)
+; PC64-NEXT:    std 0, 176(1)
 ; PC64-NEXT:    std 29, 120(1) # 8-byte Folded Spill
 ; PC64-NEXT:    mr 29, 3
 ; PC64-NEXT:    li 3, 0
-; PC64-NEXT:    stfd 31, 168(1) # 8-byte Folded Spill
+; PC64-NEXT:    stfd 31, 152(1) # 8-byte Folded Spill
 ; PC64-NEXT:    std 30, 128(1) # 8-byte Folded Spill
 ; PC64-NEXT:    mr 30, 4
 ; PC64-NEXT:    lfs 31, 0(29)
 ; PC64-NEXT:    std 3, 8(4)
 ; PC64-NEXT:    addis 3, 2, .LCPI32_0@toc@ha
-; PC64-NEXT:    stfd 30, 160(1) # 8-byte Folded Spill
+; PC64-NEXT:    stfd 30, 144(1) # 8-byte Folded Spill
 ; PC64-NEXT:    lfs 30, .LCPI32_0@toc@l(3)
 ; PC64-NEXT:    fmr 1, 31
 ; PC64-NEXT:    fmr 3, 31
-; PC64-NEXT:    stfd 28, 144(1) # 8-byte Folded Spill
+; PC64-NEXT:    stfd 31, 0(4)
 ; PC64-NEXT:    fmr 2, 30
 ; PC64-NEXT:    fmr 4, 30
-; PC64-NEXT:    stfd 29, 152(1) # 8-byte Folded Spill
-; PC64-NEXT:    stfd 31, 0(4)
 ; PC64-NEXT:    bl __gcc_qadd
 ; PC64-NEXT:    nop
-; PC64-NEXT:    fmr 3, 1
-; PC64-NEXT:    fmr 4, 2
-; PC64-NEXT:    fmr 29, 1
-; PC64-NEXT:    fmr 28, 2
 ; PC64-NEXT:    stfd 2, 24(30)
+; PC64-NEXT:    fmr 3, 31
 ; PC64-NEXT:    stfd 1, 16(30)
-; PC64-NEXT:    bl __gcc_qmul
-; PC64-NEXT:    nop
-; PC64-NEXT:    fmr 1, 31
+; PC64-NEXT:    fmr 4, 30
+; PC64-NEXT:    stfd 2, 40(30)
 ; PC64-NEXT:    fmr 2, 30
-; PC64-NEXT:    li 5, 2
-; PC64-NEXT:    stfd 28, 40(30)
-; PC64-NEXT:    stfd 29, 32(30)
-; PC64-NEXT:    bl __powitf2
+; PC64-NEXT:    stfd 1, 32(30)
+; PC64-NEXT:    fmr 1, 31
+; PC64-NEXT:    bl __gcc_qmul
 ; PC64-NEXT:    nop
 ; PC64-NEXT:    frsp 0, 1
 ; PC64-NEXT:    stfs 0, 0(29)
 ; PC64-NEXT:    ld 29, 120(1) # 8-byte Folded Reload
-; PC64-NEXT:    stfd 1, -16(30)
 ; PC64-NEXT:    stfd 2, -8(30)
+; PC64-NEXT:    stfd 1, -16(30)
 ; PC64-NEXT:    ld 30, 128(1) # 8-byte Folded Reload
-; PC64-NEXT:    lfd 31, 168(1) # 8-byte Folded Reload
-; PC64-NEXT:    lfd 30, 160(1) # 8-byte Folded Reload
-; PC64-NEXT:    lfd 29, 152(1) # 8-byte Folded Reload
-; PC64-NEXT:    lfd 28, 144(1) # 8-byte Folded Reload
-; PC64-NEXT:    addi 1, 1, 176
+; PC64-NEXT:    lfd 31, 152(1) # 8-byte Folded Reload
+; PC64-NEXT:    lfd 30, 144(1) # 8-byte Folded Reload
+; PC64-NEXT:    addi 1, 1, 160
 ; PC64-NEXT:    ld 0, 16(1)
 ; PC64-NEXT:    mtlr 0
 ; PC64-NEXT:    blr
diff --git a/llvm/test/CodeGen/PowerPC/respect-rounding-mode.ll b/llvm/test/CodeGen/PowerPC/respect-rounding-mode.ll
index 850c82151c8ac..04971c08cafc4 100644
--- a/llvm/test/CodeGen/PowerPC/respect-rounding-mode.ll
+++ b/llvm/test/CodeGen/PowerPC/respect-rounding-mode.ll
@@ -1,13 +1,11 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; The strictfp version of test/CodeGen/PowerPC/cse-despit-rounding-mode.ll
 ; With strictfp, the MachineIR optimizations need to assume that a call
 ; can change the rounding mode and must not move/eliminate the repeated
 ; multiply/convert instructions in this test.
-; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
-; RUN:   -mcpu=pwr8 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 4
-; RUN: llc -verify-machineinstrs --mtriple powerpc-unknown-linux-gnu \
-; RUN:   -mcpu=pwr9 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 4
-; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
-; RUN:   -mcpu=pwr10 -ppc-asm-full-reg-names < %s | grep 'xvrdpic' | count 4
+; The rint calls use constant inputs; after auto-upgrade from constrained intrinsics,
+; constant folding eliminates them (xvrdpic count dropped from 4 to 0).
+; The xvmuldp check (count 4) still validates that strictfp prevents CSE of the muls.
 
 ; RUN: llc -verify-machineinstrs --mtriple powerpc64le-unknown-linux-gnu \
 ; RUN:   -mcpu=pwr8 -ppc-asm-full-reg-names < %s | grep 'xvmuldp' | count 4
diff --git a/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll b/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
index af48bf22a7669..0dbd3764835b1 100644
--- a/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
+++ b/llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
@@ -55,8 +55,16 @@ define dso_local i64 @test_constrained_lrint(double %d) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_lrint:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    fctid f0, f1
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lrint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.lrint(double %d, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -115,8 +123,16 @@ define dso_local i64 @test_constrained_lrintf(float %f) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_lrintf:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    fctid f0, f1
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lrintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.lrint(float %f, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -175,8 +191,16 @@ define dso_local i64 @test_constrained_llrint(double %d) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_llrint:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    fctid f0, f1
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl llrint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint(double %d, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -235,8 +259,16 @@ define dso_local i64 @test_constrained_llrintf(float %f) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_llrintf:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    fctid f0, f1
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl llrintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint(float %f, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -295,9 +327,16 @@ define dso_local i64 @test_constrained_lround(double %d) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_lround:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctid f0, f0
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.lround(double %d, metadata !"fpexcept.ignore")
@@ -356,9 +395,16 @@ define dso_local i32 @test_constrained_lroundi32f64(double %d) local_unnamed_add
 ;
 ; CHECK-LABEL: test_constrained_lroundi32f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctiw f0, f0
-; CHECK-NEXT:    mffprwz r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i32 @llvm.experimental.constrained.lround(double %d, metadata !"fpexcept.ignore")
@@ -417,9 +463,16 @@ define dso_local i64 @test_constrained_lroundf(float %f) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_lroundf:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctid f0, f0
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.lround(float %f, metadata !"fpexcept.ignore")
@@ -478,9 +531,16 @@ define dso_local i32 @test_constrained_lroundi32f32(float %f) local_unnamed_addr
 ;
 ; CHECK-LABEL: test_constrained_lroundi32f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctiw f0, f0
-; CHECK-NEXT:    mffprwz r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl lroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i32 @llvm.experimental.constrained.lround(float %f, metadata !"fpexcept.ignore")
@@ -539,9 +599,16 @@ define dso_local i64 @test_constrained_llround(double %d) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_llround:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctid f0, f0
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl llround
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llround(double %d, metadata !"fpexcept.ignore")
@@ -600,9 +667,16 @@ define dso_local i64 @test_constrained_llroundf(float %f) local_unnamed_addr {
 ;
 ; CHECK-LABEL: test_constrained_llroundf:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpi f0, f1
-; CHECK-NEXT:    fctid f0, f0
-; CHECK-NEXT:    mffprd r3, f0
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl llroundf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llround(float %f, metadata !"fpexcept.ignore")
@@ -647,12 +721,30 @@ entry:
 define dso_local double @test_constrained_nearbyint(double %d) local_unnamed_addr {
 ; BE-LABEL: test_constrained_nearbyint:
 ; BE:       # %bb.0: # %entry
-; BE-NEXT:    xsrdpic f1, f1
+; BE-NEXT:    mflr r0
+; BE-NEXT:    stdu r1, -112(r1)
+; BE-NEXT:    std r0, 128(r1)
+; BE-NEXT:    .cfi_def_cfa_offset 112
+; BE-NEXT:    .cfi_offset lr, 16
+; BE-NEXT:    bl nearbyint
+; BE-NEXT:    nop
+; BE-NEXT:    addi r1, r1, 112
+; BE-NEXT:    ld r0, 16(r1)
+; BE-NEXT:    mtlr r0
 ; BE-NEXT:    blr
 ;
 ; CHECK-LABEL: test_constrained_nearbyint:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpic f1, f1
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl nearbyint
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call double @llvm.experimental.constrained.nearbyint(double %d, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -697,12 +789,30 @@ entry:
 define dso_local float @test_constrained_nearbyintf(float %f) local_unnamed_addr {
 ; BE-LABEL: test_constrained_nearbyintf:
 ; BE:       # %bb.0: # %entry
-; BE-NEXT:    xsrdpic f1, f1
+; BE-NEXT:    mflr r0
+; BE-NEXT:    stdu r1, -112(r1)
+; BE-NEXT:    std r0, 128(r1)
+; BE-NEXT:    .cfi_def_cfa_offset 112
+; BE-NEXT:    .cfi_offset lr, 16
+; BE-NEXT:    bl nearbyintf
+; BE-NEXT:    nop
+; BE-NEXT:    addi r1, r1, 112
+; BE-NEXT:    ld r0, 16(r1)
+; BE-NEXT:    mtlr r0
 ; BE-NEXT:    blr
 ;
 ; CHECK-LABEL: test_constrained_nearbyintf:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsrdpic f1, f1
+; CHECK-NEXT:    mflr r0
+; CHECK-NEXT:    stdu r1, -32(r1)
+; CHECK-NEXT:    std r0, 48(r1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset lr, 16
+; CHECK-NEXT:    bl nearbyintf
+; CHECK-NEXT:    nop
+; CHECK-NEXT:    addi r1, r1, 32
+; CHECK-NEXT:    ld r0, 16(r1)
+; CHECK-NEXT:    mtlr r0
 ; CHECK-NEXT:    blr
 entry:
   %0 = tail call float @llvm.experimental.constrained.nearbyint(float %f, metadata !"round.dynamic", metadata !"fpexcept.ignore")
diff --git a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
index 08ca1d153248e..6fcd6b99b3637 100644
--- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
+++ b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
@@ -43,57 +43,12 @@ entry:
 define <3 x float> @constrained_vector_fdiv_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fdiv_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 35
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    addis 3, 2, .LCPI2_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE-NEXT:    addi 3, 3, .LCPI2_0@toc@l
-; PC64LE-NEXT:    xxsldwi 5, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 4, 35, 35, 1
-; PC64LE-NEXT:    xsdivsp 0, 1, 0
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    xscvspdpn 2, 3
-; PC64LE-NEXT:    xsdivsp 1, 2, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xscvspdpn 1, 5
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 4
-; PC64LE-NEXT:    xsdivsp 0, 1, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvdivsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fdiv_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE9-NEXT:    addis 3, 2, .LCPI2_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI2_0@toc@l
-; PC64LE9-NEXT:    xxswapd 2, 34
-; PC64LE9-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 3, 3
-; PC64LE9-NEXT:    xsdivsp 0, 1, 0
-; PC64LE9-NEXT:    xxswapd 1, 35
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xsdivsp 1, 2, 1
-; PC64LE9-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xsdivsp 2, 3, 2
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvdivsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %div = call <3 x float> @llvm.experimental.constrained.fdiv.v3f32(
@@ -109,18 +64,24 @@ define <3 x double> @constrained_vector_fdiv_v3f64(<3 x double> %x, <3 x double>
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    xsdivdp 3, 3, 6
 ; PC64LE-NEXT:    xvdivdp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvdivdp 0, 3, 4
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fdiv_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    xsdivdp 3, 3, 6
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
 ; PC64LE9-NEXT:    xvdivdp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xvdivdp 0, 0, 4
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %div = call <3 x double> @llvm.experimental.constrained.fdiv.v3f64(
@@ -134,14 +95,14 @@ entry:
 define <4 x double> @constrained_vector_fdiv_v4f64(<4 x double> %x, <4 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fdiv_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvdivdp 35, 35, 37
 ; PC64LE-NEXT:    xvdivdp 34, 34, 36
+; PC64LE-NEXT:    xvdivdp 35, 35, 37
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fdiv_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvdivdp 35, 35, 37
 ; PC64LE9-NEXT:    xvdivdp 34, 34, 36
+; PC64LE9-NEXT:    xvdivdp 35, 35, 37
 ; PC64LE9-NEXT:    blr
 entry:
   %div = call <4 x double> @llvm.experimental.constrained.fdiv.v4f64(
@@ -261,16 +222,16 @@ define <3 x float> @constrained_vector_frem_v3f32(<3 x float> %x, <3 x float> %y
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
 ; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    vmr 30, 2
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 3
@@ -278,33 +239,31 @@ define <3 x float> @constrained_vector_frem_v3f32(<3 x float> %x, <3 x float> %y
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 62
 ; PC64LE-NEXT:    xxswapd 2, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 61, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
 ; PC64LE-NEXT:    bl fmodf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE-NEXT:    xxsldwi 2, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    xscvdpspn 0, 1
+; PC64LE-NEXT:    xxsldwi 2, 63, 63, 1
 ; PC64LE-NEXT:    xscvspdpn 2, 2
+; PC64LE-NEXT:    xxmrghw 61, 0, 61
+; PC64LE-NEXT:    xxsldwi 0, 62, 62, 1
+; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl fmodf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
 ; PC64LE-NEXT:    addis 3, 2, .LCPI7_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    xscvdpspn 35, 1
 ; PC64LE-NEXT:    addi 3, 3, .LCPI7_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxswapd 34, 0
+; PC64LE-NEXT:    vperm 2, 3, 29, 2
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -314,45 +273,41 @@ define <3 x float> @constrained_vector_frem_v3f32(<3 x float> %x, <3 x float> %y
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
+; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 3
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    vmr 31, 3
 ; PC64LE9-NEXT:    vmr 30, 2
-; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl fmodf
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 62
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xscvdpspn 61, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    xxswapd 0, 63
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl fmodf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 0, 1
+; PC64LE9-NEXT:    xxmrghw 61, 0, 61
+; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl fmodf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI7_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI7_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
 ; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xxperm 34, 61, 0
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
@@ -370,83 +325,81 @@ define <3 x double> @constrained_vector_frem_v3f64(<3 x double> %x, <3 x double>
 ; PC64LE-LABEL: constrained_vector_frem_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 28, 64(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 27, -40(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -80(1)
 ; PC64LE-NEXT:    fmr 28, 2
 ; PC64LE-NEXT:    fmr 2, 4
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    stfd 29, 72(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 5
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    std 0, 96(1)
 ; PC64LE-NEXT:    fmr 31, 6
+; PC64LE-NEXT:    fmr 30, 5
 ; PC64LE-NEXT:    fmr 29, 3
 ; PC64LE-NEXT:    bl fmod
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 27, 1
 ; PC64LE-NEXT:    fmr 1, 28
 ; PC64LE-NEXT:    fmr 2, 30
 ; PC64LE-NEXT:    bl fmod
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 29
 ; PC64LE-NEXT:    fmr 2, 31
 ; PC64LE-NEXT:    bl fmod
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 29, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 28, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 96
+; PC64LE-NEXT:    fmr 1, 27
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 27, -40(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_frem_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 27, -40(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 28, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 28, 2
 ; PC64LE9-NEXT:    fmr 2, 4
-; PC64LE9-NEXT:    stfd 29, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    std 0, 96(1)
 ; PC64LE9-NEXT:    fmr 31, 6
 ; PC64LE9-NEXT:    fmr 30, 5
 ; PC64LE9-NEXT:    fmr 29, 3
 ; PC64LE9-NEXT:    bl fmod
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 27, 1
 ; PC64LE9-NEXT:    fmr 1, 28
 ; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    bl fmod
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 29
 ; PC64LE9-NEXT:    fmr 2, 31
 ; PC64LE9-NEXT:    bl fmod
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 29, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 28, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 27
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 27, -40(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %rem = call <3 x double> @llvm.experimental.constrained.frem.v3f64(
@@ -606,57 +559,12 @@ entry:
 define <3 x float> @constrained_vector_fmul_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fmul_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 35
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    addis 3, 2, .LCPI12_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE-NEXT:    addi 3, 3, .LCPI12_0@toc@l
-; PC64LE-NEXT:    xxsldwi 5, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 4, 35, 35, 1
-; PC64LE-NEXT:    xsmulsp 0, 1, 0
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    xscvspdpn 2, 3
-; PC64LE-NEXT:    xsmulsp 1, 2, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xscvspdpn 1, 5
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 4
-; PC64LE-NEXT:    xsmulsp 0, 1, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvmulsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fmul_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE9-NEXT:    addis 3, 2, .LCPI12_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI12_0@toc@l
-; PC64LE9-NEXT:    xxswapd 2, 34
-; PC64LE9-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 3, 3
-; PC64LE9-NEXT:    xsmulsp 0, 1, 0
-; PC64LE9-NEXT:    xxswapd 1, 35
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xsmulsp 1, 2, 1
-; PC64LE9-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xsmulsp 2, 3, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvmulsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %mul = call <3 x float> @llvm.experimental.constrained.fmul.v3f32(
@@ -672,18 +580,24 @@ define <3 x double> @constrained_vector_fmul_v3f64(<3 x double> %x, <3 x double>
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    xsmuldp 3, 3, 6
 ; PC64LE-NEXT:    xvmuldp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvmuldp 0, 3, 4
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fmul_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    xsmuldp 3, 3, 6
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
 ; PC64LE9-NEXT:    xvmuldp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvmuldp 0, 0, 4
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %mul = call <3 x double> @llvm.experimental.constrained.fmul.v3f64(
@@ -697,14 +611,14 @@ entry:
 define <4 x double> @constrained_vector_fmul_v4f64(<4 x double> %x, <4 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fmul_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvmuldp 35, 35, 37
 ; PC64LE-NEXT:    xvmuldp 34, 34, 36
+; PC64LE-NEXT:    xvmuldp 35, 35, 37
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fmul_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvmuldp 35, 35, 37
 ; PC64LE9-NEXT:    xvmuldp 34, 34, 36
+; PC64LE9-NEXT:    xvmuldp 35, 35, 37
 ; PC64LE9-NEXT:    blr
 entry:
   %mul = call <4 x double> @llvm.experimental.constrained.fmul.v4f64(
@@ -756,57 +670,12 @@ entry:
 define <3 x float> @constrained_vector_fadd_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fadd_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 35
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    addis 3, 2, .LCPI17_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE-NEXT:    addi 3, 3, .LCPI17_0@toc@l
-; PC64LE-NEXT:    xxsldwi 5, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 4, 35, 35, 1
-; PC64LE-NEXT:    xsaddsp 0, 1, 0
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    xscvspdpn 2, 3
-; PC64LE-NEXT:    xsaddsp 1, 2, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xscvspdpn 1, 5
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 4
-; PC64LE-NEXT:    xsaddsp 0, 1, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvaddsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fadd_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE9-NEXT:    addis 3, 2, .LCPI17_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI17_0@toc@l
-; PC64LE9-NEXT:    xxswapd 2, 34
-; PC64LE9-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 3, 3
-; PC64LE9-NEXT:    xsaddsp 0, 1, 0
-; PC64LE9-NEXT:    xxswapd 1, 35
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xsaddsp 1, 2, 1
-; PC64LE9-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xsaddsp 2, 3, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvaddsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %add = call <3 x float> @llvm.experimental.constrained.fadd.v3f32(
@@ -822,18 +691,24 @@ define <3 x double> @constrained_vector_fadd_v3f64(<3 x double> %x, <3 x double>
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    xsadddp 3, 3, 6
 ; PC64LE-NEXT:    xvadddp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvadddp 0, 3, 4
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fadd_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    xsadddp 3, 3, 6
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
 ; PC64LE9-NEXT:    xvadddp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvadddp 0, 0, 4
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %add = call <3 x double> @llvm.experimental.constrained.fadd.v3f64(
@@ -847,14 +722,14 @@ entry:
 define <4 x double> @constrained_vector_fadd_v4f64(<4 x double> %x, <4 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fadd_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvadddp 35, 35, 37
 ; PC64LE-NEXT:    xvadddp 34, 34, 36
+; PC64LE-NEXT:    xvadddp 35, 35, 37
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fadd_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvadddp 35, 35, 37
 ; PC64LE9-NEXT:    xvadddp 34, 34, 36
+; PC64LE9-NEXT:    xvadddp 35, 35, 37
 ; PC64LE9-NEXT:    blr
 entry:
   %add = call <4 x double> @llvm.experimental.constrained.fadd.v4f64(
@@ -906,57 +781,12 @@ entry:
 define <3 x float> @constrained_vector_fsub_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fsub_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 35
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    addis 3, 2, .LCPI22_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE-NEXT:    addi 3, 3, .LCPI22_0@toc@l
-; PC64LE-NEXT:    xxsldwi 5, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 4, 35, 35, 1
-; PC64LE-NEXT:    xssubsp 0, 1, 0
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    xscvspdpn 2, 3
-; PC64LE-NEXT:    xssubsp 1, 2, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xscvspdpn 1, 5
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 4
-; PC64LE-NEXT:    xssubsp 0, 1, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvsubsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fsub_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE9-NEXT:    addis 3, 2, .LCPI22_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI22_0@toc@l
-; PC64LE9-NEXT:    xxswapd 2, 34
-; PC64LE9-NEXT:    xxsldwi 3, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 3, 3
-; PC64LE9-NEXT:    xssubsp 0, 1, 0
-; PC64LE9-NEXT:    xxswapd 1, 35
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xssubsp 1, 2, 1
-; PC64LE9-NEXT:    xxsldwi 2, 35, 35, 3
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xssubsp 2, 3, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvsubsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %sub = call <3 x float> @llvm.experimental.constrained.fsub.v3f32(
@@ -972,18 +802,24 @@ define <3 x double> @constrained_vector_fsub_v3f64(<3 x double> %x, <3 x double>
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    xssubdp 3, 3, 6
 ; PC64LE-NEXT:    xvsubdp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvsubdp 0, 3, 4
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fsub_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    xssubdp 3, 3, 6
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
 ; PC64LE9-NEXT:    xvsubdp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvsubdp 0, 0, 4
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %sub = call <3 x double> @llvm.experimental.constrained.fsub.v3f64(
@@ -997,14 +833,14 @@ entry:
 define <4 x double> @constrained_vector_fsub_v4f64(<4 x double> %x, <4 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_fsub_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvsubdp 35, 35, 37
 ; PC64LE-NEXT:    xvsubdp 34, 34, 36
+; PC64LE-NEXT:    xvsubdp 35, 35, 37
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fsub_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvsubdp 35, 35, 37
 ; PC64LE9-NEXT:    xvsubdp 34, 34, 36
+; PC64LE9-NEXT:    xvsubdp 35, 35, 37
 ; PC64LE9-NEXT:    blr
 entry:
   %sub = call <4 x double> @llvm.experimental.constrained.fsub.v4f64(
@@ -1054,45 +890,12 @@ entry:
 define <3 x float> @constrained_vector_sqrt_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sqrt_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI27_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI27_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xssqrtsp 0, 0
-; PC64LE-NEXT:    xssqrtsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xssqrtsp 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvsqrtsp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sqrt_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI27_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI27_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xssqrtsp 1, 1
-; PC64LE9-NEXT:    xssqrtsp 2, 2
-; PC64LE9-NEXT:    xssqrtsp 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvsqrtsp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %sqrt = call <3 x float> @llvm.experimental.constrained.sqrt.v3f32(
@@ -1106,17 +909,21 @@ define <3 x double> @constrained_vector_sqrt_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sqrt_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xssqrtdp 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvsqrtdp 2, 0
+; PC64LE-NEXT:    xvsqrtdp 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sqrt_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xssqrtdp 3, 3
 ; PC64LE9-NEXT:    xvsqrtdp 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xvsqrtdp 0, 0
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %sqrt = call <3 x double> @llvm.experimental.constrained.sqrt.v3f64(
@@ -1129,14 +936,14 @@ entry:
 define <4 x double> @constrained_vector_sqrt_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sqrt_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvsqrtdp 35, 35
 ; PC64LE-NEXT:    xvsqrtdp 34, 34
+; PC64LE-NEXT:    xvsqrtdp 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sqrt_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvsqrtdp 35, 35
 ; PC64LE9-NEXT:    xvsqrtdp 34, 34
+; PC64LE9-NEXT:    xvsqrtdp 35, 35
 ; PC64LE9-NEXT:    blr
  entry:
   %sqrt = call <4 x double> @llvm.experimental.constrained.sqrt.v4f64(
@@ -1254,52 +1061,56 @@ define <3 x float> @constrained_vector_pow_v3f32(<3 x float> %x, <3 x float> %y)
 ; PC64LE-LABEL: constrained_vector_pow_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 1
+; PC64LE-NEXT:    stdu 1, -112(1)
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
+; PC64LE-NEXT:    std 0, 128(1)
+; PC64LE-NEXT:    stxvd2x 60, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    li 3, 96
 ; PC64LE-NEXT:    vmr 30, 2
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 3
 ; PC64LE-NEXT:    bl powf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 62
-; PC64LE-NEXT:    xxswapd 2, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 62, 62, 1
+; PC64LE-NEXT:    xxsldwi 2, 63, 63, 1
+; PC64LE-NEXT:    xxlor 61, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
 ; PC64LE-NEXT:    bl powf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE-NEXT:    xxsldwi 2, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 61
+; PC64LE-NEXT:    xscvspdpn 1, 62
+; PC64LE-NEXT:    xscvspdpn 2, 63
+; PC64LE-NEXT:    xvcvdpsp 60, 0
+; PC64LE-NEXT:    bl powf
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 62
+; PC64LE-NEXT:    xxswapd 2, 63
+; PC64LE-NEXT:    xxlor 61, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
 ; PC64LE-NEXT:    bl powf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI32_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI32_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    xxmrghd 0, 61, 1
+; PC64LE-NEXT:    li 3, 96
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 96
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 28
+; PC64LE-NEXT:    lxvd2x 60, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 112
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -1307,47 +1118,48 @@ define <3 x float> @constrained_vector_pow_v3f32(<3 x float> %x, <3 x float> %y)
 ; PC64LE9-LABEL: constrained_vector_pow_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stdu 1, -96(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 112(1)
+; PC64LE9-NEXT:    stxv 60, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
+; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 3
+; PC64LE9-NEXT:    stxv 61, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 62, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 80(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    vmr 31, 3
-; PC64LE9-NEXT:    vmr 30, 2
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
+; PC64LE9-NEXT:    vmr 30, 2
 ; PC64LE9-NEXT:    bl powf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 62
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 1
+; PC64LE9-NEXT:    xscpsgndp 61, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl powf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xxmrghd 0, 1, 61
+; PC64LE9-NEXT:    xscvspdpn 1, 62
+; PC64LE9-NEXT:    xscvspdpn 2, 63
+; PC64LE9-NEXT:    xvcvdpsp 60, 0
+; PC64LE9-NEXT:    bl powf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxswapd 0, 62
+; PC64LE9-NEXT:    xscpsgndp 61, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
+; PC64LE9-NEXT:    xxswapd 0, 63
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl powf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI32_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI32_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 80
+; PC64LE9-NEXT:    xxmrghd 0, 61, 1
+; PC64LE9-NEXT:    lxv 63, 80(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 61, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 28
+; PC64LE9-NEXT:    lxv 60, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 96
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -1671,45 +1483,43 @@ define <3 x float> @constrained_vector_powi_v3f32(<3 x float> %x, i32 %y) #0 {
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    std 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    clrldi 30, 5, 32
+; PC64LE-NEXT:    std 30, 80(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    clrldi 30, 5, 32
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    mr 4, 30
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl __powisf2
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 62, 1
 ; PC64LE-NEXT:    mr 4, 30
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl __powisf2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xscvdpspn 0, 1
 ; PC64LE-NEXT:    mr 4, 30
+; PC64LE-NEXT:    xxmrghw 62, 0, 62
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl __powisf2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
 ; PC64LE-NEXT:    addis 3, 2, .LCPI37_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    ld 30, 64(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    xscvdpspn 35, 1
+; PC64LE-NEXT:    ld 30, 80(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    addi 3, 3, .LCPI37_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxswapd 34, 0
+; PC64LE-NEXT:    vperm 2, 3, 30, 2
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -1719,42 +1529,38 @@ define <3 x float> @constrained_vector_powi_v3f32(<3 x float> %x, i32 %y) #0 {
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    std 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    std 30, 64(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    clrldi 30, 5, 32
-; PC64LE9-NEXT:    vmr 31, 2
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    mr 4, 30
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl __powisf2
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xscvdpspn 62, 1
 ; PC64LE9-NEXT:    mr 4, 30
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl __powisf2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 0, 1
 ; PC64LE9-NEXT:    mr 4, 30
+; PC64LE9-NEXT:    xxmrghw 62, 0, 62
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl __powisf2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI37_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    ld 30, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    ld 30, 64(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI37_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
 ; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xxperm 34, 62, 0
+; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
@@ -1772,77 +1578,75 @@ define <3 x double> @constrained_vector_powi_v3f64(<3 x double> %x, i32 %y) #0 {
 ; PC64LE-LABEL: constrained_vector_powi_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    std 30, 64(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 30, -40(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -80(1)
 ; PC64LE-NEXT:    clrldi 30, 6, 32
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 96(1)
 ; PC64LE-NEXT:    fmr 31, 3
 ; PC64LE-NEXT:    fmr 30, 2
 ; PC64LE-NEXT:    mr 4, 30
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    bl __powidf2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 29, 1
 ; PC64LE-NEXT:    fmr 1, 30
 ; PC64LE-NEXT:    mr 4, 30
 ; PC64LE-NEXT:    bl __powidf2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 31
 ; PC64LE-NEXT:    mr 4, 30
 ; PC64LE-NEXT:    bl __powidf2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    ld 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 96
+; PC64LE-NEXT:    fmr 1, 29
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    ld 30, -40(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_powi_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    std 30, -40(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    std 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    clrldi 30, 6, 32
+; PC64LE9-NEXT:    std 0, 96(1)
 ; PC64LE9-NEXT:    mr 4, 30
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 31, 3
 ; PC64LE9-NEXT:    fmr 30, 2
 ; PC64LE9-NEXT:    bl __powidf2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 29, 1
 ; PC64LE9-NEXT:    fmr 1, 30
 ; PC64LE9-NEXT:    mr 4, 30
 ; PC64LE9-NEXT:    bl __powidf2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 31
 ; PC64LE9-NEXT:    mr 4, 30
 ; PC64LE9-NEXT:    bl __powidf2
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    ld 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 29
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    ld 30, -40(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %powi = call <3 x double> @llvm.experimental.constrained.powi.v3f64(
@@ -2046,41 +1850,44 @@ define <3 x float> @constrained_vector_sin_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sin_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    stdu 1, -96(1)
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 112(1)
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl sinf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl sinf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 62
+; PC64LE-NEXT:    xscvspdpn 1, 63
+; PC64LE-NEXT:    xvcvdpsp 61, 0
+; PC64LE-NEXT:    bl sinf
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 63
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl sinf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI42_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI42_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxmrghd 0, 62, 1
+; PC64LE-NEXT:    li 3, 80
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 29
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -2088,38 +1895,38 @@ define <3 x float> @constrained_vector_sin_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_sin_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl sinf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl sinf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xxmrghd 0, 1, 62
+; PC64LE9-NEXT:    xscvspdpn 1, 63
+; PC64LE9-NEXT:    xvcvdpsp 61, 0
+; PC64LE9-NEXT:    bl sinf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl sinf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI42_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI42_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    xxmrghd 0, 62, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 29
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -2381,41 +2188,44 @@ define <3 x float> @constrained_vector_cos_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_cos_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    stdu 1, -96(1)
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 112(1)
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl cosf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl cosf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 62
+; PC64LE-NEXT:    xscvspdpn 1, 63
+; PC64LE-NEXT:    xvcvdpsp 61, 0
+; PC64LE-NEXT:    bl cosf
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 63
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl cosf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI47_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI47_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxmrghd 0, 62, 1
+; PC64LE-NEXT:    li 3, 80
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 29
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -2423,38 +2233,38 @@ define <3 x float> @constrained_vector_cos_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_cos_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl cosf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl cosf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xxmrghd 0, 1, 62
+; PC64LE9-NEXT:    xscvspdpn 1, 63
+; PC64LE9-NEXT:    xvcvdpsp 61, 0
+; PC64LE9-NEXT:    bl cosf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl cosf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI47_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI47_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    xxmrghd 0, 62, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 29
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -2716,41 +2526,44 @@ define <3 x float> @constrained_vector_exp_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_exp_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    stdu 1, -96(1)
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 112(1)
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl expf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl expf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 62
+; PC64LE-NEXT:    xscvspdpn 1, 63
+; PC64LE-NEXT:    xvcvdpsp 61, 0
+; PC64LE-NEXT:    bl expf
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 63
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl expf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI52_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI52_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxmrghd 0, 62, 1
+; PC64LE-NEXT:    li 3, 80
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 29
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -2758,38 +2571,38 @@ define <3 x float> @constrained_vector_exp_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_exp_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl expf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl expf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    xxmrghd 0, 1, 62
+; PC64LE9-NEXT:    xscvspdpn 1, 63
+; PC64LE9-NEXT:    xvcvdpsp 61, 0
 ; PC64LE9-NEXT:    bl expf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI52_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI52_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
+; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    bl expf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxmrghd 0, 62, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 29
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -3052,39 +2865,37 @@ define <3 x float> @constrained_vector_exp2_v3f32(<3 x float> %x) #0 {
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl exp2f
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 62, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl exp2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    addis 3, 2, .LCPI57_0@toc@ha
+; PC64LE-NEXT:    xscvdpspn 0, 1
+; PC64LE-NEXT:    addi 3, 3, .LCPI57_0@toc@l
+; PC64LE-NEXT:    lxvd2x 1, 0, 3
+; PC64LE-NEXT:    xxmrghw 62, 0, 62
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxswapd 63, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl exp2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI57_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI57_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xscvdpspn 34, 1
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    vperm 2, 2, 30, 31
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -3093,38 +2904,36 @@ define <3 x float> @constrained_vector_exp2_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_exp2_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl exp2f
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl exp2f
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 62, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl exp2f
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI57_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI57_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    lxv 61, 0(3)
+; PC64LE9-NEXT:    xxmrghw 62, 0, 62
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    bl exp2f
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xxperm 34, 62, 61
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -3140,65 +2949,63 @@ define <3 x double> @constrained_vector_exp2_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_exp2_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 2
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -64(1)
+; PC64LE-NEXT:    std 0, 80(1)
 ; PC64LE-NEXT:    fmr 31, 3
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    fmr 30, 2
 ; PC64LE-NEXT:    bl exp2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 29, 1
 ; PC64LE-NEXT:    fmr 1, 30
 ; PC64LE-NEXT:    bl exp2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 31
 ; PC64LE-NEXT:    bl exp2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    fmr 1, 29
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_exp2_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -64(1)
 ; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 31, 3
 ; PC64LE9-NEXT:    fmr 30, 2
 ; PC64LE9-NEXT:    bl exp2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 29, 1
 ; PC64LE9-NEXT:    fmr 1, 30
 ; PC64LE9-NEXT:    bl exp2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 31
 ; PC64LE9-NEXT:    bl exp2
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 29
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %exp2 = call <3 x double> @llvm.experimental.constrained.exp2.v3f64(
@@ -3386,41 +3193,44 @@ define <3 x float> @constrained_vector_log_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_log_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    stdu 1, -96(1)
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 112(1)
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl logf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl logf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 62
+; PC64LE-NEXT:    xscvspdpn 1, 63
+; PC64LE-NEXT:    xvcvdpsp 61, 0
+; PC64LE-NEXT:    bl logf
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 63
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl logf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI62_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI62_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxmrghd 0, 62, 1
+; PC64LE-NEXT:    li 3, 80
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 29
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -3428,38 +3238,38 @@ define <3 x float> @constrained_vector_log_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_log_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl logf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl logf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xxmrghd 0, 1, 62
+; PC64LE9-NEXT:    xscvspdpn 1, 63
+; PC64LE9-NEXT:    xvcvdpsp 61, 0
+; PC64LE9-NEXT:    bl logf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl logf
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI62_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI62_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    xxmrghd 0, 62, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 29
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -3721,41 +3531,44 @@ define <3 x float> @constrained_vector_log10_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_log10_v3f32:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    stdu 1, -96(1)
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    std 0, 112(1)
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl log10f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl log10f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    xxmrghd 0, 1, 62
+; PC64LE-NEXT:    xscvspdpn 1, 63
+; PC64LE-NEXT:    xvcvdpsp 61, 0
+; PC64LE-NEXT:    bl log10f
+; PC64LE-NEXT:    nop
+; PC64LE-NEXT:    xxswapd 0, 63
+; PC64LE-NEXT:    xxlor 62, 1, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl log10f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI67_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI67_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxmrghd 0, 62, 1
+; PC64LE-NEXT:    li 3, 80
+; PC64LE-NEXT:    xvcvdpsp 34, 0
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    vmrgew 2, 2, 29
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
+; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
 ; PC64LE-NEXT:    blr
@@ -3763,38 +3576,38 @@ define <3 x float> @constrained_vector_log10_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_log10_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl log10f
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl log10f
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xxmrghd 0, 1, 62
+; PC64LE9-NEXT:    xscvspdpn 1, 63
+; PC64LE9-NEXT:    xvcvdpsp 61, 0
+; PC64LE9-NEXT:    bl log10f
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xxswapd 0, 63
+; PC64LE9-NEXT:    xscpsgndp 62, 1, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl log10f
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI67_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI67_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    xxmrghd 0, 62, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xvcvdpsp 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 29
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -4057,39 +3870,37 @@ define <3 x float> @constrained_vector_log2_v3f32(<3 x float> %x) #0 {
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl log2f
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 62, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl log2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    addis 3, 2, .LCPI72_0@toc@ha
+; PC64LE-NEXT:    xscvdpspn 0, 1
+; PC64LE-NEXT:    addi 3, 3, .LCPI72_0@toc@l
+; PC64LE-NEXT:    lxvd2x 1, 0, 3
+; PC64LE-NEXT:    xxmrghw 62, 0, 62
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxswapd 63, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl log2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI72_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI72_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xscvdpspn 34, 1
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    vperm 2, 2, 30, 31
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -4098,38 +3909,36 @@ define <3 x float> @constrained_vector_log2_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_log2_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl log2f
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl log2f
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 62, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl log2f
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI72_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI72_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    lxv 61, 0(3)
+; PC64LE9-NEXT:    xxmrghw 62, 0, 62
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    bl log2f
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xxperm 34, 62, 61
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -4145,65 +3954,63 @@ define <3 x double> @constrained_vector_log2_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_log2_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 2
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -64(1)
+; PC64LE-NEXT:    std 0, 80(1)
 ; PC64LE-NEXT:    fmr 31, 3
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    fmr 30, 2
 ; PC64LE-NEXT:    bl log2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 29, 1
 ; PC64LE-NEXT:    fmr 1, 30
 ; PC64LE-NEXT:    bl log2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 31
 ; PC64LE-NEXT:    bl log2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    fmr 1, 29
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_log2_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -64(1)
 ; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 31, 3
 ; PC64LE9-NEXT:    fmr 30, 2
 ; PC64LE9-NEXT:    bl log2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 29, 1
 ; PC64LE9-NEXT:    fmr 1, 30
 ; PC64LE9-NEXT:    bl log2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 31
 ; PC64LE9-NEXT:    bl log2
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 29
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %log2 = call <3 x double> @llvm.experimental.constrained.log2.v3f64(
@@ -4336,45 +4143,12 @@ entry:
 define <3 x float> @constrained_vector_rint_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_rint_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI77_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI77_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xsrdpic 0, 0
-; PC64LE-NEXT:    xsrdpic 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xsrdpic 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvrspic 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_rint_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI77_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI77_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xsrdpic 1, 1
-; PC64LE9-NEXT:    xsrdpic 2, 2
-; PC64LE9-NEXT:    xsrdpic 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvrspic 34, 34
 ; PC64LE9-NEXT:    blr
  entry:
   %rint = call <3 x float> @llvm.experimental.constrained.rint.v3f32(
@@ -4388,17 +4162,21 @@ define <3 x double> @constrained_vector_rint_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_rint_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xsrdpic 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvrdpic 2, 0
+; PC64LE-NEXT:    xvrdpic 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_rint_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xsrdpic 3, 3
 ; PC64LE9-NEXT:    xvrdpic 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvrdpic 0, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %rint = call <3 x double> @llvm.experimental.constrained.rint.v3f64(
@@ -4411,14 +4189,14 @@ entry:
 define <4 x double> @constrained_vector_rint_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_rint_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvrdpic 35, 35
 ; PC64LE-NEXT:    xvrdpic 34, 34
+; PC64LE-NEXT:    xvrdpic 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_rint_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvrdpic 35, 35
 ; PC64LE9-NEXT:    xvrdpic 34, 34
+; PC64LE9-NEXT:    xvrdpic 35, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %rint = call <4 x double> @llvm.experimental.constrained.rint.v4f64(
@@ -4521,83 +4299,12 @@ entry:
 define <3 x float> @constrained_vector_nearbyint_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_nearbyint_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    vmr 31, 2
-; PC64LE-NEXT:    bl nearbyintf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    bl nearbyintf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    bl nearbyintf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI82_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI82_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 80
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    vrfin 2, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_nearbyint_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl nearbyintf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl nearbyintf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl nearbyintf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI82_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI82_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    vrfin 2, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %nearby = call <3 x float> @llvm.experimental.constrained.nearbyint.v3f32(
@@ -4611,65 +4318,63 @@ define <3 x double> @constrained_vector_nearby_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_nearby_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 2
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -64(1)
+; PC64LE-NEXT:    std 0, 80(1)
 ; PC64LE-NEXT:    fmr 31, 3
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    fmr 30, 2
 ; PC64LE-NEXT:    bl nearbyint
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 29, 1
 ; PC64LE-NEXT:    fmr 1, 30
 ; PC64LE-NEXT:    bl nearbyint
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 31
 ; PC64LE-NEXT:    bl nearbyint
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    fmr 1, 29
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_nearby_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -64(1)
 ; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 31, 3
 ; PC64LE9-NEXT:    fmr 30, 2
 ; PC64LE9-NEXT:    bl nearbyint
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 29, 1
 ; PC64LE9-NEXT:    fmr 1, 30
 ; PC64LE9-NEXT:    bl nearbyint
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 31
 ; PC64LE9-NEXT:    bl nearbyint
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 29
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %nearby = call <3 x double> @llvm.experimental.constrained.nearbyint.v3f64(
@@ -4766,26 +4471,12 @@ entry:
 define <1 x float> @constrained_vector_maxnum_v1f32(<1 x float> %x, <1 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_maxnum_v1f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -32(1)
-; PC64LE-NEXT:    std 0, 48(1)
-; PC64LE-NEXT:    bl fmaxf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    addi 1, 1, 32
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xsmaxdp 1, 1, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_maxnum_v1f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -32(1)
-; PC64LE9-NEXT:    std 0, 48(1)
-; PC64LE9-NEXT:    bl fmaxf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    addi 1, 1, 32
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xsmaxdp 1, 1, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %max = call <1 x float> @llvm.experimental.constrained.maxnum.v1f32(
@@ -4815,103 +4506,12 @@ entry:
 define <3 x float> @constrained_vector_maxnum_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_maxnum_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 1
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    li 3, 64
-; PC64LE-NEXT:    vmr 30, 2
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    vmr 31, 3
-; PC64LE-NEXT:    bl fmaxf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 62
-; PC64LE-NEXT:    xxswapd 2, 63
-; PC64LE-NEXT:    fmr 31, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    bl fmaxf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE-NEXT:    xxsldwi 2, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    bl fmaxf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI87_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI87_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 64
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 96
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xvmaxsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_maxnum_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    vmr 31, 3
-; PC64LE9-NEXT:    vmr 30, 2
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fmaxf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 62
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fmaxf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE9-NEXT:    fmr 30, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fmaxf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI87_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI87_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 80
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xvmaxsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %max = call <3 x float> @llvm.experimental.constrained.maxnum.v3f32(
@@ -4924,48 +4524,26 @@ entry:
 define <3 x double> @constrained_vector_max_v3f64(<3 x double> %x, <3 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_max_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -64(1)
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    std 0, 80(1)
-; PC64LE-NEXT:    fmr 2, 6
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    xvmaxdp 63, 1, 0
-; PC64LE-NEXT:    fmr 1, 3
-; PC64LE-NEXT:    bl fmax
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 64
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xvmaxdp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvmaxdp 0, 3, 4
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_max_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -48(1)
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    std 0, 64(1)
-; PC64LE9-NEXT:    fmr 2, 6
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    xvmaxdp 63, 1, 0
-; PC64LE9-NEXT:    fmr 1, 3
-; PC64LE9-NEXT:    bl fmax
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    addi 1, 1, 48
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
+; PC64LE9-NEXT:    xvmaxdp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvmaxdp 0, 0, 4
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %max = call <3 x double> @llvm.experimental.constrained.maxnum.v3f64(
@@ -4998,26 +4576,12 @@ entry:
 define <1 x float> @constrained_vector_minnum_v1f32(<1 x float> %x, <1 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_minnum_v1f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -32(1)
-; PC64LE-NEXT:    std 0, 48(1)
-; PC64LE-NEXT:    bl fminf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    addi 1, 1, 32
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xsmindp 1, 1, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_minnum_v1f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -32(1)
-; PC64LE9-NEXT:    std 0, 48(1)
-; PC64LE9-NEXT:    bl fminf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    addi 1, 1, 32
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xsmindp 1, 1, 2
 ; PC64LE9-NEXT:    blr
  entry:
   %min = call <1 x float> @llvm.experimental.constrained.minnum.v1f32(
@@ -5047,103 +4611,12 @@ entry:
 define <3 x float> @constrained_vector_minnum_v3f32(<3 x float> %x, <3 x float> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_minnum_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 1
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    li 3, 64
-; PC64LE-NEXT:    vmr 30, 2
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    vmr 31, 3
-; PC64LE-NEXT:    bl fminf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxswapd 0, 62
-; PC64LE-NEXT:    xxswapd 2, 63
-; PC64LE-NEXT:    fmr 31, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    bl fminf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE-NEXT:    xxsldwi 2, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    bl fminf
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI92_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI92_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 64
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
-; PC64LE-NEXT:    addi 1, 1, 96
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xvminsp 34, 34, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_minnum_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
-; PC64LE9-NEXT:    vmr 31, 3
-; PC64LE9-NEXT:    vmr 30, 2
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fminf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxswapd 0, 62
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fminf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE9-NEXT:    fmr 30, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    bl fminf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
-; PC64LE9-NEXT:    addis 3, 2, .LCPI92_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    addi 3, 3, .LCPI92_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 80
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xvminsp 34, 34, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %min = call <3 x float> @llvm.experimental.constrained.minnum.v3f32(
@@ -5156,48 +4629,26 @@ entry:
 define <3 x double> @constrained_vector_min_v3f64(<3 x double> %x, <3 x double> %y) #0 {
 ; PC64LE-LABEL: constrained_vector_min_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -64(1)
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE-NEXT:    xxmrghd 1, 2, 1
-; PC64LE-NEXT:    std 0, 80(1)
-; PC64LE-NEXT:    fmr 2, 6
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
-; PC64LE-NEXT:    xvmindp 63, 1, 0
-; PC64LE-NEXT:    fmr 1, 3
-; PC64LE-NEXT:    bl fmin
-; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 64
-; PC64LE-NEXT:    ld 0, 16(1)
-; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    xvmindp 2, 1, 0
+; PC64LE-NEXT:    xxspltd 4, 6, 0
+; PC64LE-NEXT:    xxspltd 3, 3, 0
+; PC64LE-NEXT:    xvmindp 0, 3, 4
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_min_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -48(1)
 ; PC64LE9-NEXT:    xxmrghd 0, 5, 4
 ; PC64LE9-NEXT:    xxmrghd 1, 2, 1
-; PC64LE9-NEXT:    std 0, 64(1)
-; PC64LE9-NEXT:    fmr 2, 6
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    xvmindp 63, 1, 0
-; PC64LE9-NEXT:    fmr 1, 3
-; PC64LE9-NEXT:    bl fmin
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    addi 1, 1, 48
-; PC64LE9-NEXT:    ld 0, 16(1)
-; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    xxspltd 4, 6, 0
+; PC64LE9-NEXT:    xvmindp 2, 1, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvmindp 0, 0, 4
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
  %min = call <3 x double> @llvm.experimental.constrained.minnum.v3f64(
@@ -5249,32 +4700,12 @@ entry:
 define <2 x i32> @constrained_vector_fptosi_v2i32_v2f32(<2 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v2i32_v2f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xscvdpsxws 0, 0
-; PC64LE-NEXT:    xscvdpsxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
+; PC64LE-NEXT:    xvcvspsxws 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v2i32_v2f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    xscvdpsxws 1, 1
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 1
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    xxmrghw 34, 1, 0
+; PC64LE9-NEXT:    xvcvspsxws 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(
@@ -5286,51 +4717,12 @@ entry:
 define <3 x i32> @constrained_vector_fptosi_v3i32_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v3i32_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpsxws 0, 0
-; PC64LE-NEXT:    xscvdpsxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    mtfprwz 0, 4
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI97_0@toc@ha
-; PC64LE-NEXT:    addi 3, 3, .LCPI97_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xscvdpsxws 0, 0
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtvsrwz 36, 3
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvcvspsxws 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v3i32_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    mffprwz 4, 0
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    mtvsrwz 34, 3
-; PC64LE9-NEXT:    mtfprwz 1, 4
-; PC64LE9-NEXT:    addis 4, 2, .LCPI97_0@toc@ha
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    addi 4, 4, .LCPI97_0@toc@l
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    mtfprwz 0, 5
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(4)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvcvspsxws 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32(
@@ -5378,30 +4770,16 @@ entry:
 define <2 x i64> @constrained_vector_fptosi_v2i64_v2f32(<2 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v2i64_v2f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xscvdpsxds 0, 0
-; PC64LE-NEXT:    xscvdpsxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xvcvdpsxds 34, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v2i64_v2f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    mtvsrdd 34, 4, 3
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvdpsxds 34, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32(
@@ -5413,34 +4791,29 @@ entry:
 define <3 x i64> @constrained_vector_fptosi_v3i64_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v3i64_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpsxds 0, 0
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvspdpn 0, 1
-; PC64LE-NEXT:    xscvdpsxds 0, 0
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xscvspdpn 1, 1
+; PC64LE-NEXT:    xvcvdpsxds 0, 0
+; PC64LE-NEXT:    xscvdpsxds 1, 1
+; PC64LE-NEXT:    mffprd 5, 1
 ; PC64LE-NEXT:    mffprd 4, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xscvdpsxds 0, 0
-; PC64LE-NEXT:    mffprd 5, 0
+; PC64LE-NEXT:    xxswapd 2, 0
+; PC64LE-NEXT:    mffprd 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v3i64_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
 ; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxmrglw 1, 34, 34
 ; PC64LE9-NEXT:    xscvspdpn 0, 0
+; PC64LE9-NEXT:    xvcvspdp 1, 1
 ; PC64LE9-NEXT:    xscvdpsxds 0, 0
 ; PC64LE9-NEXT:    mffprd 5, 0
+; PC64LE9-NEXT:    xvcvdpsxds 0, 1
+; PC64LE9-NEXT:    mfvsrld 3, 0
+; PC64LE9-NEXT:    mffprd 4, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32(
@@ -5452,50 +4825,23 @@ entry:
 define <4 x i64> @constrained_vector_fptosi_v4i64_v4f32(<4 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v4i64_v4f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpsxds 0, 0
-; PC64LE-NEXT:    xscvdpsxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 36, 1, 0
-; PC64LE-NEXT:    xscvspdpn 0, 34
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    vmr 2, 4
-; PC64LE-NEXT:    xscvdpsxds 0, 0
-; PC64LE-NEXT:    xscvdpsxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 35, 0, 1
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxmrghw 1, 34, 34
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xvcvdpsxds 34, 0
+; PC64LE-NEXT:    xvcvspdp 0, 1
+; PC64LE-NEXT:    xvcvdpsxds 35, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v4i64_v4f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    xscvspdpn 0, 34
-; PC64LE9-NEXT:    mtvsrdd 36, 4, 3
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvdpsxds 36, 0
+; PC64LE9-NEXT:    xxmrghw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
 ; PC64LE9-NEXT:    vmr 2, 4
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    mtvsrdd 35, 3, 4
+; PC64LE9-NEXT:    xvcvdpsxds 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(
@@ -5527,13 +4873,9 @@ entry:
 define <2 x i32> @constrained_vector_fptosi_v2i32_v2f64(<2 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v2i32_v2f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpsxws 1, 34
 ; PC64LE-NEXT:    xxswapd 0, 34
+; PC64LE-NEXT:    xscvdpsxws 1, 34
 ; PC64LE-NEXT:    xscvdpsxws 0, 0
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
 ; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    blr
 ;
@@ -5542,10 +4884,6 @@ define <2 x i32> @constrained_vector_fptosi_v2i32_v2f64(<2 x double> %x) #0 {
 ; PC64LE9-NEXT:    xxswapd 1, 34
 ; PC64LE9-NEXT:    xscvdpsxws 0, 34
 ; PC64LE9-NEXT:    xscvdpsxws 1, 1
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 1
-; PC64LE9-NEXT:    mtfprwz 1, 3
 ; PC64LE9-NEXT:    xxmrghw 34, 0, 1
 ; PC64LE9-NEXT:    blr
 entry:
@@ -5558,38 +4896,26 @@ entry:
 define <3 x i32> @constrained_vector_fptosi_v3i32_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v3i32_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpsxws 0, 2
-; PC64LE-NEXT:    xscvdpsxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    mtfprwz 0, 4
-; PC64LE-NEXT:    mtfprwz 1, 3
+; PC64LE-NEXT:    xscvdpsxws 0, 1
+; PC64LE-NEXT:    xscvdpsxws 1, 2
 ; PC64LE-NEXT:    addis 3, 2, .LCPI105_0@toc@ha
 ; PC64LE-NEXT:    addi 3, 3, .LCPI105_0@toc@l
+; PC64LE-NEXT:    xscvdpsxws 36, 3
 ; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
 ; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvdpsxws 0, 3
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtvsrwz 36, 3
 ; PC64LE-NEXT:    vperm 2, 4, 2, 3
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v3i32_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpsxws 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xscvdpsxws 0, 2
-; PC64LE9-NEXT:    mtvsrwz 34, 3
-; PC64LE9-NEXT:    mffprwz 4, 0
 ; PC64LE9-NEXT:    xscvdpsxws 0, 1
-; PC64LE9-NEXT:    mtfprwz 1, 4
-; PC64LE9-NEXT:    addis 4, 2, .LCPI105_0@toc@ha
-; PC64LE9-NEXT:    addi 4, 4, .LCPI105_0@toc@l
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    mtfprwz 0, 5
+; PC64LE9-NEXT:    xscvdpsxws 1, 2
+; PC64LE9-NEXT:    addis 3, 2, .LCPI105_0@toc@ha
+; PC64LE9-NEXT:    xscvdpsxws 34, 3
+; PC64LE9-NEXT:    addi 3, 3, .LCPI105_0@toc@l
 ; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(4)
+; PC64LE9-NEXT:    lxv 0, 0(3)
 ; PC64LE9-NEXT:    xxperm 34, 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
@@ -5602,38 +4928,20 @@ entry:
 define <4 x i32> @constrained_vector_fptosi_v4i32_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v4i32_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xscvdpsxws 2, 34
-; PC64LE-NEXT:    xxswapd 1, 35
-; PC64LE-NEXT:    xscvdpsxws 0, 0
-; PC64LE-NEXT:    xscvdpsxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xscvdpsxws 2, 35
-; PC64LE-NEXT:    mffprwz 4, 0
-; PC64LE-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE-NEXT:    mtfprd 1, 4
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrgld 0, 35, 34
+; PC64LE-NEXT:    xxmrghd 1, 35, 34
+; PC64LE-NEXT:    xvcvdpsxws 34, 0
+; PC64LE-NEXT:    xvcvdpsxws 35, 1
+; PC64LE-NEXT:    vmrgew 2, 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v4i32_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpsxws 0, 34
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    mffprwz 4, 0
-; PC64LE9-NEXT:    xscvdpsxws 0, 35
-; PC64LE9-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 35
-; PC64LE9-NEXT:    xscvdpsxws 0, 0
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    rldimi 5, 3, 32, 0
-; PC64LE9-NEXT:    mtvsrdd 34, 5, 4
+; PC64LE9-NEXT:    xxmrgld 0, 35, 34
+; PC64LE9-NEXT:    xvcvdpsxws 36, 0
+; PC64LE9-NEXT:    xxmrghd 0, 35, 34
+; PC64LE9-NEXT:    xvcvdpsxws 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 4
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64(
@@ -5681,22 +4989,26 @@ entry:
 define <3 x i64> @constrained_vector_fptosi_v3i64_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v3i64_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpsxds 0, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvdpsxds 0, 2
+; PC64LE-NEXT:    xxmrghd 0, 2, 1
+; PC64LE-NEXT:    xxspltd 1, 3, 0
+; PC64LE-NEXT:    xvcvdpsxds 0, 0
+; PC64LE-NEXT:    xvcvdpsxds 34, 1
 ; PC64LE-NEXT:    mffprd 4, 0
-; PC64LE-NEXT:    xscvdpsxds 0, 3
-; PC64LE-NEXT:    mffprd 5, 0
+; PC64LE-NEXT:    xxswapd 1, 0
+; PC64LE-NEXT:    xxswapd 2, 34
+; PC64LE-NEXT:    mffprd 3, 1
+; PC64LE-NEXT:    mffprd 5, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v3i64_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpsxds 0, 1
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 2
+; PC64LE9-NEXT:    xxmrghd 0, 2, 1
+; PC64LE9-NEXT:    xxspltd 1, 3, 0
+; PC64LE9-NEXT:    xvcvdpsxds 0, 0
+; PC64LE9-NEXT:    xvcvdpsxds 34, 1
+; PC64LE9-NEXT:    mfvsrld 3, 0
+; PC64LE9-NEXT:    mfvsrld 5, 34
 ; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    xscvdpsxds 0, 3
-; PC64LE9-NEXT:    mffprd 5, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64(
@@ -5708,14 +5020,14 @@ entry:
 define <4 x i64> @constrained_vector_fptosi_v4i64_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptosi_v4i64_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvcvdpsxds 35, 35
 ; PC64LE-NEXT:    xvcvdpsxds 34, 34
+; PC64LE-NEXT:    xvcvdpsxds 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptosi_v4i64_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvcvdpsxds 35, 35
 ; PC64LE9-NEXT:    xvcvdpsxds 34, 34
+; PC64LE9-NEXT:    xvcvdpsxds 35, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64(
@@ -5746,32 +5058,12 @@ entry:
 define <2 x i32> @constrained_vector_fptoui_v2i32_v2f32(<2 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v2i32_v2f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xscvdpuxws 0, 0
-; PC64LE-NEXT:    xscvdpuxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
+; PC64LE-NEXT:    xvcvspuxws 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v2i32_v2f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    xscvdpuxws 1, 1
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 1
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    xxmrghw 34, 1, 0
+; PC64LE9-NEXT:    xvcvspuxws 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f32(
@@ -5783,51 +5075,12 @@ entry:
 define <3 x i32> @constrained_vector_fptoui_v3i32_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v3i32_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpuxws 0, 0
-; PC64LE-NEXT:    xscvdpuxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    mtfprwz 0, 4
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI113_0@toc@ha
-; PC64LE-NEXT:    addi 3, 3, .LCPI113_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xscvdpuxws 0, 0
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtvsrwz 36, 3
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvcvspuxws 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v3i32_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    mffprwz 4, 0
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    mtvsrwz 34, 3
-; PC64LE9-NEXT:    mtfprwz 1, 4
-; PC64LE9-NEXT:    addis 4, 2, .LCPI113_0@toc@ha
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    addi 4, 4, .LCPI113_0@toc@l
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    mtfprwz 0, 5
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(4)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvcvspuxws 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f32(
@@ -5875,30 +5128,16 @@ entry:
 define <2 x i64> @constrained_vector_fptoui_v2i64_v2f32(<2 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v2i64_v2f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xscvdpuxds 0, 0
-; PC64LE-NEXT:    xscvdpuxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xvcvdpuxds 34, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v2i64_v2f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    mtvsrdd 34, 4, 3
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvdpuxds 34, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f32(
@@ -5910,34 +5149,29 @@ entry:
 define <3 x i64> @constrained_vector_fptoui_v3i64_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v3i64_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpuxds 0, 0
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvspdpn 0, 1
-; PC64LE-NEXT:    xscvdpuxds 0, 0
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xscvspdpn 1, 1
+; PC64LE-NEXT:    xvcvdpuxds 0, 0
+; PC64LE-NEXT:    xscvdpuxds 1, 1
+; PC64LE-NEXT:    mffprd 5, 1
 ; PC64LE-NEXT:    mffprd 4, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xscvdpuxds 0, 0
-; PC64LE-NEXT:    mffprd 5, 0
+; PC64LE-NEXT:    xxswapd 2, 0
+; PC64LE-NEXT:    mffprd 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v3i64_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
 ; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxmrglw 1, 34, 34
 ; PC64LE9-NEXT:    xscvspdpn 0, 0
+; PC64LE9-NEXT:    xvcvspdp 1, 1
 ; PC64LE9-NEXT:    xscvdpuxds 0, 0
 ; PC64LE9-NEXT:    mffprd 5, 0
+; PC64LE9-NEXT:    xvcvdpuxds 0, 1
+; PC64LE9-NEXT:    mfvsrld 3, 0
+; PC64LE9-NEXT:    mffprd 4, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f32(
@@ -5949,50 +5183,23 @@ entry:
 define <4 x i64> @constrained_vector_fptoui_v4i64_v4f32(<4 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v4i64_v4f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvdpuxds 0, 0
-; PC64LE-NEXT:    xscvdpuxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 36, 1, 0
-; PC64LE-NEXT:    xscvspdpn 0, 34
-; PC64LE-NEXT:    xscvspdpn 1, 2
-; PC64LE-NEXT:    vmr 2, 4
-; PC64LE-NEXT:    xscvdpuxds 0, 0
-; PC64LE-NEXT:    xscvdpuxds 1, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xxmrghd 35, 0, 1
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxmrghw 1, 34, 34
+; PC64LE-NEXT:    xvcvspdp 0, 0
+; PC64LE-NEXT:    xvcvdpuxds 34, 0
+; PC64LE-NEXT:    xvcvspdp 0, 1
+; PC64LE-NEXT:    xvcvdpuxds 35, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v4i64_v4f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    xscvspdpn 0, 34
-; PC64LE9-NEXT:    mtvsrdd 36, 4, 3
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvdpuxds 36, 0
+; PC64LE9-NEXT:    xxmrghw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
 ; PC64LE9-NEXT:    vmr 2, 4
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 0
-; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    mtvsrdd 35, 3, 4
+; PC64LE9-NEXT:    xvcvdpuxds 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f32(
@@ -6023,13 +5230,9 @@ entry:
 define <2 x i32> @constrained_vector_fptoui_v2i32_v2f64(<2 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v2i32_v2f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpuxws 1, 34
 ; PC64LE-NEXT:    xxswapd 0, 34
+; PC64LE-NEXT:    xscvdpuxws 1, 34
 ; PC64LE-NEXT:    xscvdpuxws 0, 0
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
 ; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    blr
 ;
@@ -6038,10 +5241,6 @@ define <2 x i32> @constrained_vector_fptoui_v2i32_v2f64(<2 x double> %x) #0 {
 ; PC64LE9-NEXT:    xxswapd 1, 34
 ; PC64LE9-NEXT:    xscvdpuxws 0, 34
 ; PC64LE9-NEXT:    xscvdpuxws 1, 1
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 1
-; PC64LE9-NEXT:    mtfprwz 1, 3
 ; PC64LE9-NEXT:    xxmrghw 34, 0, 1
 ; PC64LE9-NEXT:    blr
 entry:
@@ -6054,38 +5253,26 @@ entry:
 define <3 x i32> @constrained_vector_fptoui_v3i32_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v3i32_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpuxws 0, 2
-; PC64LE-NEXT:    xscvdpuxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    mtfprwz 0, 4
-; PC64LE-NEXT:    mtfprwz 1, 3
+; PC64LE-NEXT:    xscvdpuxws 0, 1
+; PC64LE-NEXT:    xscvdpuxws 1, 2
 ; PC64LE-NEXT:    addis 3, 2, .LCPI121_0@toc@ha
 ; PC64LE-NEXT:    addi 3, 3, .LCPI121_0@toc@l
+; PC64LE-NEXT:    xscvdpuxws 36, 3
 ; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
 ; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvdpuxws 0, 3
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtvsrwz 36, 3
 ; PC64LE-NEXT:    vperm 2, 4, 2, 3
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v3i32_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpuxws 0, 3
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xscvdpuxws 0, 2
-; PC64LE9-NEXT:    mtvsrwz 34, 3
-; PC64LE9-NEXT:    mffprwz 4, 0
 ; PC64LE9-NEXT:    xscvdpuxws 0, 1
-; PC64LE9-NEXT:    mtfprwz 1, 4
-; PC64LE9-NEXT:    addis 4, 2, .LCPI121_0@toc@ha
-; PC64LE9-NEXT:    addi 4, 4, .LCPI121_0@toc@l
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    mtfprwz 0, 5
+; PC64LE9-NEXT:    xscvdpuxws 1, 2
+; PC64LE9-NEXT:    addis 3, 2, .LCPI121_0@toc@ha
+; PC64LE9-NEXT:    xscvdpuxws 34, 3
+; PC64LE9-NEXT:    addi 3, 3, .LCPI121_0@toc@l
 ; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(4)
+; PC64LE9-NEXT:    lxv 0, 0(3)
 ; PC64LE9-NEXT:    xxperm 34, 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
@@ -6098,38 +5285,20 @@ entry:
 define <4 x i32> @constrained_vector_fptoui_v4i32_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v4i32_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xscvdpuxws 2, 34
-; PC64LE-NEXT:    xxswapd 1, 35
-; PC64LE-NEXT:    xscvdpuxws 0, 0
-; PC64LE-NEXT:    xscvdpuxws 1, 1
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xscvdpuxws 2, 35
-; PC64LE-NEXT:    mffprwz 4, 0
-; PC64LE-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    mffprwz 4, 1
-; PC64LE-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE-NEXT:    mtfprd 1, 4
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrgld 0, 35, 34
+; PC64LE-NEXT:    xxmrghd 1, 35, 34
+; PC64LE-NEXT:    xvcvdpuxws 34, 0
+; PC64LE-NEXT:    xvcvdpuxws 35, 1
+; PC64LE-NEXT:    vmrgew 2, 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v4i32_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpuxws 0, 34
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    mffprwz 4, 0
-; PC64LE9-NEXT:    xscvdpuxws 0, 35
-; PC64LE9-NEXT:    rldimi 4, 3, 32, 0
-; PC64LE9-NEXT:    mffprwz 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 35
-; PC64LE9-NEXT:    xscvdpuxws 0, 0
-; PC64LE9-NEXT:    mffprwz 5, 0
-; PC64LE9-NEXT:    rldimi 5, 3, 32, 0
-; PC64LE9-NEXT:    mtvsrdd 34, 5, 4
+; PC64LE9-NEXT:    xxmrgld 0, 35, 34
+; PC64LE9-NEXT:    xvcvdpuxws 36, 0
+; PC64LE9-NEXT:    xxmrghd 0, 35, 34
+; PC64LE9-NEXT:    xvcvdpuxws 34, 0
+; PC64LE9-NEXT:    vmrgew 2, 2, 4
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f64(
@@ -6177,22 +5346,26 @@ entry:
 define <3 x i64> @constrained_vector_fptoui_v3i64_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v3i64_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xscvdpuxds 0, 1
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvdpuxds 0, 2
+; PC64LE-NEXT:    xxmrghd 0, 2, 1
+; PC64LE-NEXT:    xxspltd 1, 3, 0
+; PC64LE-NEXT:    xvcvdpuxds 0, 0
+; PC64LE-NEXT:    xvcvdpuxds 34, 1
 ; PC64LE-NEXT:    mffprd 4, 0
-; PC64LE-NEXT:    xscvdpuxds 0, 3
-; PC64LE-NEXT:    mffprd 5, 0
+; PC64LE-NEXT:    xxswapd 1, 0
+; PC64LE-NEXT:    xxswapd 2, 34
+; PC64LE-NEXT:    mffprd 3, 1
+; PC64LE-NEXT:    mffprd 5, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v3i64_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xscvdpuxds 0, 1
-; PC64LE9-NEXT:    mffprd 3, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 2
+; PC64LE9-NEXT:    xxmrghd 0, 2, 1
+; PC64LE9-NEXT:    xxspltd 1, 3, 0
+; PC64LE9-NEXT:    xvcvdpuxds 0, 0
+; PC64LE9-NEXT:    xvcvdpuxds 34, 1
+; PC64LE9-NEXT:    mfvsrld 3, 0
+; PC64LE9-NEXT:    mfvsrld 5, 34
 ; PC64LE9-NEXT:    mffprd 4, 0
-; PC64LE9-NEXT:    xscvdpuxds 0, 3
-; PC64LE9-NEXT:    mffprd 5, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f64(
@@ -6204,14 +5377,14 @@ entry:
 define <4 x i64> @constrained_vector_fptoui_v4i64_v4f64(<4 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptoui_v4i64_v4f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvcvdpuxds 35, 35
 ; PC64LE-NEXT:    xvcvdpuxds 34, 34
+; PC64LE-NEXT:    xvcvdpuxds 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptoui_v4i64_v4f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvcvdpuxds 35, 35
 ; PC64LE9-NEXT:    xvcvdpuxds 34, 34
+; PC64LE9-NEXT:    xvcvdpuxds 35, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f64(
@@ -6269,33 +5442,33 @@ entry:
 define <3 x float> @constrained_vector_fptrunc_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fptrunc_v3f64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xsrsp 0, 3
-; PC64LE-NEXT:    xsrsp 2, 2
+; PC64LE-NEXT:    xsrsp 0, 1
+; PC64LE-NEXT:    xsrsp 1, 2
 ; PC64LE-NEXT:    addis 3, 2, .LCPI129_0@toc@ha
 ; PC64LE-NEXT:    addi 3, 3, .LCPI129_0@toc@l
-; PC64LE-NEXT:    xsrsp 1, 1
+; PC64LE-NEXT:    xscvdpspn 0, 0
 ; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 2, 2
+; PC64LE-NEXT:    xxmrghw 34, 1, 0
+; PC64LE-NEXT:    lxvd2x 0, 0, 3
+; PC64LE-NEXT:    xxswapd 35, 0
+; PC64LE-NEXT:    xsrsp 0, 3
 ; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    xxmrghw 34, 2, 1
-; PC64LE-NEXT:    lxvd2x 1, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 1
 ; PC64LE-NEXT:    vperm 2, 4, 2, 3
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fptrunc_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xsrsp 0, 3
-; PC64LE9-NEXT:    xsrsp 2, 2
-; PC64LE9-NEXT:    xsrsp 1, 1
+; PC64LE9-NEXT:    xsrsp 0, 1
+; PC64LE9-NEXT:    xsrsp 1, 2
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI129_0@toc@ha
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI129_0@toc@l
+; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 2, 1
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xxmrghw 35, 1, 0
+; PC64LE9-NEXT:    xsrsp 1, 3
+; PC64LE9-NEXT:    lxv 0, 0(3)
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    xxperm 34, 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(
@@ -6349,20 +5522,14 @@ entry:
 define <2 x double> @constrained_vector_fpext_v2f32(<2 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fpext_v2f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xvcvspdp 34, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fpext_v2f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xxmrghd 34, 1, 0
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 34, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(
@@ -6374,22 +5541,21 @@ entry:
 define <3 x double> @constrained_vector_fpext_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fpext_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 3, 0
-; PC64LE-NEXT:    xxsldwi 4, 34, 34, 3
-; PC64LE-NEXT:    xscvspdpn 2, 1
-; PC64LE-NEXT:    xscvspdpn 1, 4
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxsldwi 3, 34, 34, 1
+; PC64LE-NEXT:    xvcvspdp 2, 0
+; PC64LE-NEXT:    xscvspdpn 3, 3
+; PC64LE-NEXT:    xxswapd 1, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fpext_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 3, 0
-; PC64LE9-NEXT:    xxswapd 0, 34
-; PC64LE9-NEXT:    xscvspdpn 2, 0
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    xxmrghw 0, 34, 34
+; PC64LE9-NEXT:    xxmrglw 1, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvspdp 2, 1
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(
@@ -6401,29 +5567,18 @@ entry:
 define <4 x double> @constrained_vector_fpext_v4f32(<4 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_fpext_v4f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    xxmrghd 0, 1, 0
-; PC64LE-NEXT:    xscvspdpn 1, 34
-; PC64LE-NEXT:    xxlor 34, 0, 0
-; PC64LE-NEXT:    xxmrghd 35, 1, 2
+; PC64LE-NEXT:    xxmrglw 0, 34, 34
+; PC64LE-NEXT:    xxmrghw 1, 34, 34
+; PC64LE-NEXT:    xvcvspdp 34, 0
+; PC64LE-NEXT:    xvcvspdp 35, 1
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_fpext_v4f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    xscvspdpn 1, 34
-; PC64LE9-NEXT:    xxmrghd 35, 1, 2
+; PC64LE9-NEXT:    xxmrglw 0, 34, 34
+; PC64LE9-NEXT:    xxmrghw 1, 34, 34
+; PC64LE9-NEXT:    xvcvspdp 0, 0
+; PC64LE9-NEXT:    xvcvspdp 35, 1
 ; PC64LE9-NEXT:    xxlor 34, 0, 0
 ; PC64LE9-NEXT:    blr
 entry:
@@ -6470,45 +5625,12 @@ entry:
 define <3 x float> @constrained_vector_ceil_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_ceil_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI137_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI137_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xsrdpip 0, 0
-; PC64LE-NEXT:    xsrdpip 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xsrdpip 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvrspip 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_ceil_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI137_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI137_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xsrdpip 1, 1
-; PC64LE9-NEXT:    xsrdpip 2, 2
-; PC64LE9-NEXT:    xsrdpip 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvrspip 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %ceil = call <3 x float> @llvm.experimental.constrained.ceil.v3f32(
@@ -6521,17 +5643,21 @@ define <3 x double> @constrained_vector_ceil_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_ceil_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xsrdpip 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvrdpip 2, 0
+; PC64LE-NEXT:    xvrdpip 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_ceil_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xsrdpip 3, 3
 ; PC64LE9-NEXT:    xvrdpip 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvrdpip 0, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %ceil = call <3 x double> @llvm.experimental.constrained.ceil.v3f64(
@@ -6578,45 +5704,12 @@ entry:
 define <3 x float> @constrained_vector_floor_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_floor_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI141_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI141_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xsrdpim 0, 0
-; PC64LE-NEXT:    xsrdpim 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xsrdpim 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvrspim 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_floor_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI141_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI141_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xsrdpim 1, 1
-; PC64LE9-NEXT:    xsrdpim 2, 2
-; PC64LE9-NEXT:    xsrdpim 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvrspim 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %floor = call <3 x float> @llvm.experimental.constrained.floor.v3f32(
@@ -6629,17 +5722,21 @@ define <3 x double> @constrained_vector_floor_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_floor_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xsrdpim 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvrdpim 2, 0
+; PC64LE-NEXT:    xvrdpim 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_floor_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xsrdpim 3, 3
 ; PC64LE9-NEXT:    xvrdpim 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvrdpim 0, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %floor = call <3 x double> @llvm.experimental.constrained.floor.v3f64(
@@ -6685,45 +5782,12 @@ entry:
 define <3 x float> @constrained_vector_round_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_round_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI145_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI145_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xsrdpi 0, 0
-; PC64LE-NEXT:    xsrdpi 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xsrdpi 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvrspi 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_round_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI145_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI145_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xsrdpi 1, 1
-; PC64LE9-NEXT:    xsrdpi 2, 2
-; PC64LE9-NEXT:    xsrdpi 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvrspi 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %round = call <3 x float> @llvm.experimental.constrained.round.v3f32(
@@ -6737,17 +5801,21 @@ define <3 x double> @constrained_vector_round_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_round_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xsrdpi 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvrdpi 2, 0
+; PC64LE-NEXT:    xvrdpi 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_round_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xsrdpi 3, 3
 ; PC64LE9-NEXT:    xvrdpi 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvrdpi 0, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %round = call <3 x double> @llvm.experimental.constrained.round.v3f64(
@@ -6793,45 +5861,12 @@ entry:
 define <3 x float> @constrained_vector_trunc_v3f32(<3 x float> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_trunc_v3f32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 3
-; PC64LE-NEXT:    addis 3, 2, .LCPI149_0@toc@ha
-; PC64LE-NEXT:    xscvspdpn 0, 0
-; PC64LE-NEXT:    xscvspdpn 1, 1
-; PC64LE-NEXT:    addi 3, 3, .LCPI149_0@toc@l
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    xsrdpiz 0, 0
-; PC64LE-NEXT:    xsrdpiz 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    xscvspdpn 0, 2
-; PC64LE-NEXT:    xsrdpiz 0, 0
-; PC64LE-NEXT:    xscvdpspn 36, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    xvrspiz 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_trunc_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xxswapd 1, 34
-; PC64LE9-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI149_0@toc@ha
-; PC64LE9-NEXT:    addi 3, 3, .LCPI149_0@toc@l
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 1
-; PC64LE9-NEXT:    xscvspdpn 2, 2
-; PC64LE9-NEXT:    xscvspdpn 0, 0
-; PC64LE9-NEXT:    xsrdpiz 1, 1
-; PC64LE9-NEXT:    xsrdpiz 2, 2
-; PC64LE9-NEXT:    xsrdpiz 0, 0
-; PC64LE9-NEXT:    xscvdpspn 2, 2
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xvrspiz 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %trunc = call <3 x float> @llvm.experimental.constrained.trunc.v3f32(
@@ -6844,17 +5879,21 @@ define <3 x double> @constrained_vector_trunc_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_trunc_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    xxmrghd 0, 2, 1
-; PC64LE-NEXT:    xsrdpiz 3, 3
+; PC64LE-NEXT:    xxspltd 1, 3, 0
 ; PC64LE-NEXT:    xvrdpiz 2, 0
+; PC64LE-NEXT:    xvrdpiz 0, 1
 ; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_trunc_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    xxmrghd 0, 2, 1
-; PC64LE9-NEXT:    xsrdpiz 3, 3
 ; PC64LE9-NEXT:    xvrdpiz 2, 0
+; PC64LE9-NEXT:    xxspltd 0, 3, 0
+; PC64LE9-NEXT:    xvrdpiz 0, 0
 ; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %trunc = call <3 x double> @llvm.experimental.constrained.trunc.v3f64(
@@ -6979,28 +6018,14 @@ entry:
 define <2 x double> @constrained_vector_sitofp_v2f64_v2i32(<2 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v2f64_v2i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwa 1, 3
-; PC64LE-NEXT:    xscvsxddp 0, 0
-; PC64LE-NEXT:    xscvsxddp 1, 1
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrglw 34, 34, 34
+; PC64LE-NEXT:    xvcvsxwdp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v2f64_v2i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvsxddp 0, 0
-; PC64LE9-NEXT:    mtfprwa 1, 3
-; PC64LE9-NEXT:    xscvsxddp 1, 1
-; PC64LE9-NEXT:    xxmrghd 34, 1, 0
+; PC64LE9-NEXT:    xxmrglw 34, 34, 34
+; PC64LE9-NEXT:    xvcvsxwdp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x double>
@@ -7013,32 +6038,12 @@ entry:
 define <2 x float> @constrained_vector_sitofp_v2f32_v2i32(<2 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v2f32_v2i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwa 1, 3
-; PC64LE-NEXT:    xscvsxdsp 0, 0
-; PC64LE-NEXT:    xscvsxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
+; PC64LE-NEXT:    xvcvsxwsp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v2f32_v2i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvsxdsp 0, 0
-; PC64LE9-NEXT:    mtfprwa 1, 3
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    xscvdpspn 0, 0
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xxmrghw 34, 1, 0
+; PC64LE9-NEXT:    xvcvsxwsp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x float>
@@ -7069,12 +6074,8 @@ entry:
 define <2 x float> @constrained_vector_sitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v2f32_v2i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mfvsrd 3, 34
 ; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    xscvsxdsp 1, 1
+; PC64LE-NEXT:    xscvsxdsp 1, 34
 ; PC64LE-NEXT:    xscvsxdsp 0, 0
 ; PC64LE-NEXT:    xscvdpspn 1, 1
 ; PC64LE-NEXT:    xscvdpspn 0, 0
@@ -7083,14 +6084,11 @@ define <2 x float> @constrained_vector_sitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v2f32_v2i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mfvsrld 3, 34
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrd 3, 34
-; PC64LE9-NEXT:    mtfprd 1, 3
+; PC64LE9-NEXT:    xxswapd 0, 34
+; PC64LE9-NEXT:    xscvsxdsp 1, 34
 ; PC64LE9-NEXT:    xscvsxdsp 0, 0
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xscvdpspn 1, 1
+; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE9-NEXT:    blr
 entry:
@@ -7104,32 +6102,22 @@ entry:
 define <3 x double> @constrained_vector_sitofp_v3f64_v3i32(<3 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v3f64_v3i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xscvsxddp 1, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xscvsxddp 2, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    xscvsxddp 3, 0
+; PC64LE-NEXT:    xxmrghw 35, 34, 34
+; PC64LE-NEXT:    xxmrglw 34, 34, 34
+; PC64LE-NEXT:    xvcvsxwdp 0, 35
+; PC64LE-NEXT:    xvcvsxwdp 2, 34
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v3f64_v3i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvsxddp 1, 0
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    xscvsxddp 2, 0
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    xscvsxddp 3, 0
+; PC64LE9-NEXT:    xxmrghw 35, 34, 34
+; PC64LE9-NEXT:    xxmrglw 34, 34, 34
+; PC64LE9-NEXT:    xvcvsxwdp 0, 35
+; PC64LE9-NEXT:    xvcvsxwdp 2, 34
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x double>
@@ -7142,49 +6130,12 @@ entry:
 define <3 x float> @constrained_vector_sitofp_v3f32_v3i32(<3 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v3f32_v3i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwa 1, 3
-; PC64LE-NEXT:    xscvsxdsp 0, 0
-; PC64LE-NEXT:    addis 3, 2, .LCPI161_0@toc@ha
-; PC64LE-NEXT:    addi 3, 3, .LCPI161_0@toc@l
-; PC64LE-NEXT:    xscvsxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 35, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xxswapd 36, 0
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    xscvsxdsp 0, 0
-; PC64LE-NEXT:    xscvdpspn 34, 0
-; PC64LE-NEXT:    vperm 2, 2, 3, 4
+; PC64LE-NEXT:    xvcvsxwsp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v3f32_v3i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvsxdsp 0, 0
-; PC64LE9-NEXT:    mtfprwa 1, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    mtfprwa 2, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI161_0@toc@ha
-; PC64LE9-NEXT:    xscvsxdsp 2, 2
-; PC64LE9-NEXT:    xscvdpspn 0, 0
-; PC64LE9-NEXT:    addi 3, 3, .LCPI161_0@toc@l
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 2
-; PC64LE9-NEXT:    xxmrghw 35, 0, 1
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvcvsxwsp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x float>
@@ -7198,21 +6149,25 @@ define <3 x double> @constrained_vector_sitofp_v3f64_v3i64(<3 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v3f64_v3i64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    xscvsxddp 1, 0
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    xscvsxddp 2, 0
+; PC64LE-NEXT:    mtfprd 1, 4
+; PC64LE-NEXT:    xxmrghd 34, 1, 0
 ; PC64LE-NEXT:    mtfprd 0, 5
-; PC64LE-NEXT:    xscvsxddp 3, 0
+; PC64LE-NEXT:    xvcvsxddp 2, 34
+; PC64LE-NEXT:    xxswapd 35, 0
+; PC64LE-NEXT:    xvcvsxddp 0, 35
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v3f64_v3i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    xscvsxddp 1, 0
-; PC64LE9-NEXT:    mtfprd 0, 4
-; PC64LE9-NEXT:    xscvsxddp 2, 0
 ; PC64LE9-NEXT:    mtfprd 0, 5
-; PC64LE9-NEXT:    xscvsxddp 3, 0
+; PC64LE9-NEXT:    mtvsrdd 34, 4, 3
+; PC64LE9-NEXT:    xxswapd 35, 0
+; PC64LE9-NEXT:    xvcvsxddp 2, 34
+; PC64LE9-NEXT:    xvcvsxddp 0, 35
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x double>
@@ -7225,15 +6180,15 @@ entry:
 define <3 x float> @constrained_vector_sitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v3f32_v3i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    mtfprd 1, 3
+; PC64LE-NEXT:    mtfprd 0, 3
+; PC64LE-NEXT:    mtfprd 1, 4
 ; PC64LE-NEXT:    addis 3, 2, .LCPI163_0@toc@ha
 ; PC64LE-NEXT:    addi 3, 3, .LCPI163_0@toc@l
 ; PC64LE-NEXT:    xscvsxdsp 0, 0
 ; PC64LE-NEXT:    xscvsxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
 ; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
+; PC64LE-NEXT:    xscvdpspn 1, 1
+; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
 ; PC64LE-NEXT:    xxswapd 35, 0
 ; PC64LE-NEXT:    mtfprd 0, 5
@@ -7244,20 +6199,20 @@ define <3 x float> @constrained_vector_sitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v3f32_v3i64:
 ; PC64LE9:       # %bb.0: # %entry
+; PC64LE9-NEXT:    mtfprd 0, 3
 ; PC64LE9-NEXT:    mtfprd 1, 4
-; PC64LE9-NEXT:    mtfprd 2, 3
-; PC64LE9-NEXT:    mtfprd 0, 5
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI163_0@toc@ha
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    xscvsxdsp 2, 2
 ; PC64LE9-NEXT:    xscvsxdsp 0, 0
+; PC64LE9-NEXT:    xscvsxdsp 1, 1
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI163_0@toc@l
-; PC64LE9-NEXT:    xscvdpspn 2, 2
+; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xxmrghw 35, 1, 0
+; PC64LE9-NEXT:    mtfprd 1, 5
+; PC64LE9-NEXT:    lxv 0, 0(3)
+; PC64LE9-NEXT:    xscvsxdsp 1, 1
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    xxperm 34, 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x float>
@@ -7270,46 +6225,19 @@ entry:
 define <4 x double> @constrained_vector_sitofp_v4f64_v4i32(<4 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v4f64_v4i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE-NEXT:    mtfprwa 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwa 1, 3
-; PC64LE-NEXT:    xscvsxddp 0, 0
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xscvsxddp 1, 1
-; PC64LE-NEXT:    xxmrghd 0, 1, 0
-; PC64LE-NEXT:    mtfprwa 1, 3
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xxlor 34, 0, 0
-; PC64LE-NEXT:    mtfprwa 2, 3
-; PC64LE-NEXT:    xscvsxddp 1, 1
-; PC64LE-NEXT:    xscvsxddp 2, 2
-; PC64LE-NEXT:    xxmrghd 35, 2, 1
+; PC64LE-NEXT:    xxmrglw 35, 34, 34
+; PC64LE-NEXT:    xxmrghw 36, 34, 34
+; PC64LE-NEXT:    xvcvsxwdp 34, 35
+; PC64LE-NEXT:    xvcvsxwdp 35, 36
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v4f64_v4i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwa 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvsxddp 0, 0
-; PC64LE9-NEXT:    mtfprwa 1, 3
-; PC64LE9-NEXT:    li 3, 12
-; PC64LE9-NEXT:    xscvsxddp 1, 1
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    mtfprwa 1, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    mtfprwa 2, 3
-; PC64LE9-NEXT:    xscvsxddp 1, 1
-; PC64LE9-NEXT:    xscvsxddp 2, 2
+; PC64LE9-NEXT:    xxmrglw 35, 34, 34
+; PC64LE9-NEXT:    xxmrghw 34, 34, 34
+; PC64LE9-NEXT:    xvcvsxwdp 0, 35
+; PC64LE9-NEXT:    xvcvsxwdp 35, 34
 ; PC64LE9-NEXT:    xxlor 34, 0, 0
-; PC64LE9-NEXT:    xxmrghd 35, 1, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x double>
@@ -7365,14 +6293,14 @@ entry:
 define <4 x double> @constrained_vector_sitofp_v4f64_v4i64(<4 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v4f64_v4i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvcvsxddp 35, 35
 ; PC64LE-NEXT:    xvcvsxddp 34, 34
+; PC64LE-NEXT:    xvcvsxddp 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v4f64_v4i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvcvsxddp 35, 35
 ; PC64LE9-NEXT:    xvcvsxddp 34, 34
+; PC64LE9-NEXT:    xvcvsxddp 35, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x double>
@@ -7385,46 +6313,20 @@ entry:
 define <4 x float> @constrained_vector_sitofp_v4f32_v4i64(<4 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_sitofp_v4f32_v4i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mfvsrd 3, 34
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxswapd 1, 35
-; PC64LE-NEXT:    mtfprd 2, 3
-; PC64LE-NEXT:    mfvsrd 3, 35
-; PC64LE-NEXT:    mtfprd 3, 3
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvsxdsp 2, 2
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    xscvsxdsp 3, 3
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xscvsxdsp 0, 0
-; PC64LE-NEXT:    xscvsxdsp 1, 1
-; PC64LE-NEXT:    xxmrghd 2, 3, 2
-; PC64LE-NEXT:    xvcvdpsp 34, 2
-; PC64LE-NEXT:    xxmrghd 0, 1, 0
-; PC64LE-NEXT:    xvcvdpsp 35, 0
-; PC64LE-NEXT:    vmrgew 2, 2, 3
+; PC64LE-NEXT:    xvcvsxdsp 0, 34
+; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
+; PC64LE-NEXT:    xvcvsxdsp 0, 35
+; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
+; PC64LE-NEXT:    vpkudum 2, 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_sitofp_v4f32_v4i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mfvsrld 3, 34
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrld 3, 35
-; PC64LE9-NEXT:    xscvsxdsp 0, 0
-; PC64LE9-NEXT:    mtfprd 1, 3
-; PC64LE9-NEXT:    mfvsrd 3, 34
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    xvcvdpsp 36, 0
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrd 3, 35
-; PC64LE9-NEXT:    mtfprd 1, 3
-; PC64LE9-NEXT:    xscvsxdsp 0, 0
-; PC64LE9-NEXT:    xscvsxdsp 1, 1
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    xvcvdpsp 34, 0
-; PC64LE9-NEXT:    vmrgew 2, 2, 4
+; PC64LE9-NEXT:    xvcvsxdsp 0, 34
+; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
+; PC64LE9-NEXT:    xvcvsxdsp 0, 35
+; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
+; PC64LE9-NEXT:    vpkudum 2, 3, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x float>
@@ -7546,28 +6448,14 @@ entry:
 define <2 x double> @constrained_vector_uitofp_v2f64_v2i32(<2 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v2f64_v2i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xscvuxddp 0, 0
-; PC64LE-NEXT:    xscvuxddp 1, 1
-; PC64LE-NEXT:    xxmrghd 34, 1, 0
+; PC64LE-NEXT:    xxmrglw 34, 34, 34
+; PC64LE-NEXT:    xvcvuxwdp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v2f64_v2i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvuxddp 0, 0
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    xscvuxddp 1, 1
-; PC64LE9-NEXT:    xxmrghd 34, 1, 0
+; PC64LE9-NEXT:    xxmrglw 34, 34, 34
+; PC64LE9-NEXT:    xvcvuxwdp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x double>
@@ -7580,32 +6468,12 @@ entry:
 define <2 x float> @constrained_vector_uitofp_v2f32_v2i32(<2 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v2f32_v2i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xscvuxdsp 0, 0
-; PC64LE-NEXT:    xscvuxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
+; PC64LE-NEXT:    xvcvuxwsp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v2f32_v2i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvuxdsp 0, 0
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    xscvdpspn 0, 0
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xxmrghw 34, 1, 0
+; PC64LE9-NEXT:    xvcvuxwsp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <2 x float>
@@ -7636,12 +6504,8 @@ entry:
 define <2 x float> @constrained_vector_uitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v2f32_v2i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mfvsrd 3, 34
 ; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    xscvuxdsp 1, 1
+; PC64LE-NEXT:    xscvuxdsp 1, 34
 ; PC64LE-NEXT:    xscvuxdsp 0, 0
 ; PC64LE-NEXT:    xscvdpspn 1, 1
 ; PC64LE-NEXT:    xscvdpspn 0, 0
@@ -7650,14 +6514,11 @@ define <2 x float> @constrained_vector_uitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v2f32_v2i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mfvsrld 3, 34
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrd 3, 34
-; PC64LE9-NEXT:    mtfprd 1, 3
+; PC64LE9-NEXT:    xxswapd 0, 34
+; PC64LE9-NEXT:    xscvuxdsp 1, 34
 ; PC64LE9-NEXT:    xscvuxdsp 0, 0
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xscvdpspn 1, 1
+; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE9-NEXT:    blr
 entry:
@@ -7668,35 +6529,25 @@ entry:
   ret <2 x float> %result
 }
 
-define <3 x double> @constrained_vector_uitofp_v3f64_v3i32(<3 x i32> %x) #0 {
-; PC64LE-LABEL: constrained_vector_uitofp_v3f64_v3i32:
-; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xscvuxddp 1, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xscvuxddp 2, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    xscvuxddp 3, 0
+define <3 x double> @constrained_vector_uitofp_v3f64_v3i32(<3 x i32> %x) #0 {
+; PC64LE-LABEL: constrained_vector_uitofp_v3f64_v3i32:
+; PC64LE:       # %bb.0: # %entry
+; PC64LE-NEXT:    xxmrghw 35, 34, 34
+; PC64LE-NEXT:    xxmrglw 34, 34, 34
+; PC64LE-NEXT:    xvcvuxwdp 0, 35
+; PC64LE-NEXT:    xvcvuxwdp 2, 34
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v3f64_v3i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvuxddp 1, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    xscvuxddp 2, 0
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    xscvuxddp 3, 0
+; PC64LE9-NEXT:    xxmrghw 35, 34, 34
+; PC64LE9-NEXT:    xxmrglw 34, 34, 34
+; PC64LE9-NEXT:    xvcvuxwdp 0, 35
+; PC64LE9-NEXT:    xvcvuxwdp 2, 34
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x double>
@@ -7709,49 +6560,12 @@ entry:
 define <3 x float> @constrained_vector_uitofp_v3f32_v3i32(<3 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v3f32_v3i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxswapd 1, 34
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xscvuxdsp 0, 0
-; PC64LE-NEXT:    addis 3, 2, .LCPI179_0@toc@ha
-; PC64LE-NEXT:    addi 3, 3, .LCPI179_0@toc@l
-; PC64LE-NEXT:    xscvuxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xscvdpspn 1, 1
-; PC64LE-NEXT:    xxmrghw 35, 0, 1
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xxswapd 36, 0
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    xscvuxdsp 0, 0
-; PC64LE-NEXT:    xscvdpspn 34, 0
-; PC64LE-NEXT:    vperm 2, 2, 3, 4
+; PC64LE-NEXT:    xvcvuxwsp 34, 34
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v3f32_v3i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvuxdsp 0, 0
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    mtfprwz 2, 3
-; PC64LE9-NEXT:    addis 3, 2, .LCPI179_0@toc@ha
-; PC64LE9-NEXT:    xscvuxdsp 2, 2
-; PC64LE9-NEXT:    xscvdpspn 0, 0
-; PC64LE9-NEXT:    addi 3, 3, .LCPI179_0@toc@l
-; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 2
-; PC64LE9-NEXT:    xxmrghw 35, 0, 1
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xvcvuxwsp 34, 34
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x float>
@@ -7765,21 +6579,25 @@ define <3 x double> @constrained_vector_uitofp_v3f64_v3i64(<3 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v3f64_v3i64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    xscvuxddp 1, 0
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    xscvuxddp 2, 0
+; PC64LE-NEXT:    mtfprd 1, 4
+; PC64LE-NEXT:    xxmrghd 34, 1, 0
 ; PC64LE-NEXT:    mtfprd 0, 5
-; PC64LE-NEXT:    xscvuxddp 3, 0
+; PC64LE-NEXT:    xvcvuxddp 2, 34
+; PC64LE-NEXT:    xxswapd 35, 0
+; PC64LE-NEXT:    xvcvuxddp 0, 35
+; PC64LE-NEXT:    xxswapd 1, 2
+; PC64LE-NEXT:    xxswapd 3, 0
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v3f64_v3i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    xscvuxddp 1, 0
-; PC64LE9-NEXT:    mtfprd 0, 4
-; PC64LE9-NEXT:    xscvuxddp 2, 0
 ; PC64LE9-NEXT:    mtfprd 0, 5
-; PC64LE9-NEXT:    xscvuxddp 3, 0
+; PC64LE9-NEXT:    mtvsrdd 34, 4, 3
+; PC64LE9-NEXT:    xxswapd 35, 0
+; PC64LE9-NEXT:    xvcvuxddp 2, 34
+; PC64LE9-NEXT:    xvcvuxddp 0, 35
+; PC64LE9-NEXT:    xxswapd 1, 2
+; PC64LE9-NEXT:    xxswapd 3, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x double>
@@ -7792,15 +6610,15 @@ entry:
 define <3 x float> @constrained_vector_uitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v3f32_v3i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mtfprd 0, 4
-; PC64LE-NEXT:    mtfprd 1, 3
+; PC64LE-NEXT:    mtfprd 0, 3
+; PC64LE-NEXT:    mtfprd 1, 4
 ; PC64LE-NEXT:    addis 3, 2, .LCPI181_0@toc@ha
 ; PC64LE-NEXT:    addi 3, 3, .LCPI181_0@toc@l
 ; PC64LE-NEXT:    xscvuxdsp 0, 0
 ; PC64LE-NEXT:    xscvuxdsp 1, 1
-; PC64LE-NEXT:    xscvdpspn 1, 1
 ; PC64LE-NEXT:    xscvdpspn 0, 0
-; PC64LE-NEXT:    xxmrghw 34, 0, 1
+; PC64LE-NEXT:    xscvdpspn 1, 1
+; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
 ; PC64LE-NEXT:    xxswapd 35, 0
 ; PC64LE-NEXT:    mtfprd 0, 5
@@ -7811,20 +6629,20 @@ define <3 x float> @constrained_vector_uitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v3f32_v3i64:
 ; PC64LE9:       # %bb.0: # %entry
+; PC64LE9-NEXT:    mtfprd 0, 3
 ; PC64LE9-NEXT:    mtfprd 1, 4
-; PC64LE9-NEXT:    mtfprd 2, 3
-; PC64LE9-NEXT:    mtfprd 0, 5
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI181_0@toc@ha
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    xscvuxdsp 2, 2
 ; PC64LE9-NEXT:    xscvuxdsp 0, 0
+; PC64LE9-NEXT:    xscvuxdsp 1, 1
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI181_0@toc@l
-; PC64LE9-NEXT:    xscvdpspn 2, 2
+; PC64LE9-NEXT:    xscvdpspn 0, 0
 ; PC64LE9-NEXT:    xscvdpspn 1, 1
-; PC64LE9-NEXT:    xscvdpspn 34, 0
-; PC64LE9-NEXT:    xxmrghw 35, 1, 2
-; PC64LE9-NEXT:    lxv 1, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 1
+; PC64LE9-NEXT:    xxmrghw 35, 1, 0
+; PC64LE9-NEXT:    mtfprd 1, 5
+; PC64LE9-NEXT:    lxv 0, 0(3)
+; PC64LE9-NEXT:    xscvuxdsp 1, 1
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    xxperm 34, 35, 0
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <3 x float>
@@ -7837,46 +6655,19 @@ entry:
 define <4 x double> @constrained_vector_uitofp_v4f64_v4i32(<4 x i32> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v4f64_v4i32:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxsldwi 1, 34, 34, 1
-; PC64LE-NEXT:    mffprwz 3, 0
-; PC64LE-NEXT:    xxsldwi 2, 34, 34, 3
-; PC64LE-NEXT:    mtfprwz 0, 3
-; PC64LE-NEXT:    mffprwz 3, 1
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    xscvuxddp 0, 0
-; PC64LE-NEXT:    mfvsrwz 3, 34
-; PC64LE-NEXT:    xscvuxddp 1, 1
-; PC64LE-NEXT:    xxmrghd 0, 1, 0
-; PC64LE-NEXT:    mtfprwz 1, 3
-; PC64LE-NEXT:    mffprwz 3, 2
-; PC64LE-NEXT:    xxlor 34, 0, 0
-; PC64LE-NEXT:    mtfprwz 2, 3
-; PC64LE-NEXT:    xscvuxddp 1, 1
-; PC64LE-NEXT:    xscvuxddp 2, 2
-; PC64LE-NEXT:    xxmrghd 35, 2, 1
+; PC64LE-NEXT:    xxmrglw 35, 34, 34
+; PC64LE-NEXT:    xxmrghw 36, 34, 34
+; PC64LE-NEXT:    xvcvuxwdp 34, 35
+; PC64LE-NEXT:    xvcvuxwdp 35, 36
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v4f64_v4i32:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    li 3, 0
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    mtfprwz 0, 3
-; PC64LE9-NEXT:    li 3, 4
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xscvuxddp 0, 0
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    li 3, 12
-; PC64LE9-NEXT:    xscvuxddp 1, 1
-; PC64LE9-NEXT:    vextuwrx 3, 3, 2
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    mtfprwz 1, 3
-; PC64LE9-NEXT:    mfvsrwz 3, 34
-; PC64LE9-NEXT:    mtfprwz 2, 3
-; PC64LE9-NEXT:    xscvuxddp 1, 1
-; PC64LE9-NEXT:    xscvuxddp 2, 2
+; PC64LE9-NEXT:    xxmrglw 35, 34, 34
+; PC64LE9-NEXT:    xxmrghw 34, 34, 34
+; PC64LE9-NEXT:    xvcvuxwdp 0, 35
+; PC64LE9-NEXT:    xvcvuxwdp 35, 34
 ; PC64LE9-NEXT:    xxlor 34, 0, 0
-; PC64LE9-NEXT:    xxmrghd 35, 1, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x double>
@@ -7929,14 +6720,14 @@ entry:
 define <4 x double> @constrained_vector_uitofp_v4f64_v4i64(<4 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    xvcvuxddp 35, 35
 ; PC64LE-NEXT:    xvcvuxddp 34, 34
+; PC64LE-NEXT:    xvcvuxddp 35, 35
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    xvcvuxddp 35, 35
 ; PC64LE9-NEXT:    xvcvuxddp 34, 34
+; PC64LE9-NEXT:    xvcvuxddp 35, 35
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x double>
@@ -7949,46 +6740,20 @@ entry:
 define <4 x float> @constrained_vector_uitofp_v4f32_v4i64(<4 x i64> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_uitofp_v4f32_v4i64:
 ; PC64LE:       # %bb.0: # %entry
-; PC64LE-NEXT:    mfvsrd 3, 34
-; PC64LE-NEXT:    xxswapd 0, 34
-; PC64LE-NEXT:    xxswapd 1, 35
-; PC64LE-NEXT:    mtfprd 2, 3
-; PC64LE-NEXT:    mfvsrd 3, 35
-; PC64LE-NEXT:    mtfprd 3, 3
-; PC64LE-NEXT:    mffprd 3, 0
-; PC64LE-NEXT:    xscvuxdsp 2, 2
-; PC64LE-NEXT:    mtfprd 0, 3
-; PC64LE-NEXT:    mffprd 3, 1
-; PC64LE-NEXT:    xscvuxdsp 3, 3
-; PC64LE-NEXT:    mtfprd 1, 3
-; PC64LE-NEXT:    xscvuxdsp 0, 0
-; PC64LE-NEXT:    xscvuxdsp 1, 1
-; PC64LE-NEXT:    xxmrghd 2, 3, 2
-; PC64LE-NEXT:    xvcvdpsp 34, 2
-; PC64LE-NEXT:    xxmrghd 0, 1, 0
-; PC64LE-NEXT:    xvcvdpsp 35, 0
-; PC64LE-NEXT:    vmrgew 2, 2, 3
+; PC64LE-NEXT:    xvcvuxdsp 0, 34
+; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
+; PC64LE-NEXT:    xvcvuxdsp 0, 35
+; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
+; PC64LE-NEXT:    vpkudum 2, 3, 2
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_uitofp_v4f32_v4i64:
 ; PC64LE9:       # %bb.0: # %entry
-; PC64LE9-NEXT:    mfvsrld 3, 34
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrld 3, 35
-; PC64LE9-NEXT:    xscvuxdsp 0, 0
-; PC64LE9-NEXT:    mtfprd 1, 3
-; PC64LE9-NEXT:    mfvsrd 3, 34
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    xvcvdpsp 36, 0
-; PC64LE9-NEXT:    mtfprd 0, 3
-; PC64LE9-NEXT:    mfvsrd 3, 35
-; PC64LE9-NEXT:    mtfprd 1, 3
-; PC64LE9-NEXT:    xscvuxdsp 0, 0
-; PC64LE9-NEXT:    xscvuxdsp 1, 1
-; PC64LE9-NEXT:    xxmrghd 0, 1, 0
-; PC64LE9-NEXT:    xvcvdpsp 34, 0
-; PC64LE9-NEXT:    vmrgew 2, 2, 4
+; PC64LE9-NEXT:    xvcvuxdsp 0, 34
+; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
+; PC64LE9-NEXT:    xvcvuxdsp 0, 35
+; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
+; PC64LE9-NEXT:    vpkudum 2, 3, 2
 ; PC64LE9-NEXT:    blr
 entry:
   %result = call <4 x float>
@@ -8093,39 +6858,37 @@ define <3 x float> @constrained_vector_tan_v3f32(<3 x float> %x) #0 {
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 2
 ; PC64LE-NEXT:    bl tanf
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 62, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl tanf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
+; PC64LE-NEXT:    addis 3, 2, .LCPI189_0@toc@ha
+; PC64LE-NEXT:    xscvdpspn 0, 1
+; PC64LE-NEXT:    addi 3, 3, .LCPI189_0@toc@l
+; PC64LE-NEXT:    lxvd2x 1, 0, 3
+; PC64LE-NEXT:    xxmrghw 62, 0, 62
+; PC64LE-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE-NEXT:    xxswapd 63, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl tanf
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
-; PC64LE-NEXT:    addis 3, 2, .LCPI189_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    addi 3, 3, .LCPI189_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
-; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xscvdpspn 34, 1
+; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    vperm 2, 2, 30, 31
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -8134,38 +6897,36 @@ define <3 x float> @constrained_vector_tan_v3f32(<3 x float> %x) #0 {
 ; PC64LE9-LABEL: constrained_vector_tan_v3f32:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
-; PC64LE9-NEXT:    stdu 1, -64(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    vmr 31, 2
+; PC64LE9-NEXT:    stdu 1, -80(1)
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE9-NEXT:    std 0, 96(1)
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    vmr 31, 2
 ; PC64LE9-NEXT:    bl tanf
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 63
-; PC64LE9-NEXT:    fmr 31, 1
-; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    bl tanf
-; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 62, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    bl tanf
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI189_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI189_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
-; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
-; PC64LE9-NEXT:    addi 1, 1, 64
+; PC64LE9-NEXT:    lxv 61, 0(3)
+; PC64LE9-NEXT:    xxmrghw 62, 0, 62
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
+; PC64LE9-NEXT:    xscvspdpn 1, 0
+; PC64LE9-NEXT:    bl tanf
+; PC64LE9-NEXT:    nop
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    xxperm 34, 62, 61
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
 ; PC64LE9-NEXT:    blr
@@ -8181,65 +6942,63 @@ define <3 x double> @constrained_vector_tan_v3f64(<3 x double> %x) #0 {
 ; PC64LE-LABEL: constrained_vector_tan_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -80(1)
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    std 0, 96(1)
-; PC64LE-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 2
-; PC64LE-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -64(1)
+; PC64LE-NEXT:    std 0, 80(1)
 ; PC64LE-NEXT:    fmr 31, 3
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    fmr 30, 2
 ; PC64LE-NEXT:    bl tan
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 29, 1
 ; PC64LE-NEXT:    fmr 1, 30
 ; PC64LE-NEXT:    bl tan
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 31
 ; PC64LE-NEXT:    bl tan
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 80
+; PC64LE-NEXT:    fmr 1, 29
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 64
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_tan_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -64(1)
 ; PC64LE9-NEXT:    std 0, 80(1)
-; PC64LE9-NEXT:    stfd 30, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 56(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 31, 3
 ; PC64LE9-NEXT:    fmr 30, 2
 ; PC64LE9-NEXT:    bl tan
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 29, 1
 ; PC64LE9-NEXT:    fmr 1, 30
 ; PC64LE9-NEXT:    bl tan
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 31
 ; PC64LE9-NEXT:    bl tan
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 29
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 64
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %tan = call <3 x double> @llvm.experimental.constrained.tan.v3f64(
@@ -8442,16 +7201,16 @@ define <3 x float> @constrained_vector_atan2_v3f32(<3 x float> %x, <3 x float> %
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
 ; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    xxsldwi 0, 34, 34, 1
-; PC64LE-NEXT:    xxsldwi 2, 35, 35, 1
+; PC64LE-NEXT:    xxsldwi 0, 34, 34, 3
+; PC64LE-NEXT:    xxsldwi 2, 35, 35, 3
 ; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
-; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    stxvd2x 61, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    stxvd2x 62, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    vmr 30, 2
 ; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
 ; PC64LE-NEXT:    vmr 31, 3
@@ -8459,33 +7218,31 @@ define <3 x float> @constrained_vector_atan2_v3f32(<3 x float> %x, <3 x float> %
 ; PC64LE-NEXT:    nop
 ; PC64LE-NEXT:    xxswapd 0, 62
 ; PC64LE-NEXT:    xxswapd 2, 63
-; PC64LE-NEXT:    fmr 31, 1
+; PC64LE-NEXT:    xscvdpspn 61, 1
 ; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    xscvspdpn 2, 2
 ; PC64LE-NEXT:    bl atan2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE-NEXT:    xxsldwi 2, 63, 63, 3
-; PC64LE-NEXT:    fmr 30, 1
-; PC64LE-NEXT:    xscvspdpn 1, 0
+; PC64LE-NEXT:    xscvdpspn 0, 1
+; PC64LE-NEXT:    xxsldwi 2, 63, 63, 1
 ; PC64LE-NEXT:    xscvspdpn 2, 2
+; PC64LE-NEXT:    xxmrghw 61, 0, 61
+; PC64LE-NEXT:    xxsldwi 0, 62, 62, 1
+; PC64LE-NEXT:    xscvspdpn 1, 0
 ; PC64LE-NEXT:    bl atan2f
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xscvdpspn 0, 1
-; PC64LE-NEXT:    xscvdpspn 1, 30
 ; PC64LE-NEXT:    addis 3, 2, .LCPI194_0@toc@ha
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xscvdpspn 36, 31
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    xscvdpspn 35, 1
 ; PC64LE-NEXT:    addi 3, 3, .LCPI194_0@toc@l
-; PC64LE-NEXT:    xxmrghw 34, 1, 0
 ; PC64LE-NEXT:    lxvd2x 0, 0, 3
-; PC64LE-NEXT:    li 3, 64
+; PC64LE-NEXT:    li 3, 80
 ; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    li 3, 64
 ; PC64LE-NEXT:    lxvd2x 62, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    xxswapd 35, 0
-; PC64LE-NEXT:    vperm 2, 4, 2, 3
+; PC64LE-NEXT:    li 3, 48
+; PC64LE-NEXT:    xxswapd 34, 0
+; PC64LE-NEXT:    vperm 2, 3, 29, 2
+; PC64LE-NEXT:    lxvd2x 61, 1, 3 # 16-byte Folded Reload
 ; PC64LE-NEXT:    addi 1, 1, 96
 ; PC64LE-NEXT:    ld 0, 16(1)
 ; PC64LE-NEXT:    mtlr 0
@@ -8495,45 +7252,41 @@ define <3 x float> @constrained_vector_atan2_v3f32(<3 x float> %x, <3 x float> %
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 1
+; PC64LE9-NEXT:    xxsldwi 0, 34, 34, 3
 ; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 62, 32(1) # 16-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 61, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 1
+; PC64LE9-NEXT:    xxsldwi 0, 35, 35, 3
+; PC64LE9-NEXT:    stxv 62, 48(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    stxv 63, 64(1) # 16-byte Folded Spill
+; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    vmr 31, 3
 ; PC64LE9-NEXT:    vmr 30, 2
-; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl atan2f
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    xxswapd 0, 62
-; PC64LE9-NEXT:    fmr 31, 1
+; PC64LE9-NEXT:    xscvdpspn 61, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
 ; PC64LE9-NEXT:    xxswapd 0, 63
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl atan2f
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 3
-; PC64LE9-NEXT:    fmr 30, 1
+; PC64LE9-NEXT:    xscvdpspn 0, 1
+; PC64LE9-NEXT:    xxmrghw 61, 0, 61
+; PC64LE9-NEXT:    xxsldwi 0, 62, 62, 1
 ; PC64LE9-NEXT:    xscvspdpn 1, 0
-; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 3
+; PC64LE9-NEXT:    xxsldwi 0, 63, 63, 1
 ; PC64LE9-NEXT:    xscvspdpn 2, 0
 ; PC64LE9-NEXT:    bl atan2f
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscvdpspn 0, 1
-; PC64LE9-NEXT:    xscvdpspn 1, 30
 ; PC64LE9-NEXT:    addis 3, 2, .LCPI194_0@toc@ha
-; PC64LE9-NEXT:    xscvdpspn 34, 31
-; PC64LE9-NEXT:    lxv 63, 48(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lxv 62, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    xscvdpspn 34, 1
+; PC64LE9-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
+; PC64LE9-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
 ; PC64LE9-NEXT:    addi 3, 3, .LCPI194_0@toc@l
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    xxmrghw 35, 1, 0
 ; PC64LE9-NEXT:    lxv 0, 0(3)
-; PC64LE9-NEXT:    xxperm 34, 35, 0
+; PC64LE9-NEXT:    xxperm 34, 61, 0
+; PC64LE9-NEXT:    lxv 61, 32(1) # 16-byte Folded Reload
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
 ; PC64LE9-NEXT:    mtlr 0
@@ -8551,83 +7304,81 @@ define <3 x double> @constrained_vector_atan2_v3f64(<3 x double> %x, <3 x double
 ; PC64LE-LABEL: constrained_vector_atan2_v3f64:
 ; PC64LE:       # %bb.0: # %entry
 ; PC64LE-NEXT:    mflr 0
-; PC64LE-NEXT:    stdu 1, -96(1)
-; PC64LE-NEXT:    std 0, 112(1)
-; PC64LE-NEXT:    stfd 28, 64(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 27, -40(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
+; PC64LE-NEXT:    stdu 1, -80(1)
 ; PC64LE-NEXT:    fmr 28, 2
 ; PC64LE-NEXT:    fmr 2, 4
-; PC64LE-NEXT:    li 3, 48
-; PC64LE-NEXT:    stfd 29, 72(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stfd 30, 80(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    fmr 30, 5
-; PC64LE-NEXT:    stfd 31, 88(1) # 8-byte Folded Spill
-; PC64LE-NEXT:    stxvd2x 63, 1, 3 # 16-byte Folded Spill
+; PC64LE-NEXT:    std 0, 96(1)
 ; PC64LE-NEXT:    fmr 31, 6
+; PC64LE-NEXT:    fmr 30, 5
 ; PC64LE-NEXT:    fmr 29, 3
 ; PC64LE-NEXT:    bl atan2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxlor 63, 1, 1
+; PC64LE-NEXT:    fmr 27, 1
 ; PC64LE-NEXT:    fmr 1, 28
 ; PC64LE-NEXT:    fmr 2, 30
 ; PC64LE-NEXT:    bl atan2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    xxmrghd 63, 1, 63
+; PC64LE-NEXT:    fmr 30, 1
 ; PC64LE-NEXT:    fmr 1, 29
 ; PC64LE-NEXT:    fmr 2, 31
 ; PC64LE-NEXT:    bl atan2
 ; PC64LE-NEXT:    nop
-; PC64LE-NEXT:    li 3, 48
 ; PC64LE-NEXT:    fmr 3, 1
-; PC64LE-NEXT:    xxswapd 1, 63
-; PC64LE-NEXT:    lfd 31, 88(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    xxlor 2, 63, 63
-; PC64LE-NEXT:    lfd 30, 80(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 29, 72(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lfd 28, 64(1) # 8-byte Folded Reload
-; PC64LE-NEXT:    lxvd2x 63, 1, 3 # 16-byte Folded Reload
-; PC64LE-NEXT:    addi 1, 1, 96
+; PC64LE-NEXT:    fmr 1, 27
+; PC64LE-NEXT:    fmr 2, 30
+; PC64LE-NEXT:    addi 1, 1, 80
 ; PC64LE-NEXT:    ld 0, 16(1)
+; PC64LE-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    mtlr 0
+; PC64LE-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
+; PC64LE-NEXT:    lfd 27, -40(1) # 8-byte Folded Reload
 ; PC64LE-NEXT:    blr
 ;
 ; PC64LE9-LABEL: constrained_vector_atan2_v3f64:
 ; PC64LE9:       # %bb.0: # %entry
 ; PC64LE9-NEXT:    mflr 0
+; PC64LE9-NEXT:    stfd 27, -40(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 28, -32(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 29, -24(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 30, -16(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    stfd 31, -8(1) # 8-byte Folded Spill
 ; PC64LE9-NEXT:    stdu 1, -80(1)
-; PC64LE9-NEXT:    std 0, 96(1)
-; PC64LE9-NEXT:    stfd 28, 48(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stxv 63, 32(1) # 16-byte Folded Spill
 ; PC64LE9-NEXT:    fmr 28, 2
 ; PC64LE9-NEXT:    fmr 2, 4
-; PC64LE9-NEXT:    stfd 29, 56(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 30, 64(1) # 8-byte Folded Spill
-; PC64LE9-NEXT:    stfd 31, 72(1) # 8-byte Folded Spill
+; PC64LE9-NEXT:    std 0, 96(1)
 ; PC64LE9-NEXT:    fmr 31, 6
 ; PC64LE9-NEXT:    fmr 30, 5
 ; PC64LE9-NEXT:    fmr 29, 3
 ; PC64LE9-NEXT:    bl atan2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xscpsgndp 63, 1, 1
+; PC64LE9-NEXT:    fmr 27, 1
 ; PC64LE9-NEXT:    fmr 1, 28
 ; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    bl atan2
 ; PC64LE9-NEXT:    nop
-; PC64LE9-NEXT:    xxmrghd 63, 1, 63
+; PC64LE9-NEXT:    fmr 30, 1
 ; PC64LE9-NEXT:    fmr 1, 29
 ; PC64LE9-NEXT:    fmr 2, 31
 ; PC64LE9-NEXT:    bl atan2
 ; PC64LE9-NEXT:    nop
 ; PC64LE9-NEXT:    fmr 3, 1
-; PC64LE9-NEXT:    xxswapd 1, 63
-; PC64LE9-NEXT:    xscpsgndp 2, 63, 63
-; PC64LE9-NEXT:    lxv 63, 32(1) # 16-byte Folded Reload
-; PC64LE9-NEXT:    lfd 31, 72(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 30, 64(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 29, 56(1) # 8-byte Folded Reload
-; PC64LE9-NEXT:    lfd 28, 48(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    fmr 1, 27
+; PC64LE9-NEXT:    fmr 2, 30
 ; PC64LE9-NEXT:    addi 1, 1, 80
 ; PC64LE9-NEXT:    ld 0, 16(1)
+; PC64LE9-NEXT:    lfd 31, -8(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 30, -16(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    mtlr 0
+; PC64LE9-NEXT:    lfd 29, -24(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 28, -32(1) # 8-byte Folded Reload
+; PC64LE9-NEXT:    lfd 27, -40(1) # 8-byte Folded Reload
 ; PC64LE9-NEXT:    blr
 entry:
   %atan2 = call <3 x double> @llvm.experimental.constrained.atan2.v3f64(
diff --git a/llvm/test/CodeGen/RISCV/double-arith-strict.ll b/llvm/test/CodeGen/RISCV/double-arith-strict.ll
index 0071f3c168964..11215830d0a13 100644
--- a/llvm/test/CodeGen/RISCV/double-arith-strict.ll
+++ b/llvm/test/CodeGen/RISCV/double-arith-strict.ll
@@ -202,40 +202,19 @@ define double @fsqrt_d(double %a) nounwind strictfp {
 }
 
 define double @fmin_d(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: fmin_d:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call fmin
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fmin_d:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call fmin
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fmin_d:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    fmin.d fa0, fa0, fa1
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: fmin_d:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV32IZFINXZDINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    call fmin
-; RV32IZFINXZDINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV32IZFINXZDINX-NEXT:    fmin.d a0, a0, a2
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: fmin_d:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call fmin
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    fmin.d a0, a0, a1
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: fmin_d:
@@ -260,40 +239,19 @@ define double @fmin_d(double %a, double %b) nounwind strictfp {
 }
 
 define double @fmax_d(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: fmax_d:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call fmax
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fmax_d:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call fmax
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fmax_d:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    fmax.d fa0, fa0, fa1
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: fmax_d:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV32IZFINXZDINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    call fmax
-; RV32IZFINXZDINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV32IZFINXZDINX-NEXT:    fmax.d a0, a0, a2
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: fmax_d:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call fmax
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    fmax.d a0, a0, a1
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: fmax_d:
diff --git a/llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll b/llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll
index 53fcfa19725db..0861cc5c0bf95 100644
--- a/llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll
+++ b/llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll
@@ -58,12 +58,7 @@ define double @sqrt_f64(double %a) nounwind strictfp {
 define double @powi_f64(double %a, i32 %b) nounwind strictfp {
 ; RV32IFD-LABEL: powi_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call __powidf2
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail __powidf2
 ;
 ; RV64IFD-LABEL: powi_f64:
 ; RV64IFD:       # %bb.0:
@@ -117,23 +112,9 @@ define double @powi_f64(double %a, i32 %b) nounwind strictfp {
 }
 
 define double @sin_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: sin_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call sin
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: sin_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call sin
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: sin_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail sin
 ;
 ; RV32IZFINXZDINX-LABEL: sin_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -146,12 +127,7 @@ define double @sin_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: sin_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call sin
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail sin
 ;
 ; RV32I-LABEL: sin_f64:
 ; RV32I:       # %bb.0:
@@ -175,23 +151,9 @@ define double @sin_f64(double %a) nounwind strictfp {
 }
 
 define double @cos_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: cos_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call cos
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: cos_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call cos
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: cos_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail cos
 ;
 ; RV32IZFINXZDINX-LABEL: cos_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -204,12 +166,7 @@ define double @cos_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: cos_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call cos
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail cos
 ;
 ; RV32I-LABEL: cos_f64:
 ; RV32I:       # %bb.0:
@@ -367,23 +324,9 @@ define double @sincos_f64(double %a) nounwind strictfp {
 }
 
 define double @tan_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: tan_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call tan
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: tan_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call tan
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: tan_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail tan
 ;
 ; RV32IZFINXZDINX-LABEL: tan_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -396,12 +339,7 @@ define double @tan_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: tan_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call tan
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail tan
 ;
 ; RV32I-LABEL: tan_f64:
 ; RV32I:       # %bb.0:
@@ -425,23 +363,9 @@ define double @tan_f64(double %a) nounwind strictfp {
 }
 
 define double @asin_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: asin_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call asin
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: asin_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call asin
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: asin_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail asin
 ;
 ; RV32IZFINXZDINX-LABEL: asin_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -454,12 +378,7 @@ define double @asin_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: asin_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call asin
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail asin
 ;
 ; RV32I-LABEL: asin_f64:
 ; RV32I:       # %bb.0:
@@ -483,23 +402,9 @@ define double @asin_f64(double %a) nounwind strictfp {
 }
 
 define double @acos_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: acos_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call acos
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: acos_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call acos
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: acos_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail acos
 ;
 ; RV32IZFINXZDINX-LABEL: acos_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -512,12 +417,7 @@ define double @acos_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: acos_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call acos
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail acos
 ;
 ; RV32I-LABEL: acos_f64:
 ; RV32I:       # %bb.0:
@@ -541,23 +441,9 @@ define double @acos_f64(double %a) nounwind strictfp {
 }
 
 define double @atan_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: atan_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call atan
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: atan_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call atan
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: atan_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail atan
 ;
 ; RV32IZFINXZDINX-LABEL: atan_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -570,12 +456,7 @@ define double @atan_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: atan_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call atan
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail atan
 ;
 ; RV32I-LABEL: atan_f64:
 ; RV32I:       # %bb.0:
@@ -599,23 +480,9 @@ define double @atan_f64(double %a) nounwind strictfp {
 }
 
 define double @atan2_f64(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: atan2_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call atan2
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: atan2_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call atan2
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: atan2_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail atan2
 ;
 ; RV32IZFINXZDINX-LABEL: atan2_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -628,12 +495,7 @@ define double @atan2_f64(double %a, double %b) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: atan2_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call atan2
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail atan2
 ;
 ; RV32I-LABEL: atan2_f64:
 ; RV32I:       # %bb.0:
@@ -657,23 +519,9 @@ define double @atan2_f64(double %a, double %b) nounwind strictfp {
 }
 
 define double @sinh_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: sinh_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call sinh
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: sinh_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call sinh
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: sinh_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail sinh
 ;
 ; RV32IZFINXZDINX-LABEL: sinh_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -686,12 +534,7 @@ define double @sinh_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: sinh_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call sinh
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail sinh
 ;
 ; RV32I-LABEL: sinh_f64:
 ; RV32I:       # %bb.0:
@@ -715,23 +558,9 @@ define double @sinh_f64(double %a) nounwind strictfp {
 }
 
 define double @cosh_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: cosh_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call cosh
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: cosh_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call cosh
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: cosh_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail cosh
 ;
 ; RV32IZFINXZDINX-LABEL: cosh_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -744,12 +573,7 @@ define double @cosh_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: cosh_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call cosh
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail cosh
 ;
 ; RV32I-LABEL: cosh_f64:
 ; RV32I:       # %bb.0:
@@ -773,23 +597,9 @@ define double @cosh_f64(double %a) nounwind strictfp {
 }
 
 define double @tanh_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: tanh_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call tanh
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: tanh_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call tanh
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: tanh_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail tanh
 ;
 ; RV32IZFINXZDINX-LABEL: tanh_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -802,12 +612,7 @@ define double @tanh_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: tanh_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call tanh
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail tanh
 ;
 ; RV32I-LABEL: tanh_f64:
 ; RV32I:       # %bb.0:
@@ -831,23 +636,9 @@ define double @tanh_f64(double %a) nounwind strictfp {
 }
 
 define double @pow_f64(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: pow_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call pow
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: pow_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call pow
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: pow_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail pow
 ;
 ; RV32IZFINXZDINX-LABEL: pow_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -860,12 +651,7 @@ define double @pow_f64(double %a, double %b) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: pow_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call pow
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail pow
 ;
 ; RV32I-LABEL: pow_f64:
 ; RV32I:       # %bb.0:
@@ -889,23 +675,9 @@ define double @pow_f64(double %a, double %b) nounwind strictfp {
 }
 
 define double @exp_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: exp_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call exp
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: exp_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call exp
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: exp_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail exp
 ;
 ; RV32IZFINXZDINX-LABEL: exp_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -918,12 +690,7 @@ define double @exp_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: exp_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call exp
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail exp
 ;
 ; RV32I-LABEL: exp_f64:
 ; RV32I:       # %bb.0:
@@ -947,23 +714,9 @@ define double @exp_f64(double %a) nounwind strictfp {
 }
 
 define double @exp2_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: exp2_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call exp2
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: exp2_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call exp2
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: exp2_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail exp2
 ;
 ; RV32IZFINXZDINX-LABEL: exp2_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -976,12 +729,7 @@ define double @exp2_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: exp2_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call exp2
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail exp2
 ;
 ; RV32I-LABEL: exp2_f64:
 ; RV32I:       # %bb.0:
@@ -1005,23 +753,9 @@ define double @exp2_f64(double %a) nounwind strictfp {
 }
 
 define double @log_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: log_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call log
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: log_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call log
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: log_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail log
 ;
 ; RV32IZFINXZDINX-LABEL: log_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -1034,12 +768,7 @@ define double @log_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: log_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call log
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail log
 ;
 ; RV32I-LABEL: log_f64:
 ; RV32I:       # %bb.0:
@@ -1063,23 +792,9 @@ define double @log_f64(double %a) nounwind strictfp {
 }
 
 define double @log10_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: log10_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call log10
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: log10_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call log10
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: log10_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail log10
 ;
 ; RV32IZFINXZDINX-LABEL: log10_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -1092,12 +807,7 @@ define double @log10_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: log10_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call log10
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail log10
 ;
 ; RV32I-LABEL: log10_f64:
 ; RV32I:       # %bb.0:
@@ -1121,23 +831,9 @@ define double @log10_f64(double %a) nounwind strictfp {
 }
 
 define double @log2_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: log2_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call log2
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: log2_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call log2
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: log2_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail log2
 ;
 ; RV32IZFINXZDINX-LABEL: log2_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -1150,12 +846,7 @@ define double @log2_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: log2_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call log2
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail log2
 ;
 ; RV32I-LABEL: log2_f64:
 ; RV32I:       # %bb.0:
@@ -1267,40 +958,19 @@ define double @fmuladd_f64(double %a, double %b, double %c) nounwind strictfp {
 }
 
 define double @minnum_f64(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: minnum_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call fmin
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: minnum_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call fmin
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: minnum_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    fmin.d fa0, fa0, fa1
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: minnum_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV32IZFINXZDINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    call fmin
-; RV32IZFINXZDINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV32IZFINXZDINX-NEXT:    fmin.d a0, a0, a2
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: minnum_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call fmin
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    fmin.d a0, a0, a1
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: minnum_f64:
@@ -1325,40 +995,19 @@ define double @minnum_f64(double %a, double %b) nounwind strictfp {
 }
 
 define double @maxnum_f64(double %a, double %b) nounwind strictfp {
-; RV32IFD-LABEL: maxnum_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call fmax
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: maxnum_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call fmax
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: maxnum_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    fmax.d fa0, fa0, fa1
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: maxnum_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV32IZFINXZDINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINXZDINX-NEXT:    call fmax
-; RV32IZFINXZDINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV32IZFINXZDINX-NEXT:    fmax.d a0, a0, a2
 ; RV32IZFINXZDINX-NEXT:    ret
 ;
 ; RV64IZFINXZDINX-LABEL: maxnum_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call fmax
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    fmax.d a0, a0, a1
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: maxnum_f64:
@@ -1402,20 +1051,21 @@ define double @maxnum_f64(double %a, double %b) nounwind strictfp {
 define double @floor_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: floor_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call floor
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail floor
 ;
 ; RV64IFD-LABEL: floor_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call floor
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB23_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0, rdn
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0, rdn
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB23_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: floor_f64:
@@ -1429,11 +1079,16 @@ define double @floor_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: floor_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call floor
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB23_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0, rdn
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1, rdn
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB23_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: floor_f64:
@@ -1460,20 +1115,21 @@ define double @floor_f64(double %a) nounwind strictfp {
 define double @ceil_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: ceil_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call ceil
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail ceil
 ;
 ; RV64IFD-LABEL: ceil_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call ceil
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB24_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0, rup
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0, rup
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB24_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: ceil_f64:
@@ -1487,11 +1143,16 @@ define double @ceil_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: ceil_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call ceil
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB24_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0, rup
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1, rup
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB24_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: ceil_f64:
@@ -1518,20 +1179,21 @@ define double @ceil_f64(double %a) nounwind strictfp {
 define double @trunc_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: trunc_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call trunc
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail trunc
 ;
 ; RV64IFD-LABEL: trunc_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call trunc
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB25_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0, rtz
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0, rtz
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB25_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: trunc_f64:
@@ -1545,11 +1207,16 @@ define double @trunc_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: trunc_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call trunc
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB25_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0, rtz
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1, rtz
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB25_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: trunc_f64:
@@ -1576,20 +1243,21 @@ define double @trunc_f64(double %a) nounwind strictfp {
 define double @rint_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: rint_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call rint
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail rint
 ;
 ; RV64IFD-LABEL: rint_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call rint
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB26_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB26_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: rint_f64:
@@ -1603,11 +1271,16 @@ define double @rint_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: rint_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call rint
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB26_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB26_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: rint_f64:
@@ -1632,23 +1305,9 @@ define double @rint_f64(double %a) nounwind strictfp {
 }
 
 define double @nearbyint_f64(double %a) nounwind strictfp {
-; RV32IFD-LABEL: nearbyint_f64:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call nearbyint
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: nearbyint_f64:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call nearbyint
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: nearbyint_f64:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    tail nearbyint
 ;
 ; RV32IZFINXZDINX-LABEL: nearbyint_f64:
 ; RV32IZFINXZDINX:       # %bb.0:
@@ -1661,12 +1320,7 @@ define double @nearbyint_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: nearbyint_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call nearbyint
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
-; RV64IZFINXZDINX-NEXT:    ret
+; RV64IZFINXZDINX-NEXT:    tail nearbyint
 ;
 ; RV32I-LABEL: nearbyint_f64:
 ; RV32I:       # %bb.0:
@@ -1692,20 +1346,21 @@ define double @nearbyint_f64(double %a) nounwind strictfp {
 define double @round_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: round_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call round
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail round
 ;
 ; RV64IFD-LABEL: round_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call round
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB28_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0, rmm
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0, rmm
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB28_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: round_f64:
@@ -1719,11 +1374,16 @@ define double @round_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: round_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call round
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB28_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0, rmm
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1, rmm
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB28_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: round_f64:
@@ -1750,20 +1410,21 @@ define double @round_f64(double %a) nounwind strictfp {
 define double @roundeven_f64(double %a) nounwind strictfp {
 ; RV32IFD-LABEL: roundeven_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call roundeven
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail roundeven
 ;
 ; RV64IFD-LABEL: roundeven_f64:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    addi sp, sp, -16
-; RV64IFD-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IFD-NEXT:    call roundeven
-; RV64IFD-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IFD-NEXT:    addi sp, sp, 16
+; RV64IFD-NEXT:    li a0, 1075
+; RV64IFD-NEXT:    slli a0, a0, 52
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
+; RV64IFD-NEXT:    fabs.d fa4, fa0
+; RV64IFD-NEXT:    flt.d a0, fa4, fa5
+; RV64IFD-NEXT:    beqz a0, .LBB29_2
+; RV64IFD-NEXT:  # %bb.1:
+; RV64IFD-NEXT:    fcvt.l.d a0, fa0, rne
+; RV64IFD-NEXT:    fcvt.d.l fa5, a0, rne
+; RV64IFD-NEXT:    fsgnj.d fa0, fa5, fa0
+; RV64IFD-NEXT:  .LBB29_2:
 ; RV64IFD-NEXT:    ret
 ;
 ; RV32IZFINXZDINX-LABEL: roundeven_f64:
@@ -1777,11 +1438,16 @@ define double @roundeven_f64(double %a) nounwind strictfp {
 ;
 ; RV64IZFINXZDINX-LABEL: roundeven_f64:
 ; RV64IZFINXZDINX:       # %bb.0:
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, -16
-; RV64IZFINXZDINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINXZDINX-NEXT:    call roundeven
-; RV64IZFINXZDINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINXZDINX-NEXT:    addi sp, sp, 16
+; RV64IZFINXZDINX-NEXT:    li a1, 1075
+; RV64IZFINXZDINX-NEXT:    slli a1, a1, 52
+; RV64IZFINXZDINX-NEXT:    fabs.d a2, a0
+; RV64IZFINXZDINX-NEXT:    flt.d a1, a2, a1
+; RV64IZFINXZDINX-NEXT:    beqz a1, .LBB29_2
+; RV64IZFINXZDINX-NEXT:  # %bb.1:
+; RV64IZFINXZDINX-NEXT:    fcvt.l.d a1, a0, rne
+; RV64IZFINXZDINX-NEXT:    fcvt.d.l a1, a1, rne
+; RV64IZFINXZDINX-NEXT:    fsgnj.d a0, a1, a0
+; RV64IZFINXZDINX-NEXT:  .LBB29_2:
 ; RV64IZFINXZDINX-NEXT:    ret
 ;
 ; RV32I-LABEL: roundeven_f64:
@@ -1992,12 +1658,7 @@ define i64 @llround_f64(double %a) nounwind strictfp {
 define double @ldexp_f64(double %x, i32 signext %y) nounwind {
 ; RV32IFD-LABEL: ldexp_f64:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    addi sp, sp, -16
-; RV32IFD-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IFD-NEXT:    call ldexp
-; RV32IFD-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IFD-NEXT:    addi sp, sp, 16
-; RV32IFD-NEXT:    ret
+; RV32IFD-NEXT:    tail ldexp
 ;
 ; RV64IFD-LABEL: ldexp_f64:
 ; RV64IFD:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/float-arith-strict.ll b/llvm/test/CodeGen/RISCV/float-arith-strict.ll
index 6a47c3f3c3926..abfa5edddacba 100644
--- a/llvm/test/CodeGen/RISCV/float-arith-strict.ll
+++ b/llvm/test/CodeGen/RISCV/float-arith-strict.ll
@@ -177,23 +177,10 @@ define float @fsqrt_s(float %a) nounwind strictfp {
 }
 
 define float @fmin_s(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: fmin_s:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call fminf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fmin_s:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call fminf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fmin_s:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    fmin.s fa0, fa0, fa1
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fmin_s:
 ; RV32I:       # %bb.0:
@@ -213,45 +200,19 @@ define float @fmin_s(float %a, float %b) nounwind strictfp {
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
 ;
-; RV32IZFINX-LABEL: fmin_s:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call fminf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
-;
-; RV64IZFINX-LABEL: fmin_s:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call fminf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: fmin_s:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    fmin.s a0, a0, a1
+; CHECKIZFINX-NEXT:    ret
   %1 = call float @llvm.experimental.constrained.minnum.f32(float %a, float %b, metadata !"fpexcept.strict") strictfp
   ret float %1
 }
 
 define float @fmax_s(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: fmax_s:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call fmaxf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fmax_s:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call fmaxf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fmax_s:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    fmax.s fa0, fa0, fa1
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fmax_s:
 ; RV32I:       # %bb.0:
@@ -271,23 +232,10 @@ define float @fmax_s(float %a, float %b) nounwind strictfp {
 ; RV64I-NEXT:    addi sp, sp, 16
 ; RV64I-NEXT:    ret
 ;
-; RV32IZFINX-LABEL: fmax_s:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call fmaxf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
-;
-; RV64IZFINX-LABEL: fmax_s:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call fmaxf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: fmax_s:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    fmax.s a0, a0, a1
+; CHECKIZFINX-NEXT:    ret
   %1 = call float @llvm.experimental.constrained.maxnum.f32(float %a, float %b, metadata !"fpexcept.strict") strictfp
   ret float %1
 }
@@ -668,3 +616,8 @@ define float @fnmsub_s_2(float %a, float %b, float %c) nounwind strictfp {
   %1 = call float @llvm.experimental.constrained.fma.f32(float %a, float %negb, float %c, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   ret float %1
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32IF: {{.*}}
+; RV32IZFINX: {{.*}}
+; RV64IF: {{.*}}
+; RV64IZFINX: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll b/llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll
index 3a4acfd8a41ee..dbf6da8ae20be 100644
--- a/llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll
+++ b/llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll
@@ -53,12 +53,7 @@ define float @sqrt_f32(float %a) nounwind strictfp {
 define float @powi_f32(float %a, i32 %b) nounwind strictfp {
 ; RV32IF-LABEL: powi_f32:
 ; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call __powisf2
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
+; RV32IF-NEXT:    tail __powisf2
 ;
 ; RV64IF-LABEL: powi_f32:
 ; RV64IF:       # %bb.0:
@@ -72,12 +67,7 @@ define float @powi_f32(float %a, i32 %b) nounwind strictfp {
 ;
 ; RV32IZFINX-LABEL: powi_f32:
 ; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call __powisf2
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; RV32IZFINX-NEXT:    tail __powisf2
 ;
 ; RV64IZFINX-LABEL: powi_f32:
 ; RV64IZFINX:       # %bb.0:
@@ -112,41 +102,13 @@ define float @powi_f32(float %a, i32 %b) nounwind strictfp {
 }
 
 define float @sin_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: sin_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call sinf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: sin_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call sinf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: sin_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call sinf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: sin_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail sinf
 ;
-; RV64IZFINX-LABEL: sin_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call sinf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: sin_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail sinf
 ;
 ; RV32I-LABEL: sin_f32:
 ; RV32I:       # %bb.0:
@@ -170,41 +132,13 @@ define float @sin_f32(float %a) nounwind strictfp {
 }
 
 define float @cos_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: cos_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call cosf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: cos_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call cosf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: cos_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call cosf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: cos_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail cosf
 ;
-; RV64IZFINX-LABEL: cos_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call cosf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: cos_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail cosf
 ;
 ; RV32I-LABEL: cos_f32:
 ; RV32I:       # %bb.0:
@@ -347,41 +281,13 @@ define float @sincos_f32(float %a) nounwind strictfp {
 }
 
 define float @tan_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: tan_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call tanf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: tan_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call tanf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: tan_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call tanf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: tan_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail tanf
 ;
-; RV64IZFINX-LABEL: tan_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call tanf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: tan_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail tanf
 ;
 ; RV32I-LABEL: tan_f32:
 ; RV32I:       # %bb.0:
@@ -405,41 +311,13 @@ define float @tan_f32(float %a) nounwind strictfp {
 }
 
 define float @asin_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: asin_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call asinf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: asin_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call asinf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: asin_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call asinf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: asin_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail asinf
 ;
-; RV64IZFINX-LABEL: asin_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call asinf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: asin_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail asinf
 ;
 ; RV32I-LABEL: asin_f32:
 ; RV32I:       # %bb.0:
@@ -463,41 +341,13 @@ define float @asin_f32(float %a) nounwind strictfp {
 }
 
 define float @acos_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: acos_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call acosf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: acos_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call acosf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: acos_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call acosf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: acos_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail acosf
 ;
-; RV64IZFINX-LABEL: acos_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call acosf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: acos_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail acosf
 ;
 ; RV32I-LABEL: acos_f32:
 ; RV32I:       # %bb.0:
@@ -521,41 +371,13 @@ define float @acos_f32(float %a) nounwind strictfp {
 }
 
 define float @atan_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: atan_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call atanf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: atan_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call atanf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: atan_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call atanf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: atan_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail atanf
 ;
-; RV64IZFINX-LABEL: atan_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call atanf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: atan_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail atanf
 ;
 ; RV32I-LABEL: atan_f32:
 ; RV32I:       # %bb.0:
@@ -579,41 +401,13 @@ define float @atan_f32(float %a) nounwind strictfp {
 }
 
 define float @atan2_f32(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: atan2_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call atan2f
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: atan2_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call atan2f
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: atan2_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call atan2f
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: atan2_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail atan2f
 ;
-; RV64IZFINX-LABEL: atan2_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call atan2f
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: atan2_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail atan2f
 ;
 ; RV32I-LABEL: atan2_f32:
 ; RV32I:       # %bb.0:
@@ -637,41 +431,13 @@ define float @atan2_f32(float %a, float %b) nounwind strictfp {
 }
 
 define float @sinh_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: sinh_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call sinhf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: sinh_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call sinhf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: sinh_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call sinhf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: sinh_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail sinhf
 ;
-; RV64IZFINX-LABEL: sinh_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call sinhf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: sinh_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail sinhf
 ;
 ; RV32I-LABEL: sinh_f32:
 ; RV32I:       # %bb.0:
@@ -695,41 +461,13 @@ define float @sinh_f32(float %a) nounwind strictfp {
 }
 
 define float @cosh_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: cosh_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call coshf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: cosh_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call coshf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: cosh_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call coshf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: cosh_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail coshf
 ;
-; RV64IZFINX-LABEL: cosh_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call coshf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: cosh_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail coshf
 ;
 ; RV32I-LABEL: cosh_f32:
 ; RV32I:       # %bb.0:
@@ -753,41 +491,13 @@ define float @cosh_f32(float %a) nounwind strictfp {
 }
 
 define float @tanh_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: tanh_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call tanhf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: tanh_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call tanhf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: tanh_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call tanhf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: tanh_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail tanhf
 ;
-; RV64IZFINX-LABEL: tanh_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call tanhf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: tanh_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail tanhf
 ;
 ; RV32I-LABEL: tanh_f32:
 ; RV32I:       # %bb.0:
@@ -811,41 +521,13 @@ define float @tanh_f32(float %a) nounwind strictfp {
 }
 
 define float @pow_f32(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: pow_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call powf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: pow_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call powf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: pow_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call powf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: pow_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail powf
 ;
-; RV64IZFINX-LABEL: pow_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call powf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: pow_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail powf
 ;
 ; RV32I-LABEL: pow_f32:
 ; RV32I:       # %bb.0:
@@ -869,41 +551,13 @@ define float @pow_f32(float %a, float %b) nounwind strictfp {
 }
 
 define float @exp_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: exp_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call expf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: exp_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call expf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: exp_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call expf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: exp_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail expf
 ;
-; RV64IZFINX-LABEL: exp_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call expf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: exp_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail expf
 ;
 ; RV32I-LABEL: exp_f32:
 ; RV32I:       # %bb.0:
@@ -927,41 +581,13 @@ define float @exp_f32(float %a) nounwind strictfp {
 }
 
 define float @exp2_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: exp2_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call exp2f
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: exp2_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call exp2f
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: exp2_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call exp2f
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: exp2_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail exp2f
 ;
-; RV64IZFINX-LABEL: exp2_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call exp2f
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: exp2_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail exp2f
 ;
 ; RV32I-LABEL: exp2_f32:
 ; RV32I:       # %bb.0:
@@ -985,41 +611,13 @@ define float @exp2_f32(float %a) nounwind strictfp {
 }
 
 define float @log_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: log_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call logf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: log_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call logf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: log_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call logf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: log_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail logf
 ;
-; RV64IZFINX-LABEL: log_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call logf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: log_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail logf
 ;
 ; RV32I-LABEL: log_f32:
 ; RV32I:       # %bb.0:
@@ -1043,41 +641,13 @@ define float @log_f32(float %a) nounwind strictfp {
 }
 
 define float @log10_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: log10_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call log10f
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: log10_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call log10f
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: log10_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call log10f
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: log10_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail log10f
 ;
-; RV64IZFINX-LABEL: log10_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call log10f
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: log10_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail log10f
 ;
 ; RV32I-LABEL: log10_f32:
 ; RV32I:       # %bb.0:
@@ -1101,41 +671,13 @@ define float @log10_f32(float %a) nounwind strictfp {
 }
 
 define float @log2_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: log2_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call log2f
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: log2_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call log2f
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: log2_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call log2f
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: log2_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail log2f
 ;
-; RV64IZFINX-LABEL: log2_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call log2f
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: log2_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail log2f
 ;
 ; RV32I-LABEL: log2_f32:
 ; RV32I:       # %bb.0:
@@ -1233,41 +775,15 @@ define float @fmuladd_f32(float %a, float %b, float %c) nounwind strictfp {
 }
 
 define float @minnum_f32(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: minnum_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call fminf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: minnum_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call fminf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: minnum_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call fminf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: minnum_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    fmin.s fa0, fa0, fa1
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: minnum_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call fminf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: minnum_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    fmin.s a0, a0, a1
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: minnum_f32:
 ; RV32I:       # %bb.0:
@@ -1291,41 +807,15 @@ define float @minnum_f32(float %a, float %b) nounwind strictfp {
 }
 
 define float @maxnum_f32(float %a, float %b) nounwind strictfp {
-; RV32IF-LABEL: maxnum_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call fmaxf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: maxnum_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call fmaxf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: maxnum_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call fmaxf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: maxnum_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    fmax.s fa0, fa0, fa1
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: maxnum_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call fmaxf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: maxnum_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    fmax.s a0, a0, a1
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: maxnum_f32:
 ; RV32I:       # %bb.0:
@@ -1366,41 +856,32 @@ define float @maxnum_f32(float %a, float %b) nounwind strictfp {
 ; }
 
 define float @floor_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: floor_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call floorf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: floor_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call floorf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: floor_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call floorf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: floor_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB23_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0, rdn
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0, rdn
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB23_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: floor_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call floorf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: floor_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB23_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0, rdn
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1, rdn
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB23_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: floor_f32:
 ; RV32I:       # %bb.0:
@@ -1424,41 +905,32 @@ define float @floor_f32(float %a) nounwind strictfp {
 }
 
 define float @ceil_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: ceil_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call ceilf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: ceil_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call ceilf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: ceil_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call ceilf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: ceil_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB24_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0, rup
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0, rup
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB24_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: ceil_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call ceilf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: ceil_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB24_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0, rup
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1, rup
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB24_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: ceil_f32:
 ; RV32I:       # %bb.0:
@@ -1482,41 +954,32 @@ define float @ceil_f32(float %a) nounwind strictfp {
 }
 
 define float @trunc_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: trunc_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call truncf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: trunc_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call truncf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: trunc_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call truncf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: trunc_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB25_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0, rtz
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0, rtz
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB25_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: trunc_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call truncf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: trunc_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB25_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0, rtz
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1, rtz
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB25_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: trunc_f32:
 ; RV32I:       # %bb.0:
@@ -1540,41 +1003,32 @@ define float @trunc_f32(float %a) nounwind strictfp {
 }
 
 define float @rint_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: rint_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call rintf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: rint_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call rintf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: rint_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call rintf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: rint_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB26_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB26_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: rint_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call rintf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: rint_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB26_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB26_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: rint_f32:
 ; RV32I:       # %bb.0:
@@ -1598,41 +1052,13 @@ define float @rint_f32(float %a) nounwind strictfp {
 }
 
 define float @nearbyint_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: nearbyint_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call nearbyintf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: nearbyint_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call nearbyintf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: nearbyint_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call nearbyintf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: nearbyint_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    tail nearbyintf
 ;
-; RV64IZFINX-LABEL: nearbyint_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call nearbyintf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: nearbyint_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    tail nearbyintf
 ;
 ; RV32I-LABEL: nearbyint_f32:
 ; RV32I:       # %bb.0:
@@ -1656,41 +1082,32 @@ define float @nearbyint_f32(float %a) nounwind strictfp {
 }
 
 define float @round_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: round_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call roundf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: round_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call roundf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: round_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call roundf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: round_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB28_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0, rmm
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0, rmm
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB28_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: round_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call roundf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: round_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB28_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0, rmm
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1, rmm
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB28_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: round_f32:
 ; RV32I:       # %bb.0:
@@ -1714,41 +1131,32 @@ define float @round_f32(float %a) nounwind strictfp {
 }
 
 define float @roundeven_f32(float %a) nounwind strictfp {
-; RV32IF-LABEL: roundeven_f32:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call roundevenf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: roundeven_f32:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    addi sp, sp, -16
-; RV64IF-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IF-NEXT:    call roundevenf
-; RV64IF-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IF-NEXT:    addi sp, sp, 16
-; RV64IF-NEXT:    ret
-;
-; RV32IZFINX-LABEL: roundeven_f32:
-; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call roundevenf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; CHECKIF-LABEL: roundeven_f32:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lui a0, 307200
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
+; CHECKIF-NEXT:    fabs.s fa4, fa0
+; CHECKIF-NEXT:    flt.s a0, fa4, fa5
+; CHECKIF-NEXT:    beqz a0, .LBB29_2
+; CHECKIF-NEXT:  # %bb.1:
+; CHECKIF-NEXT:    fcvt.w.s a0, fa0, rne
+; CHECKIF-NEXT:    fcvt.s.w fa5, a0, rne
+; CHECKIF-NEXT:    fsgnj.s fa0, fa5, fa0
+; CHECKIF-NEXT:  .LBB29_2:
+; CHECKIF-NEXT:    ret
 ;
-; RV64IZFINX-LABEL: roundeven_f32:
-; RV64IZFINX:       # %bb.0:
-; RV64IZFINX-NEXT:    addi sp, sp, -16
-; RV64IZFINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFINX-NEXT:    call roundevenf
-; RV64IZFINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFINX-NEXT:    addi sp, sp, 16
-; RV64IZFINX-NEXT:    ret
+; CHECKIZFINX-LABEL: roundeven_f32:
+; CHECKIZFINX:       # %bb.0:
+; CHECKIZFINX-NEXT:    lui a1, 307200
+; CHECKIZFINX-NEXT:    fabs.s a2, a0
+; CHECKIZFINX-NEXT:    flt.s a1, a2, a1
+; CHECKIZFINX-NEXT:    beqz a1, .LBB29_2
+; CHECKIZFINX-NEXT:  # %bb.1:
+; CHECKIZFINX-NEXT:    fcvt.w.s a1, a0, rne
+; CHECKIZFINX-NEXT:    fcvt.s.w a1, a1, rne
+; CHECKIZFINX-NEXT:    fsgnj.s a0, a1, a0
+; CHECKIZFINX-NEXT:  .LBB29_2:
+; CHECKIZFINX-NEXT:    ret
 ;
 ; RV32I-LABEL: roundeven_f32:
 ; RV32I:       # %bb.0:
@@ -1958,12 +1366,7 @@ define i64 @llround_f32(float %a) nounwind strictfp {
 define float @ldexp_f32(float %x, i32 signext %y) nounwind {
 ; RV32IF-LABEL: ldexp_f32:
 ; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    addi sp, sp, -16
-; RV32IF-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IF-NEXT:    call ldexpf
-; RV32IF-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IF-NEXT:    addi sp, sp, 16
-; RV32IF-NEXT:    ret
+; RV32IF-NEXT:    tail ldexpf
 ;
 ; RV64IF-LABEL: ldexp_f32:
 ; RV64IF:       # %bb.0:
@@ -1976,12 +1379,7 @@ define float @ldexp_f32(float %x, i32 signext %y) nounwind {
 ;
 ; RV32IZFINX-LABEL: ldexp_f32:
 ; RV32IZFINX:       # %bb.0:
-; RV32IZFINX-NEXT:    addi sp, sp, -16
-; RV32IZFINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFINX-NEXT:    call ldexpf
-; RV32IZFINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFINX-NEXT:    addi sp, sp, 16
-; RV32IZFINX-NEXT:    ret
+; RV32IZFINX-NEXT:    tail ldexpf
 ;
 ; RV64IZFINX-LABEL: ldexp_f32:
 ; RV64IZFINX:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/half-convert-strict.ll b/llvm/test/CodeGen/RISCV/half-convert-strict.ll
index daeb75c31d614..d80e9da7cb44a 100644
--- a/llvm/test/CodeGen/RISCV/half-convert-strict.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert-strict.ll
@@ -108,10 +108,6 @@ define i16 @fcvt_si_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    fcvt.w.s a0, fa0, rtz
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -124,22 +120,22 @@ define i16 @fcvt_si_h(half %a) nounwind strictfp {
 define i16 @fcvt_ui_h(half %a) nounwind strictfp {
 ; CHECK32-IZFH-LABEL: fcvt_ui_h:
 ; CHECK32-IZFH:       # %bb.0:
-; CHECK32-IZFH-NEXT:    fcvt.w.h a0, fa0, rtz
+; CHECK32-IZFH-NEXT:    fcvt.wu.h a0, fa0, rtz
 ; CHECK32-IZFH-NEXT:    ret
 ;
 ; CHECK64-IZFH-LABEL: fcvt_ui_h:
 ; CHECK64-IZFH:       # %bb.0:
-; CHECK64-IZFH-NEXT:    fcvt.l.h a0, fa0, rtz
+; CHECK64-IZFH-NEXT:    fcvt.lu.h a0, fa0, rtz
 ; CHECK64-IZFH-NEXT:    ret
 ;
 ; CHECK32-IZHINX-LABEL: fcvt_ui_h:
 ; CHECK32-IZHINX:       # %bb.0:
-; CHECK32-IZHINX-NEXT:    fcvt.w.h a0, a0, rtz
+; CHECK32-IZHINX-NEXT:    fcvt.wu.h a0, a0, rtz
 ; CHECK32-IZHINX-NEXT:    ret
 ;
 ; CHECK64-IZHINX-LABEL: fcvt_ui_h:
 ; CHECK64-IZHINX:       # %bb.0:
-; CHECK64-IZHINX-NEXT:    fcvt.l.h a0, a0, rtz
+; CHECK64-IZHINX-NEXT:    fcvt.lu.h a0, a0, rtz
 ; CHECK64-IZHINX-NEXT:    ret
 ;
 ; CHECK32-IZFHMIN-LABEL: fcvt_ui_h:
@@ -170,12 +166,8 @@ define i16 @fcvt_ui_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
-; CHECK32-D-NEXT:    fcvt.w.s a0, fa0, rtz
+; CHECK32-D-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; CHECK32-D-NEXT:    addi sp, sp, 16
 ; CHECK32-D-NEXT:    ret
@@ -232,10 +224,6 @@ define i32 @fcvt_w_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    fcvt.w.s a0, fa0, rtz
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -294,10 +282,6 @@ define i32 @fcvt_wu_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -377,10 +361,6 @@ define i32 @fcvt_wu_h_multiple_use(half %x, ptr %y) strictfp {
 ; CHECK32-D-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; CHECK32-D-NEXT:    .cfi_offset ra, -4
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; CHECK32-D-NEXT:    seqz a1, a0
@@ -459,10 +439,6 @@ define i64 @fcvt_l_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    call __fixsfdi
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -535,10 +511,6 @@ define i64 @fcvt_lu_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    call __fixunssfdi
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -1354,10 +1326,6 @@ define float @fcvt_s_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; CHECK32-D-NEXT:    addi sp, sp, 16
@@ -1580,10 +1548,6 @@ define double @fcvt_d_h(half %a) nounwind strictfp {
 ; CHECK32-D:       # %bb.0:
 ; CHECK32-D-NEXT:    addi sp, sp, -16
 ; CHECK32-D-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    fcvt.d.s fa0, fa0
 ; CHECK32-D-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
@@ -2050,10 +2014,6 @@ define fp128 @fcvt_q_h(half %a) nounwind strictfp {
 ; CHECK32-D-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
 ; CHECK32-D-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
 ; CHECK32-D-NEXT:    mv s0, a0
-; CHECK32-D-NEXT:    fmv.x.w a0, fa0
-; CHECK32-D-NEXT:    slli a0, a0, 16
-; CHECK32-D-NEXT:    srli a0, a0, 16
-; CHECK32-D-NEXT:    fmv.w.x fa0, a0
 ; CHECK32-D-NEXT:    call __extendhfsf2
 ; CHECK32-D-NEXT:    addi a0, sp, 8
 ; CHECK32-D-NEXT:    call __extendsftf2
diff --git a/llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll
index 1f4eaea90628b..33e18b23ab6e2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fceil-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <vscale x 1 x half> @ceil_nxv1f16(<vscale x 1 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -30,16 +27,13 @@ define <vscale x 1 x half> @ceil_nxv1f16(<vscale x 1 x half> %x) strictfp {
 define <vscale x 2 x half> @ceil_nxv2f16(<vscale x 2 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -53,16 +47,13 @@ define <vscale x 2 x half> @ceil_nxv2f16(<vscale x 2 x half> %x) strictfp {
 define <vscale x 4 x half> @ceil_nxv4f16(<vscale x 4 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -76,16 +67,13 @@ define <vscale x 4 x half> @ceil_nxv4f16(<vscale x 4 x half> %x) strictfp {
 define <vscale x 8 x half> @ceil_nxv8f16(<vscale x 8 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -99,16 +87,13 @@ define <vscale x 8 x half> @ceil_nxv8f16(<vscale x 8 x half> %x) strictfp {
 define <vscale x 16 x half> @ceil_nxv16f16(<vscale x 16 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -122,16 +107,13 @@ define <vscale x 16 x half> @ceil_nxv16f16(<vscale x 16 x half> %x) strictfp {
 define <vscale x 32 x half> @ceil_nxv32f16(<vscale x 32 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -145,15 +127,12 @@ define <vscale x 32 x half> @ceil_nxv32f16(<vscale x 32 x half> %x) strictfp {
 define <vscale x 1 x float> @ceil_nxv1f32(<vscale x 1 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -167,15 +146,12 @@ define <vscale x 1 x float> @ceil_nxv1f32(<vscale x 1 x float> %x) strictfp {
 define <vscale x 2 x float> @ceil_nxv2f32(<vscale x 2 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -189,15 +165,12 @@ define <vscale x 2 x float> @ceil_nxv2f32(<vscale x 2 x float> %x) strictfp {
 define <vscale x 4 x float> @ceil_nxv4f32(<vscale x 4 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -211,15 +184,12 @@ define <vscale x 4 x float> @ceil_nxv4f32(<vscale x 4 x float> %x) strictfp {
 define <vscale x 8 x float> @ceil_nxv8f32(<vscale x 8 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -233,15 +203,12 @@ define <vscale x 8 x float> @ceil_nxv8f32(<vscale x 8 x float> %x) strictfp {
 define <vscale x 16 x float> @ceil_nxv16f32(<vscale x 16 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -255,15 +222,12 @@ define <vscale x 16 x float> @ceil_nxv16f32(<vscale x 16 x float> %x) strictfp {
 define <vscale x 1 x double> @ceil_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ; RV32-LABEL: ceil_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -273,16 +237,13 @@ define <vscale x 1 x double> @ceil_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -296,15 +257,12 @@ define <vscale x 1 x double> @ceil_nxv1f64(<vscale x 1 x double> %x) strictfp {
 define <vscale x 2 x double> @ceil_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ; RV32-LABEL: ceil_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -314,16 +272,13 @@ define <vscale x 2 x double> @ceil_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -337,15 +292,12 @@ define <vscale x 2 x double> @ceil_nxv2f64(<vscale x 2 x double> %x) strictfp {
 define <vscale x 4 x double> @ceil_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ; RV32-LABEL: ceil_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -355,16 +307,13 @@ define <vscale x 4 x double> @ceil_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -378,15 +327,12 @@ define <vscale x 4 x double> @ceil_nxv4f64(<vscale x 4 x double> %x) strictfp {
 define <vscale x 8 x double> @ceil_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ; RV32-LABEL: ceil_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -396,16 +342,13 @@ define <vscale x 8 x double> @ceil_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll
index 3a7de21c14390..23af0e6233076 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ffloor-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <vscale x 1 x half> @floor_nxv1f16(<vscale x 1 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -30,16 +27,13 @@ define <vscale x 1 x half> @floor_nxv1f16(<vscale x 1 x half> %x) strictfp {
 define <vscale x 2 x half> @floor_nxv2f16(<vscale x 2 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -53,16 +47,13 @@ define <vscale x 2 x half> @floor_nxv2f16(<vscale x 2 x half> %x) strictfp {
 define <vscale x 4 x half> @floor_nxv4f16(<vscale x 4 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -76,16 +67,13 @@ define <vscale x 4 x half> @floor_nxv4f16(<vscale x 4 x half> %x) strictfp {
 define <vscale x 8 x half> @floor_nxv8f16(<vscale x 8 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -99,16 +87,13 @@ define <vscale x 8 x half> @floor_nxv8f16(<vscale x 8 x half> %x) strictfp {
 define <vscale x 16 x half> @floor_nxv16f16(<vscale x 16 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -122,16 +107,13 @@ define <vscale x 16 x half> @floor_nxv16f16(<vscale x 16 x half> %x) strictfp {
 define <vscale x 32 x half> @floor_nxv32f16(<vscale x 32 x half> %x) strictfp {
 ; CHECK-LABEL: floor_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -145,15 +127,12 @@ define <vscale x 32 x half> @floor_nxv32f16(<vscale x 32 x half> %x) strictfp {
 define <vscale x 1 x float> @floor_nxv1f32(<vscale x 1 x float> %x) strictfp {
 ; CHECK-LABEL: floor_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -167,15 +146,12 @@ define <vscale x 1 x float> @floor_nxv1f32(<vscale x 1 x float> %x) strictfp {
 define <vscale x 2 x float> @floor_nxv2f32(<vscale x 2 x float> %x) strictfp {
 ; CHECK-LABEL: floor_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -189,15 +165,12 @@ define <vscale x 2 x float> @floor_nxv2f32(<vscale x 2 x float> %x) strictfp {
 define <vscale x 4 x float> @floor_nxv4f32(<vscale x 4 x float> %x) strictfp {
 ; CHECK-LABEL: floor_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -211,15 +184,12 @@ define <vscale x 4 x float> @floor_nxv4f32(<vscale x 4 x float> %x) strictfp {
 define <vscale x 8 x float> @floor_nxv8f32(<vscale x 8 x float> %x) strictfp {
 ; CHECK-LABEL: floor_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -233,15 +203,12 @@ define <vscale x 8 x float> @floor_nxv8f32(<vscale x 8 x float> %x) strictfp {
 define <vscale x 16 x float> @floor_nxv16f32(<vscale x 16 x float> %x) strictfp {
 ; CHECK-LABEL: floor_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -255,15 +222,12 @@ define <vscale x 16 x float> @floor_nxv16f32(<vscale x 16 x float> %x) strictfp
 define <vscale x 1 x double> @floor_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ; RV32-LABEL: floor_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -273,16 +237,13 @@ define <vscale x 1 x double> @floor_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -296,15 +257,12 @@ define <vscale x 1 x double> @floor_nxv1f64(<vscale x 1 x double> %x) strictfp {
 define <vscale x 2 x double> @floor_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ; RV32-LABEL: floor_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -314,16 +272,13 @@ define <vscale x 2 x double> @floor_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -337,15 +292,12 @@ define <vscale x 2 x double> @floor_nxv2f64(<vscale x 2 x double> %x) strictfp {
 define <vscale x 4 x double> @floor_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ; RV32-LABEL: floor_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -355,16 +307,13 @@ define <vscale x 4 x double> @floor_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -378,15 +327,12 @@ define <vscale x 4 x double> @floor_nxv4f64(<vscale x 4 x double> %x) strictfp {
 define <vscale x 8 x double> @floor_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ; RV32-LABEL: floor_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -396,16 +342,13 @@ define <vscale x 8 x double> @floor_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll
index 22aef4899a6c2..403811383ec73 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fceil-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <1 x half> @ceil_v1f16(<1 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_v1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -30,16 +27,13 @@ define <1 x half> @ceil_v1f16(<1 x half> %x) strictfp {
 define <2 x half> @ceil_v2f16(<2 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -53,16 +47,13 @@ define <2 x half> @ceil_v2f16(<2 x half> %x) strictfp {
 define <4 x half> @ceil_v4f16(<4 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -76,16 +67,13 @@ define <4 x half> @ceil_v4f16(<4 x half> %x) strictfp {
 define <8 x half> @ceil_v8f16(<8 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -99,16 +87,13 @@ define <8 x half> @ceil_v8f16(<8 x half> %x) strictfp {
 define <16 x half> @ceil_v16f16(<16 x half> %x) strictfp {
 ; CHECK-LABEL: ceil_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -124,15 +109,12 @@ define <32 x half> @ceil_v32f16(<32 x half> %x) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -146,15 +128,12 @@ define <32 x half> @ceil_v32f16(<32 x half> %x) strictfp {
 define <1 x float> @ceil_v1f32(<1 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_v1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -168,15 +147,12 @@ define <1 x float> @ceil_v1f32(<1 x float> %x) strictfp {
 define <2 x float> @ceil_v2f32(<2 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -190,15 +166,12 @@ define <2 x float> @ceil_v2f32(<2 x float> %x) strictfp {
 define <4 x float> @ceil_v4f32(<4 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -212,15 +185,12 @@ define <4 x float> @ceil_v4f32(<4 x float> %x) strictfp {
 define <8 x float> @ceil_v8f32(<8 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -234,15 +204,12 @@ define <8 x float> @ceil_v8f32(<8 x float> %x) strictfp {
 define <16 x float> @ceil_v16f32(<16 x float> %x) strictfp {
 ; CHECK-LABEL: ceil_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 3
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -256,15 +223,12 @@ define <16 x float> @ceil_v16f32(<16 x float> %x) strictfp {
 define <1 x double> @ceil_v1f64(<1 x double> %x) strictfp {
 ; RV32-LABEL: ceil_v1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -274,16 +238,13 @@ define <1 x double> @ceil_v1f64(<1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_v1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -297,15 +258,12 @@ define <1 x double> @ceil_v1f64(<1 x double> %x) strictfp {
 define <2 x double> @ceil_v2f64(<2 x double> %x) strictfp {
 ; RV32-LABEL: ceil_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -315,16 +273,13 @@ define <2 x double> @ceil_v2f64(<2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -338,15 +293,12 @@ define <2 x double> @ceil_v2f64(<2 x double> %x) strictfp {
 define <4 x double> @ceil_v4f64(<4 x double> %x) strictfp {
 ; RV32-LABEL: ceil_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -356,16 +308,13 @@ define <4 x double> @ceil_v4f64(<4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -379,15 +328,12 @@ define <4 x double> @ceil_v4f64(<4 x double> %x) strictfp {
 define <8 x double> @ceil_v8f64(<8 x double> %x) strictfp {
 ; RV32-LABEL: ceil_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 3
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -397,16 +343,13 @@ define <8 x double> @ceil_v8f64(<8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: ceil_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 3
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll
index 511382cf5436e..38c945a7c6959 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ffloor-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <1 x half> @floor_v1f16(<1 x half> %x) strictfp {
 ; CHECK-LABEL: floor_v1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -30,16 +27,13 @@ define <1 x half> @floor_v1f16(<1 x half> %x) strictfp {
 define <2 x half> @floor_v2f16(<2 x half> %x) strictfp {
 ; CHECK-LABEL: floor_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -53,16 +47,13 @@ define <2 x half> @floor_v2f16(<2 x half> %x) strictfp {
 define <4 x half> @floor_v4f16(<4 x half> %x) strictfp {
 ; CHECK-LABEL: floor_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -76,16 +67,13 @@ define <4 x half> @floor_v4f16(<4 x half> %x) strictfp {
 define <8 x half> @floor_v8f16(<8 x half> %x) strictfp {
 ; CHECK-LABEL: floor_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -99,16 +87,13 @@ define <8 x half> @floor_v8f16(<8 x half> %x) strictfp {
 define <16 x half> @floor_v16f16(<16 x half> %x) strictfp {
 ; CHECK-LABEL: floor_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -124,15 +109,12 @@ define <32 x half> @floor_v32f16(<32 x half> %x) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -146,15 +128,12 @@ define <32 x half> @floor_v32f16(<32 x half> %x) strictfp {
 define <1 x float> @floor_v1f32(<1 x float> %x) strictfp {
 ; CHECK-LABEL: floor_v1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -168,15 +147,12 @@ define <1 x float> @floor_v1f32(<1 x float> %x) strictfp {
 define <2 x float> @floor_v2f32(<2 x float> %x) strictfp {
 ; CHECK-LABEL: floor_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -190,15 +166,12 @@ define <2 x float> @floor_v2f32(<2 x float> %x) strictfp {
 define <4 x float> @floor_v4f32(<4 x float> %x) strictfp {
 ; CHECK-LABEL: floor_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -212,15 +185,12 @@ define <4 x float> @floor_v4f32(<4 x float> %x) strictfp {
 define <8 x float> @floor_v8f32(<8 x float> %x) strictfp {
 ; CHECK-LABEL: floor_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -234,15 +204,12 @@ define <8 x float> @floor_v8f32(<8 x float> %x) strictfp {
 define <16 x float> @floor_v16f32(<16 x float> %x) strictfp {
 ; CHECK-LABEL: floor_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 2
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -256,15 +223,12 @@ define <16 x float> @floor_v16f32(<16 x float> %x) strictfp {
 define <1 x double> @floor_v1f64(<1 x double> %x) strictfp {
 ; RV32-LABEL: floor_v1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -274,16 +238,13 @@ define <1 x double> @floor_v1f64(<1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_v1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -297,15 +258,12 @@ define <1 x double> @floor_v1f64(<1 x double> %x) strictfp {
 define <2 x double> @floor_v2f64(<2 x double> %x) strictfp {
 ; RV32-LABEL: floor_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -315,16 +273,13 @@ define <2 x double> @floor_v2f64(<2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -338,15 +293,12 @@ define <2 x double> @floor_v2f64(<2 x double> %x) strictfp {
 define <4 x double> @floor_v4f64(<4 x double> %x) strictfp {
 ; RV32-LABEL: floor_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -356,16 +308,13 @@ define <4 x double> @floor_v4f64(<4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -379,15 +328,12 @@ define <4 x double> @floor_v4f64(<4 x double> %x) strictfp {
 define <8 x double> @floor_v8f64(<8 x double> %x) strictfp {
 ; RV32-LABEL: floor_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 2
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -397,16 +343,13 @@ define <8 x double> @floor_v8f64(<8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: floor_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 2
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll
index 8485eb8ac1caa..d368e200a9425 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fnearbyint-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <2 x half> @nearbyint_v2f16(<2 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, mu
@@ -30,16 +27,13 @@ define <2 x half> @nearbyint_v2f16(<2 x half> %v) strictfp {
 define <4 x half> @nearbyint_v4f16(<4 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, mu
@@ -53,16 +47,13 @@ define <4 x half> @nearbyint_v4f16(<4 x half> %v) strictfp {
 define <8 x half> @nearbyint_v8f16(<8 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, mu
@@ -76,16 +67,13 @@ define <8 x half> @nearbyint_v8f16(<8 x half> %v) strictfp {
 define <16 x half> @nearbyint_v16f16(<16 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, mu
@@ -101,15 +89,12 @@ define <32 x half> @nearbyint_v32f16(<32 x half> %v) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, mu
@@ -123,15 +108,12 @@ define <32 x half> @nearbyint_v32f16(<32 x half> %v) strictfp {
 define <2 x float> @nearbyint_v2f32(<2 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
@@ -145,15 +127,12 @@ define <2 x float> @nearbyint_v2f32(<2 x float> %v) strictfp {
 define <4 x float> @nearbyint_v4f32(<4 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
@@ -167,15 +146,12 @@ define <4 x float> @nearbyint_v4f32(<4 x float> %v) strictfp {
 define <8 x float> @nearbyint_v8f32(<8 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
@@ -189,15 +165,12 @@ define <8 x float> @nearbyint_v8f32(<8 x float> %v) strictfp {
 define <16 x float> @nearbyint_v16f32(<16 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
@@ -211,15 +184,12 @@ define <16 x float> @nearbyint_v16f32(<16 x float> %v) strictfp {
 define <2 x double> @nearbyint_v2f64(<2 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI9_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI9_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -229,16 +199,13 @@ define <2 x double> @nearbyint_v2f64(<2 x double> %v) strictfp {
 ;
 ; RV64-LABEL: nearbyint_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -252,15 +219,12 @@ define <2 x double> @nearbyint_v2f64(<2 x double> %v) strictfp {
 define <4 x double> @nearbyint_v4f64(<4 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI10_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI10_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -270,16 +234,13 @@ define <4 x double> @nearbyint_v4f64(<4 x double> %v) strictfp {
 ;
 ; RV64-LABEL: nearbyint_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -293,15 +254,12 @@ define <4 x double> @nearbyint_v4f64(<4 x double> %v) strictfp {
 define <8 x double> @nearbyint_v8f64(<8 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -311,16 +269,13 @@ define <8 x double> @nearbyint_v8f64(<8 x double> %v) strictfp {
 ;
 ; RV64-LABEL: nearbyint_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-constrained-sdnode.ll
index ad56aee72a432..24e261993077d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fround-constrained-sdnode.ll
@@ -9,16 +9,13 @@
 define <1 x half> @round_v1f16(<1 x half> %x) strictfp {
 ; CHECK-LABEL: round_v1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -32,16 +29,13 @@ define <1 x half> @round_v1f16(<1 x half> %x) strictfp {
 define <2 x half> @round_v2f16(<2 x half> %x) strictfp {
 ; CHECK-LABEL: round_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -55,16 +49,13 @@ define <2 x half> @round_v2f16(<2 x half> %x) strictfp {
 define <4 x half> @round_v4f16(<4 x half> %x) strictfp {
 ; CHECK-LABEL: round_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -78,16 +69,13 @@ define <4 x half> @round_v4f16(<4 x half> %x) strictfp {
 define <8 x half> @round_v8f16(<8 x half> %x) strictfp {
 ; CHECK-LABEL: round_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -101,16 +89,13 @@ define <8 x half> @round_v8f16(<8 x half> %x) strictfp {
 define <16 x half> @round_v16f16(<16 x half> %x) strictfp {
 ; CHECK-LABEL: round_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -126,15 +111,12 @@ define <32 x half> @round_v32f16(<32 x half> %x) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -148,15 +130,12 @@ define <32 x half> @round_v32f16(<32 x half> %x) strictfp {
 define <1 x float> @round_v1f32(<1 x float> %x) strictfp {
 ; CHECK-LABEL: round_v1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -170,15 +149,12 @@ define <1 x float> @round_v1f32(<1 x float> %x) strictfp {
 define <2 x float> @round_v2f32(<2 x float> %x) strictfp {
 ; CHECK-LABEL: round_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -192,15 +168,12 @@ define <2 x float> @round_v2f32(<2 x float> %x) strictfp {
 define <4 x float> @round_v4f32(<4 x float> %x) strictfp {
 ; CHECK-LABEL: round_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -214,15 +187,12 @@ define <4 x float> @round_v4f32(<4 x float> %x) strictfp {
 define <8 x float> @round_v8f32(<8 x float> %x) strictfp {
 ; CHECK-LABEL: round_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -236,15 +206,12 @@ define <8 x float> @round_v8f32(<8 x float> %x) strictfp {
 define <16 x float> @round_v16f32(<16 x float> %x) strictfp {
 ; CHECK-LABEL: round_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -258,15 +225,12 @@ define <16 x float> @round_v16f32(<16 x float> %x) strictfp {
 define <1 x double> @round_v1f64(<1 x double> %x) strictfp {
 ; RV32-LABEL: round_v1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -276,16 +240,13 @@ define <1 x double> @round_v1f64(<1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_v1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -299,15 +260,12 @@ define <1 x double> @round_v1f64(<1 x double> %x) strictfp {
 define <2 x double> @round_v2f64(<2 x double> %x) strictfp {
 ; RV32-LABEL: round_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -317,16 +275,13 @@ define <2 x double> @round_v2f64(<2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -340,15 +295,12 @@ define <2 x double> @round_v2f64(<2 x double> %x) strictfp {
 define <4 x double> @round_v4f64(<4 x double> %x) strictfp {
 ; RV32-LABEL: round_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -358,16 +310,13 @@ define <4 x double> @round_v4f64(<4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -381,15 +330,12 @@ define <4 x double> @round_v4f64(<4 x double> %x) strictfp {
 define <8 x double> @round_v8f64(<8 x double> %x) strictfp {
 ; RV32-LABEL: round_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -399,16 +345,13 @@ define <8 x double> @round_v8f64(<8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll
index 5e5c64fd891fd..16c8ce6acd9d3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-froundeven-constrained-sdnode.ll
@@ -9,16 +9,13 @@
 define <1 x half> @roundeven_v1f16(<1 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_v1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -32,16 +29,13 @@ define <1 x half> @roundeven_v1f16(<1 x half> %x) strictfp {
 define <2 x half> @roundeven_v2f16(<2 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -55,16 +49,13 @@ define <2 x half> @roundeven_v2f16(<2 x half> %x) strictfp {
 define <4 x half> @roundeven_v4f16(<4 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -78,16 +69,13 @@ define <4 x half> @roundeven_v4f16(<4 x half> %x) strictfp {
 define <8 x half> @roundeven_v8f16(<8 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -101,16 +89,13 @@ define <8 x half> @roundeven_v8f16(<8 x half> %x) strictfp {
 define <16 x half> @roundeven_v16f16(<16 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -126,15 +111,12 @@ define <32 x half> @roundeven_v32f16(<32 x half> %x) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -148,15 +130,12 @@ define <32 x half> @roundeven_v32f16(<32 x half> %x) strictfp {
 define <1 x float> @roundeven_v1f32(<1 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_v1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -170,15 +149,12 @@ define <1 x float> @roundeven_v1f32(<1 x float> %x) strictfp {
 define <2 x float> @roundeven_v2f32(<2 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -192,15 +168,12 @@ define <2 x float> @roundeven_v2f32(<2 x float> %x) strictfp {
 define <4 x float> @roundeven_v4f32(<4 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -214,15 +187,12 @@ define <4 x float> @roundeven_v4f32(<4 x float> %x) strictfp {
 define <8 x float> @roundeven_v8f32(<8 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -236,15 +206,12 @@ define <8 x float> @roundeven_v8f32(<8 x float> %x) strictfp {
 define <16 x float> @roundeven_v16f32(<16 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -258,15 +225,12 @@ define <16 x float> @roundeven_v16f32(<16 x float> %x) strictfp {
 define <1 x double> @roundeven_v1f64(<1 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_v1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -276,16 +240,13 @@ define <1 x double> @roundeven_v1f64(<1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: roundeven_v1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -299,15 +260,12 @@ define <1 x double> @roundeven_v1f64(<1 x double> %x) strictfp {
 define <2 x double> @roundeven_v2f64(<2 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -317,16 +275,13 @@ define <2 x double> @roundeven_v2f64(<2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: roundeven_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -340,15 +295,12 @@ define <2 x double> @roundeven_v2f64(<2 x double> %x) strictfp {
 define <4 x double> @roundeven_v4f64(<4 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -358,16 +310,13 @@ define <4 x double> @roundeven_v4f64(<4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: roundeven_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -381,15 +330,12 @@ define <4 x double> @roundeven_v4f64(<4 x double> %x) strictfp {
 define <8 x double> @roundeven_v8f64(<8 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -399,16 +345,13 @@ define <8 x double> @roundeven_v8f64(<8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: roundeven_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll
index 7813d7f309b6a..f187c19d0b70d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll
@@ -7,15 +7,12 @@
 define <1 x half> @trunc_v1f16(<1 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_v1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, mu
@@ -28,15 +25,12 @@ define <1 x half> @trunc_v1f16(<1 x half> %x) strictfp {
 define <2 x half> @trunc_v2f16(<2 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_v2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, mu
@@ -49,15 +43,12 @@ define <2 x half> @trunc_v2f16(<2 x half> %x) strictfp {
 define <4 x half> @trunc_v4f16(<4 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, mu
@@ -70,15 +61,12 @@ define <4 x half> @trunc_v4f16(<4 x half> %x) strictfp {
 define <8 x half> @trunc_v8f16(<8 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_v8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, mu
@@ -91,15 +79,12 @@ define <8 x half> @trunc_v8f16(<8 x half> %x) strictfp {
 define <16 x half> @trunc_v16f16(<16 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_v16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, mu
@@ -114,14 +99,11 @@ define <32 x half> @trunc_v32f16(<32 x half> %x) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    slli a1, a1, 10
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.h.x fa5, a1
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, mu
@@ -134,14 +116,11 @@ define <32 x half> @trunc_v32f16(<32 x half> %x) strictfp {
 define <1 x float> @trunc_v1f32(<1 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_v1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
@@ -154,14 +133,11 @@ define <1 x float> @trunc_v1f32(<1 x float> %x) strictfp {
 define <2 x float> @trunc_v2f32(<2 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_v2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
@@ -174,14 +150,11 @@ define <2 x float> @trunc_v2f32(<2 x float> %x) strictfp {
 define <4 x float> @trunc_v4f32(<4 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_v4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
@@ -194,14 +167,11 @@ define <4 x float> @trunc_v4f32(<4 x float> %x) strictfp {
 define <8 x float> @trunc_v8f32(<8 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
@@ -214,14 +184,11 @@ define <8 x float> @trunc_v8f32(<8 x float> %x) strictfp {
 define <16 x float> @trunc_v16f32(<16 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_v16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
@@ -234,14 +201,11 @@ define <16 x float> @trunc_v16f32(<16 x float> %x) strictfp {
 define <1 x double> @trunc_v1f64(<1 x double> %x) strictfp {
 ; RV32-LABEL: trunc_v1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -250,15 +214,12 @@ define <1 x double> @trunc_v1f64(<1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_v1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -271,14 +232,11 @@ define <1 x double> @trunc_v1f64(<1 x double> %x) strictfp {
 define <2 x double> @trunc_v2f64(<2 x double> %x) strictfp {
 ; RV32-LABEL: trunc_v2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -287,15 +245,12 @@ define <2 x double> @trunc_v2f64(<2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_v2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -308,14 +263,11 @@ define <2 x double> @trunc_v2f64(<2 x double> %x) strictfp {
 define <4 x double> @trunc_v4f64(<4 x double> %x) strictfp {
 ; RV32-LABEL: trunc_v4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -324,15 +276,12 @@ define <4 x double> @trunc_v4f64(<4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_v4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -345,14 +294,11 @@ define <4 x double> @trunc_v4f64(<4 x double> %x) strictfp {
 define <8 x double> @trunc_v8f64(<8 x double> %x) strictfp {
 ; RV32-LABEL: trunc_v8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -361,15 +307,12 @@ define <8 x double> @trunc_v8f64(<8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_v8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmadd-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmadd-constrained-sdnode.ll
index f052de4b0f514..d51afaad71643 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmadd-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfmadd-constrained-sdnode.ll
@@ -18,12 +18,12 @@ define <1 x bfloat> @vfmadd_vv_v1bf16(<1 x bfloat> %va, <1 x bfloat> %vb, <1 x b
 ; ZVFH:       # %bb.0:
 ; ZVFH-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v11, v10
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v9
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v8
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v12, v9
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v9, v10, v11
+; ZVFH-NEXT:    vfmadd.vv v12, v10, v11
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v9
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v1bf16:
@@ -40,12 +40,12 @@ define <2 x bfloat> @vfmadd_vv_v2bf16(<2 x bfloat> %va, <2 x bfloat> %vb, <2 x b
 ; ZVFH:       # %bb.0:
 ; ZVFH-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v11, v9
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v9, v10
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v8
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v12, v10
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v10, v9, v11
+; ZVFH-NEXT:    vfmadd.vv v12, v9, v11
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v10
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v2bf16:
@@ -62,12 +62,12 @@ define <4 x bfloat> @vfmadd_vv_v4bf16(<4 x bfloat> %va, <4 x bfloat> %vb, <4 x b
 ; ZVFH:       # %bb.0:
 ; ZVFH-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v11, v10
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v8
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v12, v9
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v9
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v9, v8
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v12, v10, v11
+; ZVFH-NEXT:    vfmadd.vv v9, v10, v11
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v12
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v9
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v4bf16:
@@ -84,12 +84,12 @@ define <8 x bfloat> @vfmadd_vv_v8bf16(<8 x bfloat> %va, <8 x bfloat> %vb, <8 x b
 ; ZVFH:       # %bb.0:
 ; ZVFH-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v12, v8
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v14, v10
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v10, v9
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v14, v9
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v16, v10
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v10, v14, v12
+; ZVFH-NEXT:    vfmadd.vv v16, v14, v12
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v10
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v16
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v8bf16:
@@ -106,12 +106,12 @@ define <16 x bfloat> @vfmadd_vv_v16bf16(<16 x bfloat> %va, <16 x bfloat> %vb, <1
 ; ZVFH:       # %bb.0:
 ; ZVFH-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v16, v10
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v20, v8
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v24, v12
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v20, v12
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v12, v8
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v24, v20, v16
+; ZVFH-NEXT:    vfmadd.vv v12, v20, v16
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v24
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v16bf16:
@@ -126,15 +126,31 @@ define <16 x bfloat> @vfmadd_vv_v16bf16(<16 x bfloat> %va, <16 x bfloat> %vb, <1
 define <32 x bfloat> @vfmadd_vv_v32bf16(<32 x bfloat> %va, <32 x bfloat> %vb, <32 x bfloat> %vc) strictfp {
 ; ZVFH-LABEL: vfmadd_vv_v32bf16:
 ; ZVFH:       # %bb.0:
+; ZVFH-NEXT:    addi sp, sp, -16
+; ZVFH-NEXT:    .cfi_def_cfa_offset 16
+; ZVFH-NEXT:    csrr a0, vlenb
+; ZVFH-NEXT:    slli a0, a0, 2
+; ZVFH-NEXT:    sub sp, sp, a0
+; ZVFH-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 4 * vlenb
+; ZVFH-NEXT:    addi a0, sp, 16
+; ZVFH-NEXT:    vs4r.v v16, (a0) # vscale x 32-byte Folded Spill
 ; ZVFH-NEXT:    li a0, 32
 ; ZVFH-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v16, v8
+; ZVFH-NEXT:    vfwcvtbf16.f.f.v v0, v12
+; ZVFH-NEXT:    addi a0, sp, 16
+; ZVFH-NEXT:    vl4r.v v8, (a0) # vscale x 32-byte Folded Reload
 ; ZVFH-NEXT:    vfwcvtbf16.f.f.v v24, v8
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v0, v16
-; ZVFH-NEXT:    vfwcvtbf16.f.f.v v16, v12
 ; ZVFH-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFH-NEXT:    vfmadd.vv v16, v0, v24
+; ZVFH-NEXT:    vfmadd.vv v24, v0, v16
 ; ZVFH-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v16
+; ZVFH-NEXT:    vfncvtbf16.f.f.w v8, v24
+; ZVFH-NEXT:    csrr a0, vlenb
+; ZVFH-NEXT:    slli a0, a0, 2
+; ZVFH-NEXT:    add sp, sp, a0
+; ZVFH-NEXT:    .cfi_def_cfa sp, 16
+; ZVFH-NEXT:    addi sp, sp, 16
+; ZVFH-NEXT:    .cfi_def_cfa_offset 0
 ; ZVFH-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v32bf16:
@@ -158,12 +174,12 @@ define <2 x half> @vfmadd_vv_v2f16(<2 x half> %va, <2 x half> %vb, <2 x half> %v
 ; ZVFBFA:       # %bb.0:
 ; ZVFBFA-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; ZVFBFA-NEXT:    vfwcvt.f.f.v v11, v9
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v9, v10
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v10, v8
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v9, v8
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v12, v10
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFBFA-NEXT:    vfmadd.vv v10, v9, v11
+; ZVFBFA-NEXT:    vfmadd.vv v12, v9, v11
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v10
+; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFBFA-NEXT:    ret
   %vd = call <2 x half> @llvm.experimental.constrained.fma.v2f16(<2 x half> %va, <2 x half> %vc, <2 x half> %vb, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <2 x half> %vd
@@ -206,12 +222,12 @@ define <4 x half> @vfmadd_vv_v4f16(<4 x half> %va, <4 x half> %vb, <4 x half> %v
 ; ZVFBFA:       # %bb.0:
 ; ZVFBFA-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; ZVFBFA-NEXT:    vfwcvt.f.f.v v11, v10
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v10, v8
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v12, v9
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v9, v8
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFBFA-NEXT:    vfmadd.vv v12, v10, v11
+; ZVFBFA-NEXT:    vfmadd.vv v9, v10, v11
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v9
 ; ZVFBFA-NEXT:    ret
   %vd = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> %vb, <4 x half> %va, <4 x half> %vc, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <4 x half> %vd
@@ -254,12 +270,12 @@ define <8 x half> @vfmadd_vv_v8f16(<8 x half> %va, <8 x half> %vb, <8 x half> %v
 ; ZVFBFA:       # %bb.0:
 ; ZVFBFA-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; ZVFBFA-NEXT:    vfwcvt.f.f.v v12, v8
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v14, v10
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v14, v9
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v16, v10
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; ZVFBFA-NEXT:    vfmadd.vv v10, v14, v12
+; ZVFBFA-NEXT:    vfmadd.vv v16, v14, v12
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
-; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v10
+; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v16
 ; ZVFBFA-NEXT:    ret
   %vd = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> %vb, <8 x half> %vc, <8 x half> %va, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <8 x half> %vd
@@ -302,12 +318,12 @@ define <16 x half> @vfmadd_vv_v16f16(<16 x half> %va, <16 x half> %vb, <16 x hal
 ; ZVFBFA:       # %bb.0:
 ; ZVFBFA-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
 ; ZVFBFA-NEXT:    vfwcvt.f.f.v v16, v10
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v20, v8
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v24, v12
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v20, v12
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v12, v8
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFBFA-NEXT:    vfmadd.vv v24, v20, v16
+; ZVFBFA-NEXT:    vfmadd.vv v12, v20, v16
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v24
+; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFBFA-NEXT:    ret
   %vd = call <16 x half> @llvm.experimental.constrained.fma.v16f16(<16 x half> %vc, <16 x half> %va, <16 x half> %vb, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <16 x half> %vd
@@ -349,31 +365,15 @@ define <32 x half> @vfmadd_vv_v32f16(<32 x half> %va, <32 x half> %vb, <32 x hal
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_v32f16:
 ; ZVFBFA:       # %bb.0:
-; ZVFBFA-NEXT:    addi sp, sp, -16
-; ZVFBFA-NEXT:    .cfi_def_cfa_offset 16
-; ZVFBFA-NEXT:    csrr a0, vlenb
-; ZVFBFA-NEXT:    slli a0, a0, 2
-; ZVFBFA-NEXT:    sub sp, sp, a0
-; ZVFBFA-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 4 * vlenb
-; ZVFBFA-NEXT:    addi a0, sp, 16
-; ZVFBFA-NEXT:    vs4r.v v16, (a0) # vscale x 32-byte Folded Spill
 ; ZVFBFA-NEXT:    li a0, 32
 ; ZVFBFA-NEXT:    vsetvli zero, a0, e16, m4, ta, ma
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v16, v8
-; ZVFBFA-NEXT:    vfwcvt.f.f.v v0, v12
-; ZVFBFA-NEXT:    addi a0, sp, 16
-; ZVFBFA-NEXT:    vl4r.v v8, (a0) # vscale x 32-byte Folded Reload
 ; ZVFBFA-NEXT:    vfwcvt.f.f.v v24, v8
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v0, v16
+; ZVFBFA-NEXT:    vfwcvt.f.f.v v16, v12
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFBFA-NEXT:    vfmadd.vv v24, v0, v16
+; ZVFBFA-NEXT:    vfmadd.vv v16, v0, v24
 ; ZVFBFA-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v24
-; ZVFBFA-NEXT:    csrr a0, vlenb
-; ZVFBFA-NEXT:    slli a0, a0, 2
-; ZVFBFA-NEXT:    add sp, sp, a0
-; ZVFBFA-NEXT:    .cfi_def_cfa sp, 16
-; ZVFBFA-NEXT:    addi sp, sp, 16
-; ZVFBFA-NEXT:    .cfi_def_cfa_offset 0
+; ZVFBFA-NEXT:    vfncvt.f.f.w v8, v16
 ; ZVFBFA-NEXT:    ret
   %vd = call <32 x half> @llvm.experimental.constrained.fma.v32f16(<32 x half> %vc, <32 x half> %vb, <32 x half> %va, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <32 x half> %vd
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfptoi-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfptoi-constrained-sdnode.ll
index 77a67f1619dd0..7c5c6ecde7eea 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfptoi-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfptoi-constrained-sdnode.ll
@@ -9,8 +9,7 @@ define <1 x i1> @vfptosi_v1f16_v1i1(<1 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptosi.v1i1.v1f16(<1 x half> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -21,8 +20,7 @@ define <1 x i1> @vfptoui_v1f16_v1i1(<1 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptoui.v1i1.v1f16(<1 x half> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -51,14 +49,14 @@ define <1 x i7> @vfptoui_v1f16_v1i7(<1 x half> %va) strictfp {
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    vsetivli zero, 1, e16, m1, ta, ma
 ; RV32-NEXT:    vfmv.f.s fa5, v8
-; RV32-NEXT:    fcvt.w.h a0, fa5, rtz
+; RV32-NEXT:    fcvt.wu.h a0, fa5, rtz
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vfptoui_v1f16_v1i7:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    vsetivli zero, 1, e16, m1, ta, ma
 ; RV64-NEXT:    vfmv.f.s fa5, v8
-; RV64-NEXT:    fcvt.l.h a0, fa5, rtz
+; RV64-NEXT:    fcvt.lu.h a0, fa5, rtz
 ; RV64-NEXT:    ret
   %evec = call <1 x i7> @llvm.experimental.constrained.fptoui.v1i7.v1f16(<1 x half> %va, metadata !"fpexcept.strict")
   ret <1 x i7> %evec
@@ -157,8 +155,7 @@ define <2 x i1> @vfptosi_v2f16_v2i1(<2 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptosi.v2i1.v2f16(<2 x half> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -169,8 +166,7 @@ define <2 x i1> @vfptoui_v2f16_v2i1(<2 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptoui.v2i1.v2f16(<2 x half> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -269,8 +265,7 @@ define <4 x i1> @vfptosi_v4f16_v4i1(<4 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptosi.v4i1.v4f16(<4 x half> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -281,8 +276,7 @@ define <4 x i1> @vfptoui_v4f16_v4i1(<4 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e8, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptoui.v4i1.v4f16(<4 x half> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -381,8 +375,7 @@ define <8 x i1> @vfptosi_v8f16_v8i1(<8 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptosi.v8i1.v8f16(<8 x half> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
@@ -393,8 +386,7 @@ define <8 x i1> @vfptoui_v8f16_v8i1(<8 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptoui.v8i1.v8f16(<8 x half> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
@@ -493,8 +485,7 @@ define <16 x i1> @vfptosi_v16f16_v16i1(<16 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <16 x i1> @llvm.experimental.constrained.fptosi.v16i1.v16f16(<16 x half> %va, metadata !"fpexcept.strict")
   ret <16 x i1> %evec
@@ -505,8 +496,7 @@ define <16 x i1> @vfptoui_v16f16_v16i1(<16 x half> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <16 x i1> @llvm.experimental.constrained.fptoui.v16i1.v16f16(<16 x half> %va, metadata !"fpexcept.strict")
   ret <16 x i1> %evec
@@ -582,8 +572,7 @@ define <32 x i1> @vfptosi_v32f16_v32i1(<32 x half> %va) strictfp {
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    vsetvli zero, a0, e8, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <32 x i1> @llvm.experimental.constrained.fptosi.v32i1.v32f16(<32 x half> %va, metadata !"fpexcept.strict")
   ret <32 x i1> %evec
@@ -595,8 +584,7 @@ define <32 x i1> @vfptoui_v32f16_v32i1(<32 x half> %va) strictfp {
 ; CHECK-NEXT:    li a0, 32
 ; CHECK-NEXT:    vsetvli zero, a0, e8, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <32 x i1> @llvm.experimental.constrained.fptoui.v32i1.v32f16(<32 x half> %va, metadata !"fpexcept.strict")
   ret <32 x i1> %evec
@@ -653,8 +641,7 @@ define <1 x i1> @vfptosi_v1f32_v1i1(<1 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptosi.v1i1.v1f32(<1 x float> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -665,8 +652,7 @@ define <1 x i1> @vfptoui_v1f32_v1i1(<1 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptoui.v1i1.v1f32(<1 x float> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -765,8 +751,7 @@ define <2 x i1> @vfptosi_v2f32_v2i1(<2 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptosi.v2i1.v2f32(<2 x float> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -777,8 +762,7 @@ define <2 x i1> @vfptoui_v2f32_v2i1(<2 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptoui.v2i1.v2f32(<2 x float> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -877,8 +861,7 @@ define <4 x i1> @vfptosi_v4f32_v4i1(<4 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptosi.v4i1.v4f32(<4 x float> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -889,8 +872,7 @@ define <4 x i1> @vfptoui_v4f32_v4i1(<4 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptoui.v4i1.v4f32(<4 x float> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -989,8 +971,7 @@ define <8 x i1> @vfptosi_v8f32_v8i1(<8 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptosi.v8i1.v8f32(<8 x float> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
@@ -1001,8 +982,7 @@ define <8 x i1> @vfptoui_v8f32_v8i1(<8 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptoui.v8i1.v8f32(<8 x float> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
@@ -1101,8 +1081,7 @@ define <16 x i1> @vfptosi_v16f32_v16i1(<16 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <16 x i1> @llvm.experimental.constrained.fptosi.v16i1.v16f32(<16 x float> %va, metadata !"fpexcept.strict")
   ret <16 x i1> %evec
@@ -1113,8 +1092,7 @@ define <16 x i1> @vfptoui_v16f32_v16i1(<16 x float> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e16, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <16 x i1> @llvm.experimental.constrained.fptoui.v16i1.v16f32(<16 x float> %va, metadata !"fpexcept.strict")
   ret <16 x i1> %evec
@@ -1191,8 +1169,7 @@ define <1 x i1> @vfptosi_v1f64_v1i1(<1 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptosi.v1i1.v1f64(<1 x double> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -1203,8 +1180,7 @@ define <1 x i1> @vfptoui_v1f64_v1i1(<1 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <1 x i1> @llvm.experimental.constrained.fptoui.v1i1.v1f64(<1 x double> %va, metadata !"fpexcept.strict")
   ret <1 x i1> %evec
@@ -1309,8 +1285,7 @@ define <2 x i1> @vfptosi_v2f64_v2i1(<2 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptosi.v2i1.v2f64(<2 x double> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -1321,8 +1296,7 @@ define <2 x i1> @vfptoui_v2f64_v2i1(<2 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <2 x i1> @llvm.experimental.constrained.fptoui.v2i1.v2f64(<2 x double> %va, metadata !"fpexcept.strict")
   ret <2 x i1> %evec
@@ -1427,8 +1401,7 @@ define <4 x i1> @vfptosi_v4f64_v4i1(<4 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptosi.v4i1.v4f64(<4 x double> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -1439,8 +1412,7 @@ define <4 x i1> @vfptoui_v4f64_v4i1(<4 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <4 x i1> @llvm.experimental.constrained.fptoui.v4i1.v4f64(<4 x double> %va, metadata !"fpexcept.strict")
   ret <4 x i1> %evec
@@ -1545,8 +1517,7 @@ define <8 x i1> @vfptosi_v8f64_v8i1(<8 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptosi.v8i1.v8f64(<8 x double> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
@@ -1557,8 +1528,7 @@ define <8 x i1> @vfptoui_v8f64_v8i1(<8 x double> %va) strictfp {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <8 x i1> @llvm.experimental.constrained.fptoui.v8i1.v8f64(<8 x double> %va, metadata !"fpexcept.strict")
   ret <8 x i1> %evec
diff --git a/llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll
index 6c5b6ff31a24b..2ccb847907481 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fnearbyint-constrained-sdnode.ll
@@ -7,16 +7,13 @@
 define <vscale x 1 x half> @nearbyint_nxv1f16(<vscale x 1 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, mu
@@ -30,16 +27,13 @@ define <vscale x 1 x half> @nearbyint_nxv1f16(<vscale x 1 x half> %v) strictfp {
 define <vscale x 2 x half> @nearbyint_nxv2f16(<vscale x 2 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, mu
@@ -53,16 +47,13 @@ define <vscale x 2 x half> @nearbyint_nxv2f16(<vscale x 2 x half> %v) strictfp {
 define <vscale x 4 x half> @nearbyint_nxv4f16(<vscale x 4 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, mu
@@ -76,16 +67,13 @@ define <vscale x 4 x half> @nearbyint_nxv4f16(<vscale x 4 x half> %v) strictfp {
 define <vscale x 8 x half> @nearbyint_nxv8f16(<vscale x 8 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, mu
@@ -99,16 +87,13 @@ define <vscale x 8 x half> @nearbyint_nxv8f16(<vscale x 8 x half> %v) strictfp {
 define <vscale x 16 x half> @nearbyint_nxv16f16(<vscale x 16 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, mu
@@ -122,16 +107,13 @@ define <vscale x 16 x half> @nearbyint_nxv16f16(<vscale x 16 x half> %v) strictf
 define <vscale x 32 x half> @nearbyint_nxv32f16(<vscale x 32 x half> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, mu
@@ -145,15 +127,12 @@ define <vscale x 32 x half> @nearbyint_nxv32f16(<vscale x 32 x half> %v) strictf
 define <vscale x 1 x float> @nearbyint_nxv1f32(<vscale x 1 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
@@ -167,15 +146,12 @@ define <vscale x 1 x float> @nearbyint_nxv1f32(<vscale x 1 x float> %v) strictfp
 define <vscale x 2 x float> @nearbyint_nxv2f32(<vscale x 2 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
@@ -189,15 +165,12 @@ define <vscale x 2 x float> @nearbyint_nxv2f32(<vscale x 2 x float> %v) strictfp
 define <vscale x 4 x float> @nearbyint_nxv4f32(<vscale x 4 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
@@ -211,15 +184,12 @@ define <vscale x 4 x float> @nearbyint_nxv4f32(<vscale x 4 x float> %v) strictfp
 define <vscale x 8 x float> @nearbyint_nxv8f32(<vscale x 8 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
@@ -233,15 +203,12 @@ define <vscale x 8 x float> @nearbyint_nxv8f32(<vscale x 8 x float> %v) strictfp
 define <vscale x 16 x float> @nearbyint_nxv16f32(<vscale x 16 x float> %v) strictfp {
 ; CHECK-LABEL: nearbyint_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    frflags a0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, mu
@@ -255,15 +222,12 @@ define <vscale x 16 x float> @nearbyint_nxv16f32(<vscale x 16 x float> %v) stric
 define <vscale x 1 x double> @nearbyint_nxv1f64(<vscale x 1 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -273,16 +237,13 @@ define <vscale x 1 x double> @nearbyint_nxv1f64(<vscale x 1 x double> %v) strict
 ;
 ; RV64-LABEL: nearbyint_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -296,15 +257,12 @@ define <vscale x 1 x double> @nearbyint_nxv1f64(<vscale x 1 x double> %v) strict
 define <vscale x 2 x double> @nearbyint_nxv2f64(<vscale x 2 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -314,16 +272,13 @@ define <vscale x 2 x double> @nearbyint_nxv2f64(<vscale x 2 x double> %v) strict
 ;
 ; RV64-LABEL: nearbyint_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -337,15 +292,12 @@ define <vscale x 2 x double> @nearbyint_nxv2f64(<vscale x 2 x double> %v) strict
 define <vscale x 4 x double> @nearbyint_nxv4f64(<vscale x 4 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -355,16 +307,13 @@ define <vscale x 4 x double> @nearbyint_nxv4f64(<vscale x 4 x double> %v) strict
 ;
 ; RV64-LABEL: nearbyint_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -378,15 +327,12 @@ define <vscale x 4 x double> @nearbyint_nxv4f64(<vscale x 4 x double> %v) strict
 define <vscale x 8 x double> @nearbyint_nxv8f64(<vscale x 8 x double> %v) strictfp {
 ; RV32-LABEL: nearbyint_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
 ; RV32-NEXT:    frflags a0
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
@@ -396,16 +342,13 @@ define <vscale x 8 x double> @nearbyint_nxv8f64(<vscale x 8 x double> %v) strict
 ;
 ; RV64-LABEL: nearbyint_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
 ; RV64-NEXT:    frflags a0
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
diff --git a/llvm/test/CodeGen/RISCV/rvv/fround-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/fround-constrained-sdnode.ll
index 91897ef7fbac3..4631a47aa8c74 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fround-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fround-constrained-sdnode.ll
@@ -9,16 +9,13 @@
 define <vscale x 1 x half> @round_nxv1f16(<vscale x 1 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -32,16 +29,13 @@ define <vscale x 1 x half> @round_nxv1f16(<vscale x 1 x half> %x) strictfp {
 define <vscale x 2 x half> @round_nxv2f16(<vscale x 2 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -55,16 +49,13 @@ define <vscale x 2 x half> @round_nxv2f16(<vscale x 2 x half> %x) strictfp {
 define <vscale x 4 x half> @round_nxv4f16(<vscale x 4 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -78,16 +69,13 @@ define <vscale x 4 x half> @round_nxv4f16(<vscale x 4 x half> %x) strictfp {
 define <vscale x 8 x half> @round_nxv8f16(<vscale x 8 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -101,16 +89,13 @@ define <vscale x 8 x half> @round_nxv8f16(<vscale x 8 x half> %x) strictfp {
 define <vscale x 16 x half> @round_nxv16f16(<vscale x 16 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -124,16 +109,13 @@ define <vscale x 16 x half> @round_nxv16f16(<vscale x 16 x half> %x) strictfp {
 define <vscale x 32 x half> @round_nxv32f16(<vscale x 32 x half> %x) strictfp {
 ; CHECK-LABEL: round_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -147,15 +129,12 @@ define <vscale x 32 x half> @round_nxv32f16(<vscale x 32 x half> %x) strictfp {
 define <vscale x 1 x float> @round_nxv1f32(<vscale x 1 x float> %x) strictfp {
 ; CHECK-LABEL: round_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -169,15 +148,12 @@ define <vscale x 1 x float> @round_nxv1f32(<vscale x 1 x float> %x) strictfp {
 define <vscale x 2 x float> @round_nxv2f32(<vscale x 2 x float> %x) strictfp {
 ; CHECK-LABEL: round_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -191,15 +167,12 @@ define <vscale x 2 x float> @round_nxv2f32(<vscale x 2 x float> %x) strictfp {
 define <vscale x 4 x float> @round_nxv4f32(<vscale x 4 x float> %x) strictfp {
 ; CHECK-LABEL: round_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -213,15 +186,12 @@ define <vscale x 4 x float> @round_nxv4f32(<vscale x 4 x float> %x) strictfp {
 define <vscale x 8 x float> @round_nxv8f32(<vscale x 8 x float> %x) strictfp {
 ; CHECK-LABEL: round_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -235,15 +205,12 @@ define <vscale x 8 x float> @round_nxv8f32(<vscale x 8 x float> %x) strictfp {
 define <vscale x 16 x float> @round_nxv16f32(<vscale x 16 x float> %x) strictfp {
 ; CHECK-LABEL: round_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 4
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -257,15 +224,12 @@ define <vscale x 16 x float> @round_nxv16f32(<vscale x 16 x float> %x) strictfp
 define <vscale x 1 x double> @round_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ; RV32-LABEL: round_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -275,16 +239,13 @@ define <vscale x 1 x double> @round_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -298,15 +259,12 @@ define <vscale x 1 x double> @round_nxv1f64(<vscale x 1 x double> %x) strictfp {
 define <vscale x 2 x double> @round_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ; RV32-LABEL: round_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -316,16 +274,13 @@ define <vscale x 2 x double> @round_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -339,15 +294,12 @@ define <vscale x 2 x double> @round_nxv2f64(<vscale x 2 x double> %x) strictfp {
 define <vscale x 4 x double> @round_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ; RV32-LABEL: round_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -357,16 +309,13 @@ define <vscale x 4 x double> @round_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -380,15 +329,12 @@ define <vscale x 4 x double> @round_nxv4f64(<vscale x 4 x double> %x) strictfp {
 define <vscale x 8 x double> @round_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ; RV32-LABEL: round_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
 ; RV32-NEXT:    fsrmi a0, 4
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -398,16 +344,13 @@ define <vscale x 8 x double> @round_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: round_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
 ; RV64-NEXT:    fsrmi a0, 4
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/froundeven-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/froundeven-constrained-sdnode.ll
index cd9d124e4b08c..5c84c2fa7c983 100644
--- a/llvm/test/CodeGen/RISCV/rvv/froundeven-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/froundeven-constrained-sdnode.ll
@@ -9,16 +9,13 @@
 define <vscale x 1 x half> @roundeven_nxv1f16(<vscale x 1 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -32,16 +29,13 @@ define <vscale x 1 x half> @roundeven_nxv1f16(<vscale x 1 x half> %x) strictfp {
 define <vscale x 2 x half> @roundeven_nxv2f16(<vscale x 2 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -55,16 +49,13 @@ define <vscale x 2 x half> @roundeven_nxv2f16(<vscale x 2 x half> %x) strictfp {
 define <vscale x 4 x half> @roundeven_nxv4f16(<vscale x 4 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -78,16 +69,13 @@ define <vscale x 4 x half> @roundeven_nxv4f16(<vscale x 4 x half> %x) strictfp {
 define <vscale x 8 x half> @roundeven_nxv8f16(<vscale x 8 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -101,16 +89,13 @@ define <vscale x 8 x half> @roundeven_nxv8f16(<vscale x 8 x half> %x) strictfp {
 define <vscale x 16 x half> @roundeven_nxv16f16(<vscale x 16 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -124,16 +109,13 @@ define <vscale x 16 x half> @roundeven_nxv16f16(<vscale x 16 x half> %x) strictf
 define <vscale x 32 x half> @roundeven_nxv32f16(<vscale x 32 x half> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -147,15 +129,12 @@ define <vscale x 32 x half> @roundeven_nxv32f16(<vscale x 32 x half> %x) strictf
 define <vscale x 1 x float> @roundeven_nxv1f32(<vscale x 1 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -169,15 +148,12 @@ define <vscale x 1 x float> @roundeven_nxv1f32(<vscale x 1 x float> %x) strictfp
 define <vscale x 2 x float> @roundeven_nxv2f32(<vscale x 2 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -191,15 +167,12 @@ define <vscale x 2 x float> @roundeven_nxv2f32(<vscale x 2 x float> %x) strictfp
 define <vscale x 4 x float> @roundeven_nxv4f32(<vscale x 4 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -213,15 +186,12 @@ define <vscale x 4 x float> @roundeven_nxv4f32(<vscale x 4 x float> %x) strictfp
 define <vscale x 8 x float> @roundeven_nxv8f32(<vscale x 8 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -235,15 +205,12 @@ define <vscale x 8 x float> @roundeven_nxv8f32(<vscale x 8 x float> %x) strictfp
 define <vscale x 16 x float> @roundeven_nxv16f32(<vscale x 16 x float> %x) strictfp {
 ; CHECK-LABEL: roundeven_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
 ; CHECK-NEXT:    fsrmi a0, 0
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    fsrm a0
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -257,15 +224,12 @@ define <vscale x 16 x float> @roundeven_nxv16f32(<vscale x 16 x float> %x) stric
 define <vscale x 1 x double> @roundeven_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -275,16 +239,13 @@ define <vscale x 1 x double> @roundeven_nxv1f64(<vscale x 1 x double> %x) strict
 ;
 ; RV64-LABEL: roundeven_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
@@ -298,15 +259,12 @@ define <vscale x 1 x double> @roundeven_nxv1f64(<vscale x 1 x double> %x) strict
 define <vscale x 2 x double> @roundeven_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -316,16 +274,13 @@ define <vscale x 2 x double> @roundeven_nxv2f64(<vscale x 2 x double> %x) strict
 ;
 ; RV64-LABEL: roundeven_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
@@ -339,15 +294,12 @@ define <vscale x 2 x double> @roundeven_nxv2f64(<vscale x 2 x double> %x) strict
 define <vscale x 4 x double> @roundeven_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -357,16 +309,13 @@ define <vscale x 4 x double> @roundeven_nxv4f64(<vscale x 4 x double> %x) strict
 ;
 ; RV64-LABEL: roundeven_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
@@ -380,15 +329,12 @@ define <vscale x 4 x double> @roundeven_nxv4f64(<vscale x 4 x double> %x) strict
 define <vscale x 8 x double> @roundeven_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ; RV32-LABEL: roundeven_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
 ; RV32-NEXT:    fsrmi a0, 0
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    fsrm a0
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
@@ -398,16 +344,13 @@ define <vscale x 8 x double> @roundeven_nxv8f64(<vscale x 8 x double> %x) strict
 ;
 ; RV64-LABEL: roundeven_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
 ; RV64-NEXT:    fsrmi a0, 0
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    fsrm a0
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll
index adeee2bd82b57..7f6118fcd4b35 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll
@@ -7,15 +7,12 @@
 define <vscale x 1 x half> @trunc_nxv1f16(<vscale x 1 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv1f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, mu
@@ -28,15 +25,12 @@ define <vscale x 1 x half> @trunc_nxv1f16(<vscale x 1 x half> %x) strictfp {
 define <vscale x 2 x half> @trunc_nxv2f16(<vscale x 2 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv2f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, mu
@@ -49,15 +43,12 @@ define <vscale x 2 x half> @trunc_nxv2f16(<vscale x 2 x half> %x) strictfp {
 define <vscale x 4 x half> @trunc_nxv4f16(<vscale x 4 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m1, ta, mu
@@ -70,15 +61,12 @@ define <vscale x 4 x half> @trunc_nxv4f16(<vscale x 4 x half> %x) strictfp {
 define <vscale x 8 x half> @trunc_nxv8f16(<vscale x 8 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv8f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, mu
@@ -91,15 +79,12 @@ define <vscale x 8 x half> @trunc_nxv8f16(<vscale x 8 x half> %x) strictfp {
 define <vscale x 16 x half> @trunc_nxv16f16(<vscale x 16 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv16f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m4, ta, mu
@@ -112,15 +97,12 @@ define <vscale x 16 x half> @trunc_nxv16f16(<vscale x 16 x half> %x) strictfp {
 define <vscale x 32 x half> @trunc_nxv32f16(<vscale x 32 x half> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv32f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e16, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    slli a0, a0, 10
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    fmv.h.x fa5, a0
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e16, m8, ta, mu
@@ -133,14 +115,11 @@ define <vscale x 32 x half> @trunc_nxv32f16(<vscale x 32 x half> %x) strictfp {
 define <vscale x 1 x float> @trunc_nxv1f32(<vscale x 1 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv1f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, mu
@@ -153,14 +132,11 @@ define <vscale x 1 x float> @trunc_nxv1f32(<vscale x 1 x float> %x) strictfp {
 define <vscale x 2 x float> @trunc_nxv2f32(<vscale x 2 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv2f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v9, v8
 ; CHECK-NEXT:    vmflt.vf v0, v9, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m1, ta, mu
@@ -173,14 +149,11 @@ define <vscale x 2 x float> @trunc_nxv2f32(<vscale x 2 x float> %x) strictfp {
 define <vscale x 4 x float> @trunc_nxv4f32(<vscale x 4 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv4f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v10, v8
 ; CHECK-NEXT:    vmflt.vf v0, v10, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m2, ta, mu
@@ -193,14 +166,11 @@ define <vscale x 4 x float> @trunc_nxv4f32(<vscale x 4 x float> %x) strictfp {
 define <vscale x 8 x float> @trunc_nxv8f32(<vscale x 8 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v12, v8
 ; CHECK-NEXT:    vmflt.vf v0, v12, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, mu
@@ -213,14 +183,11 @@ define <vscale x 8 x float> @trunc_nxv8f32(<vscale x 8 x float> %x) strictfp {
 define <vscale x 16 x float> @trunc_nxv16f32(<vscale x 16 x float> %x) strictfp {
 ; CHECK-LABEL: trunc_nxv16f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, mu
-; CHECK-NEXT:    vmfne.vv v0, v8, v8
+; CHECK-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
+; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    lui a0, 307200
-; CHECK-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; CHECK-NEXT:    fmv.w.x fa5, a0
-; CHECK-NEXT:    vfabs.v v16, v8
 ; CHECK-NEXT:    vmflt.vf v0, v16, fa5
-; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; CHECK-NEXT:    vfcvt.rtz.x.f.v v16, v8, v0.t
 ; CHECK-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, m8, ta, mu
@@ -233,14 +200,11 @@ define <vscale x 16 x float> @trunc_nxv16f32(<vscale x 16 x float> %x) strictfp
 define <vscale x 1 x double> @trunc_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ; RV32-LABEL: trunc_nxv1f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI11_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI11_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfabs.v v9, v8
 ; RV32-NEXT:    vmflt.vf v0, v9, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -249,15 +213,12 @@ define <vscale x 1 x double> @trunc_nxv1f64(<vscale x 1 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_nxv1f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m1, ta, ma
+; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v9, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v9, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v9, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v9, v9, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m1, ta, mu
@@ -270,14 +231,11 @@ define <vscale x 1 x double> @trunc_nxv1f64(<vscale x 1 x double> %x) strictfp {
 define <vscale x 2 x double> @trunc_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ; RV32-LABEL: trunc_nxv2f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI12_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI12_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfabs.v v10, v8
 ; RV32-NEXT:    vmflt.vf v0, v10, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -286,15 +244,12 @@ define <vscale x 2 x double> @trunc_nxv2f64(<vscale x 2 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_nxv2f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v10, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v10, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v10, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v10, v10, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m2, ta, mu
@@ -307,14 +262,11 @@ define <vscale x 2 x double> @trunc_nxv2f64(<vscale x 2 x double> %x) strictfp {
 define <vscale x 4 x double> @trunc_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ; RV32-LABEL: trunc_nxv4f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI13_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI13_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfabs.v v12, v8
 ; RV32-NEXT:    vmflt.vf v0, v12, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -323,15 +275,12 @@ define <vscale x 4 x double> @trunc_nxv4f64(<vscale x 4 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_nxv4f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v12, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v12, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v12, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v12, v12, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m4, ta, mu
@@ -344,14 +293,11 @@ define <vscale x 4 x double> @trunc_nxv4f64(<vscale x 4 x double> %x) strictfp {
 define <vscale x 8 x double> @trunc_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ; RV32-LABEL: trunc_nxv8f64:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV32-NEXT:    vmfne.vv v0, v8, v8
 ; RV32-NEXT:    lui a0, %hi(.LCPI14_0)
 ; RV32-NEXT:    fld fa5, %lo(.LCPI14_0)(a0)
-; RV32-NEXT:    vfadd.vv v8, v8, v8, v0.t
+; RV32-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfabs.v v16, v8
 ; RV32-NEXT:    vmflt.vf v0, v16, fa5
-; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV32-NEXT:    vfcvt.rtz.x.f.v v16, v8, v0.t
 ; RV32-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; RV32-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
@@ -360,15 +306,12 @@ define <vscale x 8 x double> @trunc_nxv8f64(<vscale x 8 x double> %x) strictfp {
 ;
 ; RV64-LABEL: trunc_nxv8f64:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, mu
-; RV64-NEXT:    vmfne.vv v0, v8, v8
+; RV64-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    li a0, 1075
-; RV64-NEXT:    vfadd.vv v8, v8, v8, v0.t
 ; RV64-NEXT:    slli a0, a0, 52
-; RV64-NEXT:    vfabs.v v16, v8
 ; RV64-NEXT:    fmv.d.x fa5, a0
 ; RV64-NEXT:    vmflt.vf v0, v16, fa5
-; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, ma
 ; RV64-NEXT:    vfcvt.rtz.x.f.v v16, v8, v0.t
 ; RV64-NEXT:    vfcvt.f.x.v v16, v16, v0.t
 ; RV64-NEXT:    vsetvli zero, zero, e64, m8, ta, mu
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index acd9519bb5a8e..83c8492128b46 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -328,10 +328,8 @@ define <vscale x 2 x float> @vpmerge_fadd(<vscale x 2 x float> %passthru, <vscal
 define <vscale x 2 x float> @vpmerge_constrained_fadd(<vscale x 2 x float> %passthru, <vscale x 2 x float> %x, <vscale x 2 x float> %y, <vscale x 2 x i1> %m, i64 %vl) strictfp {
 ; CHECK-LABEL: vpmerge_constrained_fadd:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e32, m1, ta, ma
-; CHECK-NEXT:    vfadd.vv v9, v9, v10
-; CHECK-NEXT:    vsetvli zero, a0, e32, m1, tu, ma
-; CHECK-NEXT:    vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, tu, mu
+; CHECK-NEXT:    vfadd.vv v8, v9, v10, v0.t
 ; CHECK-NEXT:    ret
   %a = call <vscale x 2 x float> @llvm.experimental.constrained.fadd.nxv2f32(<vscale x 2 x float> %x, <vscale x 2 x float> %y, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   %b = call <vscale x 2 x float> @llvm.riscv.vmerge.nxv2f32.nxv2f32(<vscale x 2 x float> %passthru, <vscale x 2 x float> %passthru, <vscale x 2 x float> %a, <vscale x 2 x i1> %m, i64 %vl) strictfp
@@ -343,10 +341,8 @@ define <vscale x 2 x float> @vpmerge_constrained_fadd(<vscale x 2 x float> %pass
 define <vscale x 2 x float> @vpmerge_constrained_fadd_vlmax(<vscale x 2 x float> %passthru, <vscale x 2 x float> %x, <vscale x 2 x float> %y, <vscale x 2 x i1> %m) strictfp {
 ; CHECK-LABEL: vpmerge_constrained_fadd_vlmax:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
-; CHECK-NEXT:    vfadd.vv v9, v9, v10
-; CHECK-NEXT:    vsetvli zero, zero, e32, m1, tu, ma
-; CHECK-NEXT:    vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, tu, mu
+; CHECK-NEXT:    vfadd.vv v8, v9, v10, v0.t
 ; CHECK-NEXT:    ret
   %a = call <vscale x 2 x float> @llvm.experimental.constrained.fadd.nxv2f32(<vscale x 2 x float> %x, <vscale x 2 x float> %y, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   %b = call <vscale x 2 x float> @llvm.riscv.vmerge.nxv2f32.nxv2f32(<vscale x 2 x float> %passthru, <vscale x 2 x float> %passthru, <vscale x 2 x float> %a, <vscale x 2 x i1> %m, i64 -1) strictfp
diff --git a/llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll b/llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll
index dcdf548020e9d..c34390b1d3ea3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll
@@ -80,7 +80,6 @@ define void @f1(ptr %p, ptr %q, double %t) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 2, e64, m1, ta, ma
 ; CHECK-NEXT:    vle64.v v8, (a0)
-; CHECK-NEXT:    fcvt.wu.d a0, fa0, rtz
 ; CHECK-NEXT:    vse64.v v8, (a1)
 ; CHECK-NEXT:    ret
   %x0 = load i64, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll
index 2c60529b13005..06f6b28a79ad1 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfmadd-constrained-sdnode.ll
@@ -30,12 +30,12 @@ define <vscale x 1 x bfloat> @vfmadd_vv_nxv1bf16(<vscale x 1 x bfloat> %va, <vsc
 ; ZVFBFMIN:       # %bb.0:
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v11, v10
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v9
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v9
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v9, v10, v11
+; ZVFBFMIN-NEXT:    vfmadd.vv v12, v10, v11
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v9
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_nxv1bf16:
@@ -50,16 +50,14 @@ define <vscale x 1 x bfloat> @vfmadd_vv_nxv1bf16(<vscale x 1 x bfloat> %va, <vsc
 define <vscale x 1 x bfloat> @vfmadd_vf_nxv1bf16(<vscale x 1 x bfloat> %va, <vscale x 1 x bfloat> %vb, bfloat %c) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vf_nxv1bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
+; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v9
-; ZVFBFMIN-NEXT:    vmv.v.x v9, a0
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v11, v8
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v9
+; ZVFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v9, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v12, v11, v10
+; ZVFBFMIN-NEXT:    vfmadd.vf v9, fa5, v10
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v12
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v9
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vf_nxv1bf16:
@@ -78,12 +76,12 @@ define <vscale x 2 x bfloat> @vfmadd_vv_nxv2bf16(<vscale x 2 x bfloat> %va, <vsc
 ; ZVFBFMIN:       # %bb.0:
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v11, v9
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v9, v10
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v9, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v10
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v10, v9, v11
+; ZVFBFMIN-NEXT:    vfmadd.vv v12, v9, v11
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v10
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_nxv2bf16:
@@ -98,16 +96,14 @@ define <vscale x 2 x bfloat> @vfmadd_vv_nxv2bf16(<vscale x 2 x bfloat> %va, <vsc
 define <vscale x 2 x bfloat> @vfmadd_vf_nxv2bf16(<vscale x 2 x bfloat> %va, <vscale x 2 x bfloat> %vb, bfloat %c) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vf_nxv2bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
+; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v8
-; ZVFBFMIN-NEXT:    vmv.v.x v8, a0
+; ZVFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v11, v9
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v9, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v9, v11, v10
+; ZVFBFMIN-NEXT:    vfmadd.vf v11, fa5, v10
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v9
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v11
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vf_nxv2bf16:
@@ -126,8 +122,8 @@ define <vscale x 4 x bfloat> @vfmadd_vv_nxv4bf16(<vscale x 4 x bfloat> %va, <vsc
 ; ZVFBFMIN:       # %bb.0:
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v10
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v8
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v14, v9
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v9
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v14, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; ZVFBFMIN-NEXT:    vfmadd.vv v14, v10, v12
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
@@ -146,16 +142,14 @@ define <vscale x 4 x bfloat> @vfmadd_vv_nxv4bf16(<vscale x 4 x bfloat> %va, <vsc
 define <vscale x 4 x bfloat> @vfmadd_vf_nxv4bf16(<vscale x 4 x bfloat> %va, <vscale x 4 x bfloat> %vb, bfloat %c) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vf_nxv4bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
+; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v10, v9
-; ZVFBFMIN-NEXT:    vmv.v.x v9, a0
+; ZVFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v8
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v14, v9
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v14, v12, v10
+; ZVFBFMIN-NEXT:    vfmadd.vf v12, fa5, v10
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v14
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v12
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vf_nxv4bf16:
@@ -174,12 +168,12 @@ define <vscale x 8 x bfloat> @vfmadd_vv_nxv8bf16(<vscale x 8 x bfloat> %va, <vsc
 ; ZVFBFMIN:       # %bb.0:
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v8
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v20, v12
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v10
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v20, v10
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v12
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v12, v20, v16
+; ZVFBFMIN-NEXT:    vfmadd.vv v24, v20, v16
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v12
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v24
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_nxv8bf16:
@@ -194,16 +188,14 @@ define <vscale x 8 x bfloat> @vfmadd_vv_nxv8bf16(<vscale x 8 x bfloat> %va, <vsc
 define <vscale x 8 x bfloat> @vfmadd_vf_nxv8bf16(<vscale x 8 x bfloat> %va, <vscale x 8 x bfloat> %vb, bfloat %c) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vf_nxv8bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
+; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v12, v8
-; ZVFBFMIN-NEXT:    vmv.v.x v8, a0
+; ZVFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v10
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v20, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v20, v16, v12
+; ZVFBFMIN-NEXT:    vfmadd.vf v16, fa5, v12
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v20
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v16
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vf_nxv8bf16:
@@ -220,30 +212,14 @@ define <vscale x 8 x bfloat> @vfmadd_vf_nxv8bf16(<vscale x 8 x bfloat> %va, <vsc
 define <vscale x 16 x bfloat> @vfmadd_vv_nxv16bf16(<vscale x 16 x bfloat> %va, <vscale x 16 x bfloat> %vb, <vscale x 16 x bfloat> %vc) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vv_nxv16bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    addi sp, sp, -16
-; ZVFBFMIN-NEXT:    .cfi_def_cfa_offset 16
-; ZVFBFMIN-NEXT:    csrr a0, vlenb
-; ZVFBFMIN-NEXT:    slli a0, a0, 2
-; ZVFBFMIN-NEXT:    sub sp, sp, a0
-; ZVFBFMIN-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 4 * vlenb
-; ZVFBFMIN-NEXT:    addi a0, sp, 16
-; ZVFBFMIN-NEXT:    vs4r.v v16, (a0) # vscale x 32-byte Folded Spill
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v12
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v0, v8
-; ZVFBFMIN-NEXT:    addi a0, sp, 16
-; ZVFBFMIN-NEXT:    vl4r.v v8, (a0) # vscale x 32-byte Folded Reload
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v12
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v0, v16
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v24, v0, v16
+; ZVFBFMIN-NEXT:    vfmadd.vv v16, v0, v24
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v24
-; ZVFBFMIN-NEXT:    csrr a0, vlenb
-; ZVFBFMIN-NEXT:    slli a0, a0, 2
-; ZVFBFMIN-NEXT:    add sp, sp, a0
-; ZVFBFMIN-NEXT:    .cfi_def_cfa sp, 16
-; ZVFBFMIN-NEXT:    addi sp, sp, 16
-; ZVFBFMIN-NEXT:    .cfi_def_cfa_offset 0
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v16
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vv_nxv16bf16:
@@ -258,16 +234,14 @@ define <vscale x 16 x bfloat> @vfmadd_vv_nxv16bf16(<vscale x 16 x bfloat> %va, <
 define <vscale x 16 x bfloat> @vfmadd_vf_nxv16bf16(<vscale x 16 x bfloat> %va, <vscale x 16 x bfloat> %vb, bfloat %c) strictfp {
 ; ZVFBFMIN-LABEL: vfmadd_vf_nxv16bf16:
 ; ZVFBFMIN:       # %bb.0:
-; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, m4, ta, ma
+; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v12
-; ZVFBFMIN-NEXT:    vmv.v.x v12, a0
+; ZVFBFMIN-NEXT:    fcvt.s.bf16 fa5, fa0
 ; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v8
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v0, v12
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v0, v24, v16
+; ZVFBFMIN-NEXT:    vfmadd.vf v24, fa5, v16
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v0
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v24
 ; ZVFBFMIN-NEXT:    ret
 ;
 ; ZVFBFA-LABEL: vfmadd_vf_nxv16bf16:
@@ -389,69 +363,72 @@ define <vscale x 32 x bfloat> @vfmadd_vf_nxv32bf16(<vscale x 32 x bfloat> %va, <
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; ZVFBFMIN-NEXT:    vmv8r.v v24, v16
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
-; ZVFBFMIN-NEXT:    li a1, 24
-; ZVFBFMIN-NEXT:    mul a0, a0, a1
+; ZVFBFMIN-NEXT:    slli a0, a0, 4
 ; ZVFBFMIN-NEXT:    add a0, sp, a0
 ; ZVFBFMIN-NEXT:    addi a0, a0, 16
 ; ZVFBFMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFBFMIN-NEXT:    vmv8r.v v16, v8
+; ZVFBFMIN-NEXT:    addi a0, sp, 16
+; ZVFBFMIN-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
 ; ZVFBFMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v8
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v8, v16
 ; ZVFBFMIN-NEXT:    csrr a1, vlenb
-; ZVFBFMIN-NEXT:    slli a1, a1, 4
+; ZVFBFMIN-NEXT:    li a2, 24
+; ZVFBFMIN-NEXT:    mul a1, a1, a2
 ; ZVFBFMIN-NEXT:    add a1, sp, a1
 ; ZVFBFMIN-NEXT:    addi a1, a1, 16
-; ZVFBFMIN-NEXT:    vs8r.v v16, (a1) # vscale x 64-byte Folded Spill
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v24
+; ZVFBFMIN-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v8, v24
 ; ZVFBFMIN-NEXT:    csrr a1, vlenb
 ; ZVFBFMIN-NEXT:    slli a1, a1, 3
 ; ZVFBFMIN-NEXT:    add a1, sp, a1
 ; ZVFBFMIN-NEXT:    addi a1, a1, 16
-; ZVFBFMIN-NEXT:    vs8r.v v16, (a1) # vscale x 64-byte Folded Spill
+; ZVFBFMIN-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
 ; ZVFBFMIN-NEXT:    vsetvli a1, zero, e16, m8, ta, ma
-; ZVFBFMIN-NEXT:    vmv.v.x v16, a0
-; ZVFBFMIN-NEXT:    addi a0, sp, 16
-; ZVFBFMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFBFMIN-NEXT:    vmv.v.x v8, a0
 ; ZVFBFMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v16
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v24, v8
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
-; ZVFBFMIN-NEXT:    slli a0, a0, 4
+; ZVFBFMIN-NEXT:    li a1, 24
+; ZVFBFMIN-NEXT:    mul a0, a0, a1
 ; ZVFBFMIN-NEXT:    add a0, sp, a0
 ; ZVFBFMIN-NEXT:    addi a0, a0, 16
-; ZVFBFMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
+; ZVFBFMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
 ; ZVFBFMIN-NEXT:    slli a0, a0, 3
 ; ZVFBFMIN-NEXT:    add a0, sp, a0
 ; ZVFBFMIN-NEXT:    addi a0, a0, 16
-; ZVFBFMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
+; ZVFBFMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v24, v0, v16
+; ZVFBFMIN-NEXT:    vfmadd.vv v24, v16, v0
+; ZVFBFMIN-NEXT:    addi a0, sp, 16
+; ZVFBFMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v12
-; ZVFBFMIN-NEXT:    csrr a0, vlenb
-; ZVFBFMIN-NEXT:    slli a0, a0, 4
-; ZVFBFMIN-NEXT:    add a0, sp, a0
-; ZVFBFMIN-NEXT:    addi a0, a0, 16
-; ZVFBFMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v0, v20
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
 ; ZVFBFMIN-NEXT:    li a1, 24
 ; ZVFBFMIN-NEXT:    mul a0, a0, a1
 ; ZVFBFMIN-NEXT:    add a0, sp, a0
 ; ZVFBFMIN-NEXT:    addi a0, a0, 16
-; ZVFBFMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v12
-; ZVFBFMIN-NEXT:    addi a0, sp, 16
-; ZVFBFMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
-; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v8, v4
+; ZVFBFMIN-NEXT:    vs8r.v v0, (a0) # vscale x 64-byte Folded Spill
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
 ; ZVFBFMIN-NEXT:    slli a0, a0, 4
 ; ZVFBFMIN-NEXT:    add a0, sp, a0
 ; ZVFBFMIN-NEXT:    addi a0, a0, 16
 ; ZVFBFMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v16, v4
+; ZVFBFMIN-NEXT:    vfwcvtbf16.f.f.v v0, v12
+; ZVFBFMIN-NEXT:    csrr a0, vlenb
+; ZVFBFMIN-NEXT:    li a1, 24
+; ZVFBFMIN-NEXT:    mul a0, a0, a1
+; ZVFBFMIN-NEXT:    add a0, sp, a0
+; ZVFBFMIN-NEXT:    addi a0, a0, 16
+; ZVFBFMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFBFMIN-NEXT:    vfmadd.vv v16, v8, v0
+; ZVFBFMIN-NEXT:    vfmadd.vv v0, v16, v8
 ; ZVFBFMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v8, v24
-; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v12, v16
+; ZVFBFMIN-NEXT:    vfncvtbf16.f.f.w v12, v0
 ; ZVFBFMIN-NEXT:    csrr a0, vlenb
 ; ZVFBFMIN-NEXT:    slli a0, a0, 5
 ; ZVFBFMIN-NEXT:    add sp, sp, a0
@@ -482,12 +459,12 @@ define <vscale x 1 x half> @vfmadd_vv_nxv1f16(<vscale x 1 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v10
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v9, v10, v11
+; ZVFHMIN-NEXT:    vfmadd.vv v12, v10, v11
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
   %vd = call <vscale x 1 x half> @llvm.experimental.constrained.fma.nxv1f16(<vscale x 1 x half> %va, <vscale x 1 x half> %vb, <vscale x 1 x half> %vc, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <vscale x 1 x half> %vd
@@ -502,16 +479,14 @@ define <vscale x 1 x half> @vfmadd_vf_nxv1f16(<vscale x 1 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmadd_vf_nxv1f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
-; ZVFHMIN-NEXT:    vmv.v.x v9, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v9
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v11, v10
+; ZVFHMIN-NEXT:    vfmadd.vf v9, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 1 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 1 x half> %head, <vscale x 1 x half> poison, <vscale x 1 x i32> zeroinitializer
@@ -530,12 +505,12 @@ define <vscale x 2 x half> @vfmadd_vv_nxv2f16(<vscale x 2 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v10
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v10, v9, v11
+; ZVFHMIN-NEXT:    vfmadd.vv v12, v9, v11
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v10
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
   %vd = call <vscale x 2 x half> @llvm.experimental.constrained.fma.nxv2f16(<vscale x 2 x half> %va, <vscale x 2 x half> %vc, <vscale x 2 x half> %vb, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <vscale x 2 x half> %vd
@@ -550,16 +525,14 @@ define <vscale x 2 x half> @vfmadd_vf_nxv2f16(<vscale x 2 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmadd_vf_nxv2f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
-; ZVFHMIN-NEXT:    vmv.v.x v8, a0
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v9, v11, v10
+; ZVFHMIN-NEXT:    vfmadd.vf v11, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v11
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 2 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 2 x half> %head, <vscale x 2 x half> poison, <vscale x 2 x i32> zeroinitializer
@@ -578,8 +551,8 @@ define <vscale x 4 x half> @vfmadd_vv_nxv4f16(<vscale x 4 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v14, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v14, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfmadd.vv v14, v10, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
@@ -598,16 +571,14 @@ define <vscale x 4 x half> @vfmadd_vf_nxv4f16(<vscale x 4 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmadd_vf_nxv4f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
+; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
-; ZVFHMIN-NEXT:    vmv.v.x v9, a0
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v14, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v14, v12, v10
+; ZVFHMIN-NEXT:    vfmadd.vf v12, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v14
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 4 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 4 x half> %head, <vscale x 4 x half> poison, <vscale x 4 x i32> zeroinitializer
@@ -626,12 +597,12 @@ define <vscale x 8 x half> @vfmadd_vv_nxv8f16(<vscale x 8 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v20, v12
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v20, v10
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v20, v16
+; ZVFHMIN-NEXT:    vfmadd.vv v24, v20, v16
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v24
 ; ZVFHMIN-NEXT:    ret
   %vd = call <vscale x 8 x half> @llvm.experimental.constrained.fma.nxv8f16(<vscale x 8 x half> %vb, <vscale x 8 x half> %vc, <vscale x 8 x half> %va, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <vscale x 8 x half> %vd
@@ -646,16 +617,14 @@ define <vscale x 8 x half> @vfmadd_vf_nxv8f16(<vscale x 8 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmadd_vf_nxv8f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
+; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
-; ZVFHMIN-NEXT:    vmv.v.x v8, a0
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v10
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v20, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v20, v16, v12
+; ZVFHMIN-NEXT:    vfmadd.vf v16, fa5, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v20
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v16
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 8 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 8 x half> %head, <vscale x 8 x half> poison, <vscale x 8 x i32> zeroinitializer
@@ -672,30 +641,14 @@ define <vscale x 16 x half> @vfmadd_vv_nxv16f16(<vscale x 16 x half> %va, <vscal
 ;
 ; ZVFHMIN-LABEL: vfmadd_vv_nxv16f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    addi sp, sp, -16
-; ZVFHMIN-NEXT:    .cfi_def_cfa_offset 16
-; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 2
-; ZVFHMIN-NEXT:    sub sp, sp, a0
-; ZVFHMIN-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x04, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 4 * vlenb
-; ZVFHMIN-NEXT:    addi a0, sp, 16
-; ZVFHMIN-NEXT:    vs4r.v v16, (a0) # vscale x 32-byte Folded Spill
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v8
-; ZVFHMIN-NEXT:    addi a0, sp, 16
-; ZVFHMIN-NEXT:    vl4r.v v8, (a0) # vscale x 32-byte Folded Reload
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v12
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v16
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v24, v0, v16
+; ZVFHMIN-NEXT:    vfmadd.vv v16, v0, v24
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v24
-; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 2
-; ZVFHMIN-NEXT:    add sp, sp, a0
-; ZVFHMIN-NEXT:    .cfi_def_cfa sp, 16
-; ZVFHMIN-NEXT:    addi sp, sp, 16
-; ZVFHMIN-NEXT:    .cfi_def_cfa_offset 0
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v16
 ; ZVFHMIN-NEXT:    ret
   %vd = call <vscale x 16 x half> @llvm.experimental.constrained.fma.nxv16f16(<vscale x 16 x half> %vc, <vscale x 16 x half> %va, <vscale x 16 x half> %vb, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret <vscale x 16 x half> %vd
@@ -710,16 +663,14 @@ define <vscale x 16 x half> @vfmadd_vf_nxv16f16(<vscale x 16 x half> %va, <vscal
 ;
 ; ZVFHMIN-LABEL: vfmadd_vf_nxv16f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
-; ZVFHMIN-NEXT:    vmv.v.x v12, a0
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v0, v24, v16
+; ZVFHMIN-NEXT:    vfmadd.vf v24, fa5, v16
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v0
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v24
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 16 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 16 x half> %head, <vscale x 16 x half> poison, <vscale x 16 x i32> zeroinitializer
@@ -841,69 +792,72 @@ define <vscale x 32 x half> @vfmadd_vf_nxv32f16(<vscale x 32 x half> %va, <vscal
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; ZVFHMIN-NEXT:    vmv8r.v v24, v16
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    li a1, 24
-; ZVFHMIN-NEXT:    mul a0, a0, a1
+; ZVFHMIN-NEXT:    slli a0, a0, 4
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
 ; ZVFHMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vmv8r.v v16, v8
+; ZVFHMIN-NEXT:    addi a0, sp, 16
+; ZVFHMIN-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
 ; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v16
 ; ZVFHMIN-NEXT:    csrr a1, vlenb
-; ZVFHMIN-NEXT:    slli a1, a1, 4
+; ZVFHMIN-NEXT:    li a2, 24
+; ZVFHMIN-NEXT:    mul a1, a1, a2
 ; ZVFHMIN-NEXT:    add a1, sp, a1
 ; ZVFHMIN-NEXT:    addi a1, a1, 16
-; ZVFHMIN-NEXT:    vs8r.v v16, (a1) # vscale x 64-byte Folded Spill
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v24
+; ZVFHMIN-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v24
 ; ZVFHMIN-NEXT:    csrr a1, vlenb
 ; ZVFHMIN-NEXT:    slli a1, a1, 3
 ; ZVFHMIN-NEXT:    add a1, sp, a1
 ; ZVFHMIN-NEXT:    addi a1, a1, 16
-; ZVFHMIN-NEXT:    vs8r.v v16, (a1) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vs8r.v v8, (a1) # vscale x 64-byte Folded Spill
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m8, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v16, a0
-; ZVFHMIN-NEXT:    addi a0, sp, 16
-; ZVFHMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vmv.v.x v8, a0
 ; ZVFHMIN-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v16
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v8
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 4
+; ZVFHMIN-NEXT:    li a1, 24
+; ZVFHMIN-NEXT:    mul a0, a0, a1
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
 ; ZVFHMIN-NEXT:    slli a0, a0, 3
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v24, v0, v16
+; ZVFHMIN-NEXT:    vfmadd.vv v24, v16, v0
+; ZVFHMIN-NEXT:    addi a0, sp, 16
+; ZVFHMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
-; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 4
-; ZVFHMIN-NEXT:    add a0, sp, a0
-; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vs8r.v v16, (a0) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v20
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
 ; ZVFHMIN-NEXT:    li a1, 24
 ; ZVFHMIN-NEXT:    mul a0, a0, a1
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
-; ZVFHMIN-NEXT:    addi a0, sp, 16
-; ZVFHMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v4
+; ZVFHMIN-NEXT:    vs8r.v v0, (a0) # vscale x 64-byte Folded Spill
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
 ; ZVFHMIN-NEXT:    slli a0, a0, 4
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
 ; ZVFHMIN-NEXT:    vl8r.v v0, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v4
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v12
+; ZVFHMIN-NEXT:    csrr a0, vlenb
+; ZVFHMIN-NEXT:    li a1, 24
+; ZVFHMIN-NEXT:    mul a0, a0, a1
+; ZVFHMIN-NEXT:    add a0, sp, a0
+; ZVFHMIN-NEXT:    addi a0, a0, 16
+; ZVFHMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v16, v8, v0
+; ZVFHMIN-NEXT:    vfmadd.vv v0, v16, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v24
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v12, v16
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v12, v0
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
 ; ZVFHMIN-NEXT:    slli a0, a0, 5
 ; ZVFHMIN-NEXT:    add sp, sp, a0
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll
index 5f9dc1ae273bf..d4494afa77f21 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfmsub-constrained-sdnode.ll
@@ -22,14 +22,14 @@ define <vscale x 1 x half> @vfmsub_vv_nxv1f16(<vscale x 1 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    lui a0, 8
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
-; ZVFHMIN-NEXT:    vxor.vx v9, v10, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v8
+; ZVFHMIN-NEXT:    vxor.vx v8, v10, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v9, v11, v10
+; ZVFHMIN-NEXT:    vfmadd.vv v12, v11, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
   %neg = fneg <vscale x 1 x half> %vc
   %vd = call <vscale x 1 x half> @llvm.experimental.constrained.fma.nxv1f16(<vscale x 1 x half> %va, <vscale x 1 x half> %vb, <vscale x 1 x half> %neg, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -45,18 +45,16 @@ define <vscale x 1 x half> @vfmsub_vf_nxv1f16(<vscale x 1 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmsub_vf_nxv1f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v8
-; ZVFHMIN-NEXT:    vxor.vx v8, v9, a0
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vxor.vx v9, v9, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v11, v9
+; ZVFHMIN-NEXT:    vfmadd.vf v9, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 1 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 1 x half> %head, <vscale x 1 x half> poison, <vscale x 1 x i32> zeroinitializer
@@ -76,14 +74,14 @@ define <vscale x 2 x half> @vfmsub_vv_nxv2f16(<vscale x 2 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    lui a0, 8
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v10
-; ZVFHMIN-NEXT:    vxor.vx v9, v9, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v8
+; ZVFHMIN-NEXT:    vxor.vx v8, v9, a0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v9, v11, v10
+; ZVFHMIN-NEXT:    vfmadd.vv v12, v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
   %neg = fneg <vscale x 2 x half> %vb
   %vd = call <vscale x 2 x half> @llvm.experimental.constrained.fma.nxv2f16(<vscale x 2 x half> %va, <vscale x 2 x half> %vc, <vscale x 2 x half> %neg, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -99,18 +97,16 @@ define <vscale x 2 x half> @vfmsub_vf_nxv2f16(<vscale x 2 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmsub_vf_nxv2f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v11, v9
+; ZVFHMIN-NEXT:    vfmadd.vf v11, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v11
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 2 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 2 x half> %head, <vscale x 2 x half> poison, <vscale x 2 x i32> zeroinitializer
@@ -130,10 +126,10 @@ define <vscale x 4 x half> @vfmsub_vv_nxv4f16(<vscale x 4 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    lui a0, 8
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
-; ZVFHMIN-NEXT:    vxor.vx v8, v10, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v14, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v9
+; ZVFHMIN-NEXT:    vxor.vx v9, v10, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v14, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfmadd.vv v14, v12, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
@@ -153,16 +149,14 @@ define <vscale x 4 x half> @vfmsub_vf_nxv4f16(<vscale x 4 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmsub_vf_nxv4f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v14, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v8
-; ZVFHMIN-NEXT:    vxor.vx v12, v9, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v12
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v14
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
+; ZVFHMIN-NEXT:    vxor.vx v9, v9, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v10, v8
+; ZVFHMIN-NEXT:    vfmadd.vf v12, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
 ; ZVFHMIN-NEXT:    ret
@@ -184,12 +178,12 @@ define <vscale x 8 x half> @vfmsub_vv_nxv8f16(<vscale x 8 x half> %va, <vscale x
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    lui a0, 8
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
-; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v20, v10
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v10
+; ZVFHMIN-NEXT:    vxor.vx v14, v8, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v14
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v20, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v20, v16, v12
+; ZVFHMIN-NEXT:    vfmadd.vv v20, v16, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v20
 ; ZVFHMIN-NEXT:    ret
@@ -207,16 +201,14 @@ define <vscale x 8 x half> @vfmsub_vf_nxv8f16(<vscale x 8 x half> %va, <vscale x
 ;
 ; ZVFHMIN-LABEL: vfmsub_vf_nxv8f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v20, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
-; ZVFHMIN-NEXT:    vxor.vx v16, v8, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v16
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v20
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
+; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v8
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v16, v12, v8
+; ZVFHMIN-NEXT:    vfmadd.vf v16, fa5, v12
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v16
 ; ZVFHMIN-NEXT:    ret
@@ -238,12 +230,12 @@ define <vscale x 16 x half> @vfmsub_vv_nxv16f16(<vscale x 16 x half> %va, <vscal
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    lui a0, 8
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v8
-; ZVFHMIN-NEXT:    vxor.vx v20, v12, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v20
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v16
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v16
+; ZVFHMIN-NEXT:    vxor.vx v12, v12, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v0, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v0, v24, v8
+; ZVFHMIN-NEXT:    vfmadd.vv v0, v24, v16
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v0
 ; ZVFHMIN-NEXT:    ret
@@ -261,16 +253,14 @@ define <vscale x 16 x half> @vfmsub_vf_nxv16f16(<vscale x 16 x half> %va, <vscal
 ;
 ; ZVFHMIN-LABEL: vfmsub_vf_nxv16f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v4, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v8
-; ZVFHMIN-NEXT:    vxor.vx v24, v12, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v24
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v4
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, m4, ta, ma
+; ZVFHMIN-NEXT:    vxor.vx v12, v12, a0
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v8
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v24, v16, v8
+; ZVFHMIN-NEXT:    vfmadd.vf v24, fa5, v16
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v24
 ; ZVFHMIN-NEXT:    ret
@@ -417,26 +407,20 @@ define <vscale x 32 x half> @vfmsub_vf_nxv32f16(<vscale x 32 x half> %va, <vscal
 ; ZVFHMIN-NEXT:    slli a0, a0, 3
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vl8r.v v16, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v28
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v12
-; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 3
-; ZVFHMIN-NEXT:    add a0, sp, a0
-; ZVFHMIN-NEXT:    addi a0, a0, 16
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v20
+; ZVFHMIN-NEXT:    addi a0, sp, 16
 ; ZVFHMIN-NEXT:    vs8r.v v24, (a0) # vscale x 64-byte Folded Spill
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v24, v12
 ; ZVFHMIN-NEXT:    csrr a0, vlenb
 ; ZVFHMIN-NEXT:    slli a0, a0, 4
 ; ZVFHMIN-NEXT:    add a0, sp, a0
 ; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v8, v28
-; ZVFHMIN-NEXT:    csrr a0, vlenb
-; ZVFHMIN-NEXT:    slli a0, a0, 3
-; ZVFHMIN-NEXT:    add a0, sp, a0
-; ZVFHMIN-NEXT:    addi a0, a0, 16
-; ZVFHMIN-NEXT:    vl8r.v v24, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v16, v12
+; ZVFHMIN-NEXT:    addi a0, sp, 16
+; ZVFHMIN-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m8, ta, ma
 ; ZVFHMIN-NEXT:    vfmadd.vv v16, v8, v24
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, m4, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll
index 16ff3b719a927..9fad809adc2c8 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfnmadd-constrained-sdnode.ll
@@ -47,19 +47,17 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16(<vscale x 1 x half> %va, <vscale
 ;
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v9, v11
+; ZVFHMIN-NEXT:    vfmadd.vf v9, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 1 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 1 x half> %head, <vscale x 1 x half> poison, <vscale x 1 x i32> zeroinitializer
@@ -105,19 +103,17 @@ define <vscale x 2 x half> @vfnmsub_vf_nxv2f16(<vscale x 2 x half> %va, <vscale
 ;
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv2f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a0
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v9, v11
+; ZVFHMIN-NEXT:    vfmadd.vf v9, fa5, v10
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v9
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 2 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 2 x half> %head, <vscale x 2 x half> poison, <vscale x 2 x i32> zeroinitializer
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll
index 68af72da4126f..e59c38956c19c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfnmsub-constrained-sdnode.ll
@@ -45,18 +45,16 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16(<vscale x 1 x half> %va, <vscale
 ;
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v9, v11
+; ZVFHMIN-NEXT:    vfmacc.vf v10, fa5, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v10
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 1 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 1 x half> %head, <vscale x 1 x half> poison, <vscale x 1 x i32> zeroinitializer
@@ -99,18 +97,16 @@ define <vscale x 2 x half> @vfnmsub_vf_nxv2f16(<vscale x 2 x half> %va, <vscale
 ;
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv2f16:
 ; ZVFHMIN:       # %bb.0:
-; ZVFHMIN-NEXT:    fmv.x.h a0, fa0
-; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vmv.v.x v10, a0
 ; ZVFHMIN-NEXT:    lui a0, 8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
+; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT:    vfwcvt.f.f.v v10, v9
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a0
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v9, v8
-; ZVFHMIN-NEXT:    vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT:    fcvt.s.h fa5, fa0
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT:    vfmadd.vv v12, v9, v11
+; ZVFHMIN-NEXT:    vfmacc.vf v10, fa5, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v12
+; ZVFHMIN-NEXT:    vfncvt.f.f.w v8, v10
 ; ZVFHMIN-NEXT:    ret
   %head = insertelement <vscale x 2 x half> poison, half %c, i32 0
   %splat = shufflevector <vscale x 2 x half> %head, <vscale x 2 x half> poison, <vscale x 2 x i32> zeroinitializer
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfptoi-constrained-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/vfptoi-constrained-sdnode.ll
index efcdc1e24b0b3..33290a8035ce8 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfptoi-constrained-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfptoi-constrained-sdnode.ll
@@ -9,8 +9,7 @@ define <vscale x 1 x i1> @vfptosi_nxv1f16_nxv1i1(<vscale x 1 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptosi.nxv1i1.nxv1f16(<vscale x 1 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -21,8 +20,7 @@ define <vscale x 1 x i1> @vfptoui_nxv1f16_nxv1i1(<vscale x 1 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf8, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptoui.nxv1i1.nxv1f16(<vscale x 1 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -143,8 +141,7 @@ define <vscale x 2 x i1> @vfptosi_nxv2f16_nxv2i1(<vscale x 2 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptosi.nxv2i1.nxv2f16(<vscale x 2 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -155,8 +152,7 @@ define <vscale x 2 x i1> @vfptoui_nxv2f16_nxv2i1(<vscale x 2 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptoui.nxv2i1.nxv2f16(<vscale x 2 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -255,8 +251,7 @@ define <vscale x 4 x i1> @vfptosi_nxv4f16_nxv4i1(<vscale x 4 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptosi.nxv4i1.nxv4f16(<vscale x 4 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -267,8 +262,7 @@ define <vscale x 4 x i1> @vfptoui_nxv4f16_nxv4i1(<vscale x 4 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptoui.nxv4i1.nxv4f16(<vscale x 4 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -367,8 +361,7 @@ define <vscale x 8 x i1> @vfptosi_nxv8f16_nxv8i1(<vscale x 8 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptosi.nxv8i1.nxv8f16(<vscale x 8 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
@@ -379,8 +372,7 @@ define <vscale x 8 x i1> @vfptoui_nxv8f16_nxv8i1(<vscale x 8 x half> %va) strict
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptoui.nxv8i1.nxv8f16(<vscale x 8 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
@@ -479,8 +471,7 @@ define <vscale x 16 x i1> @vfptosi_nxv16f16_nxv16i1(<vscale x 16 x half> %va) st
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 16 x i1> @llvm.experimental.constrained.fptosi.nxv16i1.nxv16f16(<vscale x 16 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 16 x i1> %evec
@@ -491,8 +482,7 @@ define <vscale x 16 x i1> @vfptoui_nxv16f16_nxv16i1(<vscale x 16 x half> %va) st
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 16 x i1> @llvm.experimental.constrained.fptoui.nxv16i1.nxv16f16(<vscale x 16 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 16 x i1> %evec
@@ -567,8 +557,7 @@ define <vscale x 32 x i1> @vfptosi_nxv32f16_nxv32i1(<vscale x 32 x half> %va) st
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 32 x i1> @llvm.experimental.constrained.fptosi.nxv32i1.nxv32f16(<vscale x 32 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 32 x i1> %evec
@@ -579,8 +568,7 @@ define <vscale x 32 x i1> @vfptoui_nxv32f16_nxv32i1(<vscale x 32 x half> %va) st
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e8, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 32 x i1> @llvm.experimental.constrained.fptoui.nxv32i1.nxv32f16(<vscale x 32 x half> %va, metadata !"fpexcept.strict")
   ret <vscale x 32 x i1> %evec
@@ -633,8 +621,7 @@ define <vscale x 1 x i1> @vfptosi_nxv1f32_nxv1i1(<vscale x 1 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptosi.nxv1i1.nxv1f32(<vscale x 1 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -645,8 +632,7 @@ define <vscale x 1 x i1> @vfptoui_nxv1f32_nxv1i1(<vscale x 1 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptoui.nxv1i1.nxv1f32(<vscale x 1 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -745,8 +731,7 @@ define <vscale x 2 x i1> @vfptosi_nxv2f32_nxv2i1(<vscale x 2 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptosi.nxv2i1.nxv2f32(<vscale x 2 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -757,8 +742,7 @@ define <vscale x 2 x i1> @vfptoui_nxv2f32_nxv2i1(<vscale x 2 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptoui.nxv2i1.nxv2f32(<vscale x 2 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -857,8 +841,7 @@ define <vscale x 4 x i1> @vfptosi_nxv4f32_nxv4i1(<vscale x 4 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptosi.nxv4i1.nxv4f32(<vscale x 4 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -869,8 +852,7 @@ define <vscale x 4 x i1> @vfptoui_nxv4f32_nxv4i1(<vscale x 4 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptoui.nxv4i1.nxv4f32(<vscale x 4 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -969,8 +951,7 @@ define <vscale x 8 x i1> @vfptosi_nxv8f32_nxv8i1(<vscale x 8 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptosi.nxv8i1.nxv8f32(<vscale x 8 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
@@ -981,8 +962,7 @@ define <vscale x 8 x i1> @vfptoui_nxv8f32_nxv8i1(<vscale x 8 x float> %va) stric
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptoui.nxv8i1.nxv8f32(<vscale x 8 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
@@ -1081,8 +1061,7 @@ define <vscale x 16 x i1> @vfptosi_nxv16f32_nxv16i1(<vscale x 16 x float> %va) s
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 16 x i1> @llvm.experimental.constrained.fptosi.nxv16i1.nxv16f32(<vscale x 16 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 16 x i1> %evec
@@ -1093,8 +1072,7 @@ define <vscale x 16 x i1> @vfptoui_nxv16f32_nxv16i1(<vscale x 16 x float> %va) s
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e16, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 16 x i1> @llvm.experimental.constrained.fptoui.nxv16i1.nxv16f32(<vscale x 16 x float> %va, metadata !"fpexcept.strict")
   ret <vscale x 16 x i1> %evec
@@ -1171,8 +1149,7 @@ define <vscale x 1 x i1> @vfptosi_nxv1f64_nxv1i1(<vscale x 1 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptosi.nxv1i1.nxv1f64(<vscale x 1 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -1183,8 +1160,7 @@ define <vscale x 1 x i1> @vfptoui_nxv1f64_nxv1i1(<vscale x 1 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v9, v8
-; CHECK-NEXT:    vand.vi v8, v9, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v9, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 1 x i1> @llvm.experimental.constrained.fptoui.nxv1i1.nxv1f64(<vscale x 1 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 1 x i1> %evec
@@ -1289,8 +1265,7 @@ define <vscale x 2 x i1> @vfptosi_nxv2f64_nxv2i1(<vscale x 2 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptosi.nxv2i1.nxv2f64(<vscale x 2 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -1301,8 +1276,7 @@ define <vscale x 2 x i1> @vfptoui_nxv2f64_nxv2i1(<vscale x 2 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v10, v8
-; CHECK-NEXT:    vand.vi v8, v10, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v10, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 2 x i1> @llvm.experimental.constrained.fptoui.nxv2i1.nxv2f64(<vscale x 2 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 2 x i1> %evec
@@ -1407,8 +1381,7 @@ define <vscale x 4 x i1> @vfptosi_nxv4f64_nxv4i1(<vscale x 4 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptosi.nxv4i1.nxv4f64(<vscale x 4 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -1419,8 +1392,7 @@ define <vscale x 4 x i1> @vfptoui_nxv4f64_nxv4i1(<vscale x 4 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v12, v8
-; CHECK-NEXT:    vand.vi v8, v12, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v12, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 4 x i1> @llvm.experimental.constrained.fptoui.nxv4i1.nxv4f64(<vscale x 4 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 4 x i1> %evec
@@ -1525,8 +1497,7 @@ define <vscale x 8 x i1> @vfptosi_nxv8f64_nxv8i1(<vscale x 8 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.x.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptosi.nxv8i1.nxv8f64(<vscale x 8 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
@@ -1537,8 +1508,7 @@ define <vscale x 8 x i1> @vfptoui_nxv8f64_nxv8i1(<vscale x 8 x double> %va) stri
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a0, zero, e32, m4, ta, ma
 ; CHECK-NEXT:    vfncvt.rtz.xu.f.w v16, v8
-; CHECK-NEXT:    vand.vi v8, v16, 1
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-NEXT:    vmsne.vi v0, v16, 0
 ; CHECK-NEXT:    ret
   %evec = call <vscale x 8 x i1> @llvm.experimental.constrained.fptoui.nxv8i1.nxv8f64(<vscale x 8 x double> %va, metadata !"fpexcept.strict")
   ret <vscale x 8 x i1> %evec
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
index 698d47f3be720..e226fe30031bc 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
@@ -170,10 +170,8 @@ define <vscale x 4 x float> @foldable_vfadd(<vscale x 4 x float> %passthru, <vsc
 define <vscale x 4 x float> @unfoldable_constrained_fadd(<vscale x 4 x float> %passthru, <vscale x 4 x float> %x, <vscale x 4 x float> %y, iXLen %vl) strictfp {
 ; CHECK-LABEL: unfoldable_constrained_fadd:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e32, m2, ta, ma
-; CHECK-NEXT:    vfadd.vv v10, v10, v12
 ; CHECK-NEXT:    vsetvli zero, a0, e32, m2, tu, ma
-; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    vfadd.vv v8, v10, v12
 ; CHECK-NEXT:    ret
   %a = call <vscale x 4 x float> @llvm.experimental.constrained.fadd(<vscale x 4 x float> %x, <vscale x 4 x float> %y, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   %b = call <vscale x 4 x float> @llvm.riscv.vmv.v.v.nxv4f32(<vscale x 4 x float> %passthru, <vscale x 4 x float> %a, iXLen %vl) strictfp
diff --git a/llvm/test/CodeGen/RISCV/zfh-half-intrinsics-strict.ll b/llvm/test/CodeGen/RISCV/zfh-half-intrinsics-strict.ll
index eb1848965a9ba..780d8aa6a1d82 100644
--- a/llvm/test/CodeGen/RISCV/zfh-half-intrinsics-strict.ll
+++ b/llvm/test/CodeGen/RISCV/zfh-half-intrinsics-strict.ll
@@ -61,68 +61,88 @@ define half @sqrt_f16(half %a) nounwind strictfp {
 define half @floor_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: floor_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call floorf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB1_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0, rdn
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0, rdn
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB1_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: floor_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call floorf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB1_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0, rdn
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0, rdn
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB1_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: floor_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call floorf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB1_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0, rdn
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1, rdn
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB1_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: floor_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call floorf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB1_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0, rdn
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1, rdn
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB1_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: floor_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call floorf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB1_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rdn
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rdn
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB1_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: floor_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call floorf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB1_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rdn
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rdn
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB1_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.floor.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -131,68 +151,88 @@ define half @floor_f16(half %a) nounwind strictfp {
 define half @ceil_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: ceil_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call ceilf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB2_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0, rup
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0, rup
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB2_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: ceil_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call ceilf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB2_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0, rup
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0, rup
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB2_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: ceil_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call ceilf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB2_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0, rup
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1, rup
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB2_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: ceil_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call ceilf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB2_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0, rup
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1, rup
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB2_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: ceil_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call ceilf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB2_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rup
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rup
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB2_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: ceil_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call ceilf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB2_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rup
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rup
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB2_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.ceil.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -201,68 +241,88 @@ define half @ceil_f16(half %a) nounwind strictfp {
 define half @trunc_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: trunc_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call truncf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB3_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0, rtz
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0, rtz
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB3_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: trunc_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call truncf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB3_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0, rtz
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0, rtz
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB3_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: trunc_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call truncf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB3_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0, rtz
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1, rtz
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB3_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: trunc_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call truncf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB3_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0, rtz
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1, rtz
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB3_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: trunc_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call truncf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB3_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rtz
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rtz
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB3_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: trunc_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call truncf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB3_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rtz
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rtz
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB3_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.trunc.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -271,68 +331,88 @@ define half @trunc_f16(half %a) nounwind strictfp {
 define half @rint_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: rint_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call rintf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB4_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB4_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: rint_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call rintf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB4_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB4_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: rint_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call rintf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB4_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB4_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: rint_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call rintf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB4_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB4_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: rint_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call rintf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB4_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB4_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: rint_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call rintf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB4_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB4_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.rint.f16(half %a, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -411,68 +491,88 @@ define half @nearbyint_f16(half %a) nounwind strictfp {
 define half @round_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: round_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call roundf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB6_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0, rmm
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0, rmm
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB6_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: round_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call roundf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB6_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0, rmm
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0, rmm
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB6_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: round_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call roundf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB6_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0, rmm
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1, rmm
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB6_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: round_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call roundf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB6_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0, rmm
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1, rmm
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB6_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: round_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call roundf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB6_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rmm
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rmm
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB6_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: round_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call roundf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB6_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rmm
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rmm
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB6_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.round.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -481,68 +581,88 @@ define half @round_f16(half %a) nounwind strictfp {
 define half @roundeven_f16(half %a) nounwind strictfp {
 ; RV32IZFH-LABEL: roundeven_f16:
 ; RV32IZFH:       # %bb.0:
-; RV32IZFH-NEXT:    addi sp, sp, -16
-; RV32IZFH-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFH-NEXT:    call roundevenf
-; RV32IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFH-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFH-NEXT:    addi sp, sp, 16
+; RV32IZFH-NEXT:    li a0, 25
+; RV32IZFH-NEXT:    slli a0, a0, 10
+; RV32IZFH-NEXT:    fmv.h.x fa5, a0
+; RV32IZFH-NEXT:    fabs.h fa4, fa0
+; RV32IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV32IZFH-NEXT:    beqz a0, .LBB7_2
+; RV32IZFH-NEXT:  # %bb.1:
+; RV32IZFH-NEXT:    fcvt.w.h a0, fa0, rne
+; RV32IZFH-NEXT:    fcvt.h.w fa5, a0, rne
+; RV32IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV32IZFH-NEXT:  .LBB7_2:
 ; RV32IZFH-NEXT:    ret
 ;
 ; RV64IZFH-LABEL: roundeven_f16:
 ; RV64IZFH:       # %bb.0:
-; RV64IZFH-NEXT:    addi sp, sp, -16
-; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFH-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFH-NEXT:    call roundevenf
-; RV64IZFH-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    li a0, 25
+; RV64IZFH-NEXT:    slli a0, a0, 10
+; RV64IZFH-NEXT:    fmv.h.x fa5, a0
+; RV64IZFH-NEXT:    fabs.h fa4, fa0
+; RV64IZFH-NEXT:    flt.h a0, fa4, fa5
+; RV64IZFH-NEXT:    beqz a0, .LBB7_2
+; RV64IZFH-NEXT:  # %bb.1:
+; RV64IZFH-NEXT:    fcvt.w.h a0, fa0, rne
+; RV64IZFH-NEXT:    fcvt.h.w fa5, a0, rne
+; RV64IZFH-NEXT:    fsgnj.h fa0, fa5, fa0
+; RV64IZFH-NEXT:  .LBB7_2:
 ; RV64IZFH-NEXT:    ret
 ;
 ; RV32IZHINX-LABEL: roundeven_f16:
 ; RV32IZHINX:       # %bb.0:
-; RV32IZHINX-NEXT:    addi sp, sp, -16
-; RV32IZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINX-NEXT:    call roundevenf
-; RV32IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINX-NEXT:    addi sp, sp, 16
+; RV32IZHINX-NEXT:    li a1, 25
+; RV32IZHINX-NEXT:    slli a1, a1, 10
+; RV32IZHINX-NEXT:    fabs.h a2, a0
+; RV32IZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZHINX-NEXT:    beqz a1, .LBB7_2
+; RV32IZHINX-NEXT:  # %bb.1:
+; RV32IZHINX-NEXT:    fcvt.w.h a1, a0, rne
+; RV32IZHINX-NEXT:    fcvt.h.w a1, a1, rne
+; RV32IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZHINX-NEXT:  .LBB7_2:
 ; RV32IZHINX-NEXT:    ret
 ;
 ; RV64IZHINX-LABEL: roundeven_f16:
 ; RV64IZHINX:       # %bb.0:
-; RV64IZHINX-NEXT:    addi sp, sp, -16
-; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINX-NEXT:    call roundevenf
-; RV64IZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    li a1, 25
+; RV64IZHINX-NEXT:    slli a1, a1, 10
+; RV64IZHINX-NEXT:    fabs.h a2, a0
+; RV64IZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZHINX-NEXT:    beqz a1, .LBB7_2
+; RV64IZHINX-NEXT:  # %bb.1:
+; RV64IZHINX-NEXT:    fcvt.w.h a1, a0, rne
+; RV64IZHINX-NEXT:    fcvt.h.w a1, a1, rne
+; RV64IZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZHINX-NEXT:  .LBB7_2:
 ; RV64IZHINX-NEXT:    ret
 ;
 ; RV32IZDINXZHINX-LABEL: roundeven_f16:
 ; RV32IZDINXZHINX:       # %bb.0:
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINX-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINX-NEXT:    call roundevenf
-; RV32IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINX-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV32IZDINXZHINX-NEXT:    li a1, 25
+; RV32IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV32IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV32IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV32IZDINXZHINX-NEXT:    beqz a1, .LBB7_2
+; RV32IZDINXZHINX-NEXT:  # %bb.1:
+; RV32IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rne
+; RV32IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rne
+; RV32IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV32IZDINXZHINX-NEXT:  .LBB7_2:
 ; RV32IZDINXZHINX-NEXT:    ret
 ;
 ; RV64IZDINXZHINX-LABEL: roundeven_f16:
 ; RV64IZDINXZHINX:       # %bb.0:
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZDINXZHINX-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINX-NEXT:    call roundevenf
-; RV64IZDINXZHINX-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    li a1, 25
+; RV64IZDINXZHINX-NEXT:    slli a1, a1, 10
+; RV64IZDINXZHINX-NEXT:    fabs.h a2, a0
+; RV64IZDINXZHINX-NEXT:    flt.h a1, a2, a1
+; RV64IZDINXZHINX-NEXT:    beqz a1, .LBB7_2
+; RV64IZDINXZHINX-NEXT:  # %bb.1:
+; RV64IZDINXZHINX-NEXT:    fcvt.w.h a1, a0, rne
+; RV64IZDINXZHINX-NEXT:    fcvt.h.w a1, a1, rne
+; RV64IZDINXZHINX-NEXT:    fsgnj.h a0, a1, a0
+; RV64IZDINXZHINX-NEXT:  .LBB7_2:
 ; RV64IZDINXZHINX-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.roundeven.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
diff --git a/llvm/test/CodeGen/RISCV/zfhmin-half-intrinsics-strict.ll b/llvm/test/CodeGen/RISCV/zfhmin-half-intrinsics-strict.ll
index 0529819a4f4e2..69a27c86b53a3 100644
--- a/llvm/test/CodeGen/RISCV/zfhmin-half-intrinsics-strict.ll
+++ b/llvm/test/CodeGen/RISCV/zfhmin-half-intrinsics-strict.ll
@@ -73,68 +73,94 @@ define half @sqrt_f16(half %a) nounwind strictfp {
 define half @floor_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: floor_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call floorf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB1_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rdn
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rdn
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB1_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: floor_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call floorf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB1_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rdn
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rdn
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB1_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: floor_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call floorf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB1_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rdn
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rdn
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB1_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: floor_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call floorf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB1_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rdn
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rdn
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB1_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: floor_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call floorf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB1_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rdn
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rdn
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB1_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: floor_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call floorf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB1_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rdn
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rdn
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB1_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.floor.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -143,68 +169,94 @@ define half @floor_f16(half %a) nounwind strictfp {
 define half @ceil_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: ceil_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call ceilf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB2_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rup
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rup
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB2_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: ceil_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call ceilf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB2_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rup
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rup
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB2_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: ceil_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call ceilf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB2_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rup
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rup
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB2_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: ceil_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call ceilf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB2_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rup
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rup
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB2_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: ceil_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call ceilf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB2_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rup
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rup
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB2_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: ceil_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call ceilf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB2_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rup
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rup
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB2_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.ceil.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -213,68 +265,94 @@ define half @ceil_f16(half %a) nounwind strictfp {
 define half @trunc_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: trunc_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call truncf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB3_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rtz
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rtz
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB3_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: trunc_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call truncf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB3_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rtz
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rtz
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB3_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: trunc_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call truncf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB3_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rtz
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rtz
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB3_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: trunc_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call truncf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB3_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rtz
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rtz
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB3_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: trunc_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call truncf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB3_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rtz
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rtz
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB3_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: trunc_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call truncf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB3_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rtz
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rtz
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB3_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.trunc.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -283,68 +361,94 @@ define half @trunc_f16(half %a) nounwind strictfp {
 define half @rint_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: rint_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call rintf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB4_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB4_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: rint_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call rintf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB4_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB4_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: rint_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call rintf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB4_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB4_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: rint_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call rintf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB4_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB4_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: rint_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call rintf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB4_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB4_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: rint_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call rintf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB4_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB4_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.rint.f16(half %a, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -423,68 +527,94 @@ define half @nearbyint_f16(half %a) nounwind strictfp {
 define half @round_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: round_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call roundf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB6_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rmm
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rmm
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB6_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: round_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call roundf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB6_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rmm
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rmm
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB6_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: round_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call roundf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB6_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rmm
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rmm
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB6_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: round_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call roundf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB6_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rmm
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rmm
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB6_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: round_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call roundf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB6_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rmm
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rmm
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB6_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: round_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call roundf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB6_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rmm
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rmm
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB6_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.round.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
@@ -493,68 +623,94 @@ define half @round_f16(half %a) nounwind strictfp {
 define half @roundeven_f16(half %a) nounwind strictfp {
 ; RV32IZFHMIN-LABEL: roundeven_f16:
 ; RV32IZFHMIN:       # %bb.0:
-; RV32IZFHMIN-NEXT:    addi sp, sp, -16
-; RV32IZFHMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
-; RV32IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV32IZFHMIN-NEXT:    call roundevenf
-; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV32IZFHMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZFHMIN-NEXT:    addi sp, sp, 16
+; RV32IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV32IZFHMIN-NEXT:    lui a0, 307200
+; RV32IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV32IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV32IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV32IZFHMIN-NEXT:    beqz a0, .LBB7_2
+; RV32IZFHMIN-NEXT:  # %bb.1:
+; RV32IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rne
+; RV32IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rne
+; RV32IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV32IZFHMIN-NEXT:  .LBB7_2:
+; RV32IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV32IZFHMIN-NEXT:    ret
 ;
 ; RV64IZFHMIN-LABEL: roundeven_f16:
 ; RV64IZFHMIN:       # %bb.0:
-; RV64IZFHMIN-NEXT:    addi sp, sp, -16
-; RV64IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
-; RV64IZFHMIN-NEXT:    fcvt.s.h fa0, fa0
-; RV64IZFHMIN-NEXT:    call roundevenf
-; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa0
-; RV64IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZFHMIN-NEXT:    addi sp, sp, 16
+; RV64IZFHMIN-NEXT:    fcvt.s.h fa5, fa0
+; RV64IZFHMIN-NEXT:    lui a0, 307200
+; RV64IZFHMIN-NEXT:    fmv.w.x fa4, a0
+; RV64IZFHMIN-NEXT:    fabs.s fa3, fa5
+; RV64IZFHMIN-NEXT:    flt.s a0, fa3, fa4
+; RV64IZFHMIN-NEXT:    beqz a0, .LBB7_2
+; RV64IZFHMIN-NEXT:  # %bb.1:
+; RV64IZFHMIN-NEXT:    fcvt.w.s a0, fa5, rne
+; RV64IZFHMIN-NEXT:    fcvt.s.w fa4, a0, rne
+; RV64IZFHMIN-NEXT:    fsgnj.s fa5, fa4, fa5
+; RV64IZFHMIN-NEXT:  .LBB7_2:
+; RV64IZFHMIN-NEXT:    fcvt.h.s fa0, fa5
 ; RV64IZFHMIN-NEXT:    ret
 ;
 ; RV32IZHINXMIN-STRICT-LABEL: roundeven_f16:
 ; RV32IZHINXMIN-STRICT:       # %bb.0:
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV32IZHINXMIN-STRICT-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    call roundevenf
+; RV32IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV32IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV32IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV32IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB7_2
+; RV32IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rne
+; RV32IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rne
+; RV32IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZHINXMIN-STRICT-NEXT:  .LBB7_2:
 ; RV32IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV32IZHINXMIN-STRICT-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV32IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV64IZHINXMIN-STRICT-LABEL: roundeven_f16:
 ; RV64IZHINXMIN-STRICT:       # %bb.0:
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, -16
-; RV64IZHINXMIN-STRICT-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.h a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    call roundevenf
+; RV64IZHINXMIN-STRICT-NEXT:    lui a1, 307200
+; RV64IZHINXMIN-STRICT-NEXT:    fabs.s a2, a0
+; RV64IZHINXMIN-STRICT-NEXT:    flt.s a1, a2, a1
+; RV64IZHINXMIN-STRICT-NEXT:    beqz a1, .LBB7_2
+; RV64IZHINXMIN-STRICT-NEXT:  # %bb.1:
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.w.s a1, a0, rne
+; RV64IZHINXMIN-STRICT-NEXT:    fcvt.s.w a1, a1, rne
+; RV64IZHINXMIN-STRICT-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZHINXMIN-STRICT-NEXT:  .LBB7_2:
 ; RV64IZHINXMIN-STRICT-NEXT:    fcvt.h.s a0, a0
-; RV64IZHINXMIN-STRICT-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZHINXMIN-STRICT-NEXT:    addi sp, sp, 16
 ; RV64IZHINXMIN-STRICT-NEXT:    ret
 ;
 ; RV32IZDINXZHINXMIN-LABEL: roundeven_f16:
 ; RV32IZDINXZHINXMIN:       # %bb.0:
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV32IZDINXZHINXMIN-NEXT:    sw ra, 12(sp) # 4-byte Folded Spill
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    call roundevenf
+; RV32IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV32IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV32IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV32IZDINXZHINXMIN-NEXT:    beqz a1, .LBB7_2
+; RV32IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rne
+; RV32IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rne
+; RV32IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV32IZDINXZHINXMIN-NEXT:  .LBB7_2:
 ; RV32IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV32IZDINXZHINXMIN-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
-; RV32IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV32IZDINXZHINXMIN-NEXT:    ret
 ;
 ; RV64IZDINXZHINXMIN-LABEL: roundeven_f16:
 ; RV64IZDINXZHINXMIN:       # %bb.0:
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
-; RV64IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.h a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    call roundevenf
+; RV64IZDINXZHINXMIN-NEXT:    lui a1, 307200
+; RV64IZDINXZHINXMIN-NEXT:    fabs.s a2, a0
+; RV64IZDINXZHINXMIN-NEXT:    flt.s a1, a2, a1
+; RV64IZDINXZHINXMIN-NEXT:    beqz a1, .LBB7_2
+; RV64IZDINXZHINXMIN-NEXT:  # %bb.1:
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.w.s a1, a0, rne
+; RV64IZDINXZHINXMIN-NEXT:    fcvt.s.w a1, a1, rne
+; RV64IZDINXZHINXMIN-NEXT:    fsgnj.s a0, a1, a0
+; RV64IZDINXZHINXMIN-NEXT:  .LBB7_2:
 ; RV64IZDINXZHINXMIN-NEXT:    fcvt.h.s a0, a0
-; RV64IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
-; RV64IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
 ; RV64IZDINXZHINXMIN-NEXT:    ret
   %1 = call half @llvm.experimental.constrained.roundeven.f16(half %a, metadata !"fpexcept.strict") strictfp
   ret half %1
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/constrained-fmuladd.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/constrained-fmuladd.ll
index b2d4f570afbd9..e335a1bca2b72 100644
--- a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/constrained-fmuladd.ll
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/constrained-fmuladd.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s
 ; TODO: re-enable validator FPRoundingMode is placed correctly
 ; RUNx: %if spirv-tools %{ llc -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %}
@@ -18,7 +19,7 @@ entry:
   ret void
 }
 
-; CHECK: OpFMul %[[#]] %[[#]] %[[#]] 
+; CHECK: OpFMul %[[#]] %[[#]] %[[#]]
 ; CHECK: OpFAdd %[[#]] %[[#]] %[[#]]
 define spir_kernel void @test_f64(double %a) {
 entry:
@@ -63,3 +64,5 @@ declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double
 declare <2 x float> @llvm.experimental.constrained.fmuladd.v2f32(<2 x float>, <2 x float>, <2 x float>, metadata, metadata)
 declare <4 x float> @llvm.experimental.constrained.fmuladd.v4f32(<4 x float>, <4 x float>, <4 x float>, metadata, metadata)
 declare <2 x double> @llvm.experimental.constrained.fmuladd.v2f64(<2 x double>, <2 x double>, <2 x double>, metadata, metadata)
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-alias.ll b/llvm/test/CodeGen/SystemZ/fp-strict-alias.ll
index 32d05b7ebb160..5fd4671d68860 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-alias.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-alias.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; Verify that strict FP operations are not rescheduled
 ;
 ; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s
@@ -16,11 +17,12 @@ declare void @bar()
 
 define void @f1(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 ; CHECK-LABEL: f1:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.sqrt.f32(float %f1)
   %sqrt2 = call float @llvm.sqrt.f32(float %f2)
@@ -33,11 +35,12 @@ define void @f1(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 
 define void @f2(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f2:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -56,11 +59,12 @@ define void @f2(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f3(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f3:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -79,11 +83,12 @@ define void @f3(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f4(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f4:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -107,11 +112,12 @@ define void @f4(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f5(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 ; CHECK-LABEL: f5:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.sqrt.f32(float %f1)
   %sqrt2 = call float @llvm.sqrt.f32(float %f2)
@@ -124,11 +130,12 @@ define void @f5(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 
 define void @f6(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f6:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -147,11 +154,12 @@ define void @f6(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f7(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f7:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -170,11 +178,12 @@ define void @f7(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f8(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f8:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -196,11 +205,14 @@ define void @f8(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f9(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 ; CHECK-LABEL: f9:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    lhi %r0, 0
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    sfpc %r0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.sqrt.f32(float %f1)
   %sqrt2 = call float @llvm.sqrt.f32(float %f2)
@@ -215,11 +227,14 @@ define void @f9(float %f1, float %f2, ptr %ptr1, ptr %ptr2) {
 
 define void @f10(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f10:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    lhi %r0, 0
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    sfpc %r0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -240,11 +255,14 @@ define void @f10(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f11(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f11:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    lhi %r0, 0
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    sfpc %r0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -265,11 +283,14 @@ define void @f11(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f12(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f12:
-; CHECK: sqebr
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: ste
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    lhi %r0, 0
+; CHECK-NEXT:    sqebr %f1, %f2
+; CHECK-NEXT:    sfpc %r0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    ste %f1, 0(%r3)
+; CHECK-NEXT:    br %r14
 
   %sqrt1 = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -293,8 +314,8 @@ define void @f12(float %f1, float %f2, ptr %ptr1, ptr %ptr2) #0 {
 
 define void @f13(float %f1) {
 ; CHECK-LABEL: f13:
-; CHECK-NOT: sqeb
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    br %r14
 
   %sqrt = call float @llvm.sqrt.f32(float %f1)
 
@@ -303,8 +324,8 @@ define void @f13(float %f1) {
 
 define void @f14(float %f1) #0 {
 ; CHECK-LABEL: f14:
-; CHECK-NOT: sqeb
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    br %r14
 
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -316,8 +337,8 @@ define void @f14(float %f1) #0 {
 
 define void @f15(float %f1) #0 {
 ; CHECK-LABEL: f15:
-; CHECK-NOT: sqeb
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    br %r14
 
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -329,8 +350,8 @@ define void @f15(float %f1) #0 {
 
 define void @f16(float %f1) #0 {
 ; CHECK-LABEL: f16:
-; CHECK: sqebr
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    br %r14
 
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %f1,
@@ -346,9 +367,10 @@ define void @f16(float %f1) #0 {
 
 define void @f17(float %in, ptr %out) #0 {
 ; CHECK-LABEL: f17:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: jg bar
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    jg bar@PLT
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %in,
                         metadata !"round.dynamic",
@@ -360,9 +382,10 @@ define void @f17(float %in, ptr %out) #0 {
 
 define void @f18(float %in, ptr %out) #0 {
 ; CHECK-LABEL: f18:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: jg bar
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    jg bar@PLT
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %in,
                         metadata !"round.dynamic",
@@ -374,9 +397,10 @@ define void @f18(float %in, ptr %out) #0 {
 
 define void @f19(float %in, ptr %out) #0 {
 ; CHECK-LABEL: f19:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: jg bar
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    jg bar@PLT
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %in,
                         metadata !"round.dynamic",
@@ -388,9 +412,10 @@ define void @f19(float %in, ptr %out) #0 {
 
 define void @f20(float %in, ptr %out) #0 {
 ; CHECK-LABEL: f20:
-; CHECK: sqebr
-; CHECK: ste
-; CHECK: jg bar
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sqebr %f0, %f0
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    jg bar@PLT
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(
                         float %in,
                         metadata !"round.dynamic",
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-cmp-04.ll b/llvm/test/CodeGen/SystemZ/fp-strict-cmp-04.ll
index dfefc43c02bed..d0f6940b84709 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-cmp-04.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-cmp-04.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; Test that floating-point strict compares are omitted if CC already has the
 ; right value.
 ;
@@ -13,9 +14,13 @@ declare float @llvm.fabs.f32(float %f)
 ; Test addition followed by EQ, which can use the CC result of the addition.
 define float @f1(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f1:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: ber %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB0_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -38,9 +43,13 @@ exit:
 ; ...and again with LT.
 define float @f2(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f2:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB1_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -63,9 +72,13 @@ exit:
 ; ...and again with GT.
 define float @f3(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f3:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: bhr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bhr %r14
+; CHECK-NEXT:  .LBB2_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -88,9 +101,13 @@ exit:
 ; ...and again with UEQ.
 define float @f4(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f4:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: bnlhr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bnlhr %r14
+; CHECK-NEXT:  .LBB3_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -113,9 +130,13 @@ exit:
 ; Subtraction also provides a zero-based CC value.
 define float @f5(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f5:
-; CHECK: seb %f0, 0(%r2)
-; CHECK-NEXT: bnher %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    seb %f0, 0(%r2)
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bnher %r14
+; CHECK-NEXT:  .LBB4_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %cur = load float, ptr %dest
   %res = call float @llvm.experimental.constrained.fsub.f32(
@@ -139,10 +160,13 @@ exit:
 ; Test the result of LOAD POSITIVE.  We cannot omit the LTEBR.
 define float @f6(float %dummy, float %a, ptr %dest) #0 {
 ; CHECK-LABEL: f6:
-; CHECK: lpdfr %f0, %f2
-; CHECK-NEXT: ltebr %f1, %f0
-; CHECK-NEXT: bhr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lpdfr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bhr %r14
+; CHECK-NEXT:  .LBB5_1: # %store
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.fabs.f32(float %a) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f32(
@@ -162,10 +186,13 @@ exit:
 ; Test the result of LOAD NEGATIVE.  We cannot omit the LTEBR.
 define float @f7(float %dummy, float %a, ptr %dest) #0 {
 ; CHECK-LABEL: f7:
-; CHECK: lndfr %f0, %f2
-; CHECK-NEXT: ltebr %f1, %f0
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lndfr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB6_1: # %store
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %abs = call float @llvm.fabs.f32(float %a) #0
   %res = fneg float %abs
@@ -186,10 +213,13 @@ exit:
 ; Test the result of LOAD COMPLEMENT.  We cannot omit the LTEBR.
 define float @f8(float %dummy, float %a, ptr %dest) #0 {
 ; CHECK-LABEL: f8:
-; CHECK: lcdfr %f0, %f2
-; CHECK-NEXT: ltebr %f1, %f0
-; CHECK-NEXT: bler %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lcdfr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bler %r14
+; CHECK-NEXT:  .LBB7_1: # %store
+; CHECK-NEXT:    ste %f0, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = fneg float %a
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f32(
@@ -209,10 +239,13 @@ exit:
 ; Multiplication (for example) does not modify CC.
 define float @f9(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f9:
-; CHECK: meebr %f0, %f2
-; CHECK-NEXT: ltebr %f1, %f0
-; CHECK-NEXT: blhr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    meebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    blhr %r14
+; CHECK-NEXT:  .LBB8_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fmul.f32(
                         float %a, float %b,
@@ -236,11 +269,14 @@ exit:
 ; a non-CC-setting instruction.
 define float @f10(float %a, float %b, float %c, ptr %dest) #0 {
 ; CHECK-LABEL: f10:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: debr %f0, %f4
-; CHECK-NEXT: ltebr %f1, %f0
-; CHECK-NEXT: bner %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    debr %f0, %f4
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    bner %r14
+; CHECK-NEXT:  .LBB9_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -268,12 +304,15 @@ exit:
 ; compare input.
 define float @f11(float %a, float %b, float %c, ptr %dest1, ptr %dest2) #0 {
 ; CHECK-LABEL: f11:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: sebr %f4, %f0
-; CHECK-DAG: ste %f4, 0(%r2)
-; CHECK-DAG: ltebr %f1, %f0
-; CHECK-NEXT: ber %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    sebr %f4, %f0
+; CHECK-NEXT:    ste %f4, 0(%r2)
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB10_1: # %store
+; CHECK-NEXT:    ste %f4, 0(%r3)
+; CHECK-NEXT:    br %r14
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -300,25 +339,41 @@ exit:
 
 define half @f12_half(half %dummy, half %val) #0 {
 ; CHECK-LABEL: f12_half:
-; CHECK:      ler %f9, %f2
-; CHECK-NEXT: ler %f0, %f2
-; CHECK-NEXT: #APP
-; CHECK-NEXT: ler %f8, %f0
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: lzer %f0
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: ler %f10, %f0
-; CHECK-NEXT: ler %f0, %f9
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: cebr %f0, %f10
-; CHECK-NEXT: jl .LBB11_2
-; CHECK-NEXT:# %bb.1:        # %store
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT:.LBB11_2:        # %exit
-; CHECK-NEXT: ler %f0, %f8
-; CHECK:      br  %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -184
+; CHECK-NEXT:    .cfi_def_cfa_offset 344
+; CHECK-NEXT:    std %f8, 176(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f9, 168(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f10, 160(%r15) # 8-byte Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    .cfi_offset %f9, -176
+; CHECK-NEXT:    .cfi_offset %f10, -184
+; CHECK-NEXT:    ler %f9, %f2
+; CHECK-NEXT:    ler %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    lzer %f0
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    ler %f10, %f0
+; CHECK-NEXT:    ler %f0, %f9
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    cebr %f0, %f10
+; CHECK-NEXT:    jl .LBB11_2
+; CHECK-NEXT:  # %bb.1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .LBB11_2: # %exit
+; CHECK-NEXT:    ler %f0, %f8
+; CHECK-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
+; CHECK-NEXT:    lmg %r14, %r15, 296(%r15)
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call half asm "ler $0, $1", "=f,{f0}"(half %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f16(
@@ -338,13 +393,18 @@ exit:
 ; Test that LER does not get converted to LTEBR as %f0 is live after it.
 define float @f12(float %dummy, float %val) #0 {
 ; CHECK-LABEL: f12:
-; CHECK: ler %f0, %f2
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah %f0
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: ltebr %f1, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ler %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    %f0 = blah %f0
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltebr %f1, %f2
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB12_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call float asm "$0 = blah $1", "=f,{f0}"(float %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f32(
@@ -364,13 +424,18 @@ exit:
 ; Test that LDR does not get converted to LTDBR as %f0 is live after it.
 define double @f13(double %dummy, double %val) #0 {
 ; CHECK-LABEL: f13:
-; CHECK: ldr %f0, %f2
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah %f0
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: ltdbr %f1, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ldr %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah %f0
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltdbr %f1, %f2
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB13_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call double asm "blah $1", "=f,{f0}"(double %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(
@@ -390,16 +455,25 @@ exit:
 ; Test that LXR does not get converted to LTXBR as %f4 is live after it.
 define void @f14(ptr %ptr1, ptr %ptr2) #0 {
 ; CHECK-LABEL: f14:
-; CHECK: lxr
-; CHECK-NEXT: dxbr
-; CHECK-NEXT: std
-; CHECK-NEXT: std
-; CHECK-NEXT: mxbr
-; CHECK-NEXT: std
-; CHECK-NEXT: std
-; CHECK-NEXT: ltxbr
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ld %f0, 0(%r2)
+; CHECK-NEXT:    ld %f2, 8(%r2)
+; CHECK-NEXT:    ld %f1, 0(%r3)
+; CHECK-NEXT:    ld %f3, 8(%r3)
+; CHECK-NEXT:    lxr %f4, %f0
+; CHECK-NEXT:    dxbr %f4, %f1
+; CHECK-NEXT:    std %f4, 0(%r2)
+; CHECK-NEXT:    std %f6, 8(%r2)
+; CHECK-NEXT:    mxbr %f1, %f0
+; CHECK-NEXT:    std %f1, 0(%r3)
+; CHECK-NEXT:    std %f3, 8(%r3)
+; CHECK-NEXT:    ltxbr %f0, %f0
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB14_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %val1 = load fp128, ptr %ptr1
   %val2 = load fp128, ptr %ptr2
@@ -423,25 +497,41 @@ exit:
 
 define half @f15_half(half %val, half %dummy) #0 {
 ; CHECK-LABEL: f15_half:
-; CHECK:      ler %f9, %f0
-; CHECK-NEXT: ler %f2, %f0
-; CHECK-NEXT: #APP
-; CHECK-NEXT: ler %f8, %f2
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: lzer %f0
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: ler %f10, %f0
-; CHECK-NEXT: ler %f0, %f9
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: cebr %f0, %f10
-; CHECK-NEXT: jl .LBB15_2
-; CHECK-NEXT:# %bb.1:          # %store
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT:.LBB15_2:         # %exit
-; CHECK-NEXT: ler %f0, %f8
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -184
+; CHECK-NEXT:    .cfi_def_cfa_offset 344
+; CHECK-NEXT:    std %f8, 176(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f9, 168(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f10, 160(%r15) # 8-byte Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    .cfi_offset %f9, -176
+; CHECK-NEXT:    .cfi_offset %f10, -184
+; CHECK-NEXT:    ler %f9, %f0
+; CHECK-NEXT:    ler %f2, %f0
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    ler %f8, %f2
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    lzer %f0
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    ler %f10, %f0
+; CHECK-NEXT:    ler %f0, %f9
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    cebr %f0, %f10
+; CHECK-NEXT:    jl .LBB15_2
+; CHECK-NEXT:  # %bb.1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .LBB15_2: # %exit
+; CHECK-NEXT:    ler %f0, %f8
+; CHECK-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
+; CHECK-NEXT:    lmg %r14, %r15, 296(%r15)
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call half asm "ler $0, $1", "=f,{f2}"(half %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f16(
@@ -462,13 +552,18 @@ exit:
 ; we need, but cannot convert the LER.
 define float @f15(float %val, float %dummy) #0 {
 ; CHECK-LABEL: f15:
-; CHECK: ler %f2, %f0
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah %f2
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: ltebr %f1, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ler %f2, %f0
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah %f2
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltebr %f1, %f2
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB16_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call float asm "blah $1", "=f,{f2}"(float %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f32(
@@ -489,13 +584,18 @@ exit:
 ; we need, but cannot convert the LDR.
 define double @f16(double %val, double %dummy) #0 {
 ; CHECK-LABEL: f16:
-; CHECK: ldr %f2, %f0
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah %f2
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: ltdbr %f1, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ldr %f2, %f0
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah %f2
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltdbr %f1, %f2
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB17_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call double asm "blah $1", "=f,{f2}"(double %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f64(
@@ -515,9 +615,13 @@ exit:
 ; Repeat f2 with a comparison against -0.
 define float @f17(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f17:
-; CHECK: aebr %f0, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB18_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -541,10 +645,16 @@ exit:
 ; change to the exception flags.
 define float @f18(float %a, float %b, ptr %dest) #0 {
 ; CHECK-LABEL: f18:
-; CHECK: aebr %f0, %f2
-; CHECK: ltebr %f1, %f0
-; CHECK-NEXT: ber %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    aebr %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltebr %f1, %f0
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB19_1: # %store
+; CHECK-NEXT:    ste %f2, 0(%r2)
+; CHECK-NEXT:    br %r14
 entry:
   %res = call float @llvm.experimental.constrained.fadd.f32(
                         float %a, float %b,
@@ -567,25 +677,41 @@ exit:
 
 define half @f19_half(half %dummy, half %val) #0 {
 ; CHECK-LABEL: f19_half:
-; CHECK:      ler %f9, %f2
-; CHECK-NEXT: ler %f0, %f2
-; CHECK-NEXT: #APP
-; CHECK-NEXT: ler %f8, %f0
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: lzer %f0
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: ler %f10, %f0
-; CHECK-NEXT: ler %f0, %f9
-; CHECK-NEXT: brasl %r14, __extendhfsf2@PLT
-; CHECK-NEXT: cebr %f0, %f10
-; CHECK-NEXT: jl .LBB20_2
-; CHECK-NEXT:# %bb.1:           # %store
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT:.LBB20_2:          # %exit
-; CHECK-NEXT: ler %f0, %f8
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -184
+; CHECK-NEXT:    .cfi_def_cfa_offset 344
+; CHECK-NEXT:    std %f8, 176(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f9, 168(%r15) # 8-byte Spill
+; CHECK-NEXT:    std %f10, 160(%r15) # 8-byte Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    .cfi_offset %f9, -176
+; CHECK-NEXT:    .cfi_offset %f10, -184
+; CHECK-NEXT:    ler %f9, %f2
+; CHECK-NEXT:    ler %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    lzer %f0
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    ler %f10, %f0
+; CHECK-NEXT:    ler %f0, %f9
+; CHECK-NEXT:    brasl %r14, __extendhfsf2@PLT
+; CHECK-NEXT:    cebr %f0, %f10
+; CHECK-NEXT:    jl .LBB20_2
+; CHECK-NEXT:  # %bb.1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:  .LBB20_2: # %exit
+; CHECK-NEXT:    ler %f0, %f8
+; CHECK-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
+; CHECK-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
+; CHECK-NEXT:    lmg %r14, %r15, 296(%r15)
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call half asm sideeffect "ler $0, $1", "=f,{f0}"(half %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f16(
@@ -606,13 +732,18 @@ exit:
 ; there may be an intervening change to the exception flags.
 define float @f19(float %dummy, float %val) #0 {
 ; CHECK-LABEL: f19:
-; CHECK: ler %f0, %f2
-; CHECK-NEXT: #APP
-; CHECK-NEXT: blah %f0
-; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: ltebr %f1, %f2
-; CHECK-NEXT: blr %r14
-; CHECK: br %r14
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    ler %f0, %f2
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah %f0
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    ltebr %f1, %f2
+; CHECK-NEXT:    blr %r14
+; CHECK-NEXT:  .LBB21_1: # %store
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    blah
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    br %r14
 entry:
   %ret = call float asm sideeffect "blah $1", "=f,{f0}"(float %val) #0
   %cmp = call i1 @llvm.experimental.constrained.fcmp.f32(
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-cmp-05.ll b/llvm/test/CodeGen/SystemZ/fp-strict-cmp-05.ll
index 6fcf46685ee9e..ada0c45b6b259 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-cmp-05.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-cmp-05.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; Test that floating-point instructions that set cc are *not* used to
 ; eliminate *strict* compares for load complement, load negative and load
 ; positive
@@ -8,9 +9,6 @@
 ; Load complement (sign-bit flipped).
 ; Test f32
 define float @f1(float %a, float %b, float %f) #0 {
-; CHECK-LABEL: f1:
-; CHECK: ltebr
-; CHECK-NEXT: ber %r14
   %neg = fneg float %f
   %cond = call i1 @llvm.experimental.constrained.fcmp.f32(
                                                float %neg, float 0.0,
@@ -23,8 +21,13 @@ define float @f1(float %a, float %b, float %f) #0 {
 ; Test f64
 define double @f2(double %a, double %b, double %f) #0 {
 ; CHECK-LABEL: f2:
-; CHECK: ltdbr
-; CHECK-NEXT: ber %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lcdfr %f1, %f4
+; CHECK-NEXT:    ltdbr %f1, %f1
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB1_1:
+; CHECK-NEXT:    ldr %f0, %f2
+; CHECK-NEXT:    br %r14
   %neg = fneg double %f
   %cond = call i1 @llvm.experimental.constrained.fcmp.f64(
                                                double %neg, double 0.0,
@@ -38,9 +41,6 @@ define double @f2(double %a, double %b, double %f) #0 {
 ; Test f32
 declare float @llvm.fabs.f32(float %f)
 define float @f3(float %a, float %b, float %f) #0 {
-; CHECK-LABEL: f3:
-; CHECK: ltebr
-; CHECK-NEXT: ber %r14
   %abs = call float @llvm.fabs.f32(float %f) #0
   %neg = fneg float %abs
   %cond = call i1 @llvm.experimental.constrained.fcmp.f32(
@@ -55,8 +55,13 @@ define float @f3(float %a, float %b, float %f) #0 {
 declare double @llvm.fabs.f64(double %f)
 define double @f4(double %a, double %b, double %f) #0 {
 ; CHECK-LABEL: f4:
-; CHECK: ltdbr
-; CHECK-NEXT: ber %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lndfr %f1, %f4
+; CHECK-NEXT:    ltdbr %f1, %f1
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB3_1:
+; CHECK-NEXT:    ldr %f0, %f2
+; CHECK-NEXT:    br %r14
   %abs = call double @llvm.fabs.f64(double %f) #0
   %neg = fneg double %abs
   %cond = call i1 @llvm.experimental.constrained.fcmp.f64(
@@ -70,9 +75,6 @@ define double @f4(double %a, double %b, double %f) #0 {
 ; Absolute floating-point value.
 ; Test f32
 define float @f5(float %a, float %b, float %f) #0 {
-; CHECK-LABEL: f5:
-; CHECK: ltebr
-; CHECK-NEXT: ber %r14
   %abs = call float @llvm.fabs.f32(float %f) #0
   %cond = call i1 @llvm.experimental.constrained.fcmp.f32(
                                                float %abs, float 0.0,
@@ -85,8 +87,13 @@ define float @f5(float %a, float %b, float %f) #0 {
 ; Test f64
 define double @f6(double %a, double %b, double %f) #0 {
 ; CHECK-LABEL: f6:
-; CHECK: ltdbr
-; CHECK-NEXT: ber %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lpdfr %f1, %f4
+; CHECK-NEXT:    ltdbr %f1, %f1
+; CHECK-NEXT:    ber %r14
+; CHECK-NEXT:  .LBB5_1:
+; CHECK-NEXT:    ldr %f0, %f2
+; CHECK-NEXT:    br %r14
   %abs = call double @llvm.fabs.f64(double %f) #0
   %cond = call i1 @llvm.experimental.constrained.fcmp.f64(
                                                double %abs, double 0.0,
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-conv-08.ll b/llvm/test/CodeGen/SystemZ/fp-strict-conv-08.ll
index c79f884ac4aeb..42f1581cd1dc9 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-conv-08.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-conv-08.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; Test strict conversions of unsigned i64s to floating-point values (z10 only).
 ;
 ; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
@@ -10,10 +11,25 @@ declare fp128 @llvm.experimental.constrained.uitofp.f128.i64(i64, metadata, meta
 ; Test i64->f16. For z10, this results in just a single a libcall.
 define half @f0(i64 %i) #0 {
 ; CHECK-LABEL: f0:
-; CHECK: cegbr
-; CHECK: aebr
-; CHECK: brasl %r14, __truncsfhf2@PLT
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -160
+; CHECK-NEXT:    .cfi_def_cfa_offset 320
+; CHECK-NEXT:    cgijhe %r2, 0, .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    risbg %r0, %r2, 63, 191, 0
+; CHECK-NEXT:    rosbg %r0, %r2, 1, 63, 63
+; CHECK-NEXT:    cegbr %f0, %r0
+; CHECK-NEXT:    aebr %f0, %f0
+; CHECK-NEXT:    j .LBB0_3
+; CHECK-NEXT:  .LBB0_2:
+; CHECK-NEXT:    cegbr %f0, %r2
+; CHECK-NEXT:  .LBB0_3:
+; CHECK-NEXT:    brasl %r14, __truncsfhf2@PLT
+; CHECK-NEXT:    lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT:    br %r14
   %conv = call half @llvm.experimental.constrained.uitofp.f16.i64(i64 %i,
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict") #0
@@ -24,9 +40,17 @@ define half @f0(i64 %i) #0 {
 ; but we should be able to implement them using signed i64-to-fp conversions.
 define float @f1(i64 %i) #0 {
 ; CHECK-LABEL: f1:
-; CHECK: cegbr
-; CHECK: aebr
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    cgijhe %r2, 0, .LBB1_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    risbg %r0, %r2, 63, 191, 0
+; CHECK-NEXT:    rosbg %r0, %r2, 1, 63, 63
+; CHECK-NEXT:    cegbr %f0, %r0
+; CHECK-NEXT:    aebr %f0, %f0
+; CHECK-NEXT:    br %r14
+; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    cegbr %f0, %r2
+; CHECK-NEXT:    br %r14
   %conv = call float @llvm.experimental.constrained.uitofp.f32.i64(i64 %i,
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict") #0
@@ -36,9 +60,16 @@ define float @f1(i64 %i) #0 {
 ; Test i64->f64.
 define double @f2(i64 %i) #0 {
 ; CHECK-LABEL: f2:
-; CHECK: cdgbr
-; CHECK: adbr
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    srlg %r0, %r2, 32
+; CHECK-NEXT:    oihh %r0, 17712
+; CHECK-NEXT:    ldgr %f1, %r0
+; CHECK-NEXT:    larl %r1, .LCPI2_0
+; CHECK-NEXT:    sdb %f1, 0(%r1)
+; CHECK-NEXT:    iihf %r2, 1127219200
+; CHECK-NEXT:    ldgr %f0, %r2
+; CHECK-NEXT:    adbr %f0, %f1
+; CHECK-NEXT:    br %r14
   %conv = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 %i,
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict") #0
@@ -48,9 +79,19 @@ define double @f2(i64 %i) #0 {
 ; Test i64->f128.
 define void @f3(i64 %i, ptr %dst) #0 {
 ; CHECK-LABEL: f3:
-; CHECK: cxgbr
-; CHECK: axbr
-; CHECK: br %r14
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lghi %r1, 4
+; CHECK-NEXT:    cgijl %r2, 0, .LBB3_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    lghi %r1, 0
+; CHECK-NEXT:  .LBB3_2:
+; CHECK-NEXT:    larl %r4, .LCPI3_0
+; CHECK-NEXT:    lxeb %f0, 0(%r1,%r4)
+; CHECK-NEXT:    cxgbr %f1, %r2
+; CHECK-NEXT:    axbr %f1, %f0
+; CHECK-NEXT:    std %f1, 0(%r3)
+; CHECK-NEXT:    std %f3, 8(%r3)
+; CHECK-NEXT:    br %r14
   %conv = call fp128 @llvm.experimental.constrained.uitofp.f128.i64(i64 %i,
                                                metadata !"round.dynamic",
                                                metadata !"fpexcept.strict") #0
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-conv-10.ll b/llvm/test/CodeGen/SystemZ/fp-strict-conv-10.ll
index d2206a40169e5..13ebe29c3a355 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-conv-10.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-conv-10.ll
@@ -39,18 +39,15 @@ define i32 @f1(float %f) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    larl %r1, .LCPI1_0
 ; CHECK-NEXT:    le %f1, 0(%r1)
-; CHECK-NEXT:    kebr %f0, %f1
+; CHECK-NEXT:    cebr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB1_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lhi %r0, 0
-; CHECK-NEXT:    lzer %f1
-; CHECK-NEXT:    j .LBB1_3
+; CHECK-NEXT:    cfebr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    llilh %r0, 32768
-; CHECK-NEXT:  .LBB1_3:
 ; CHECK-NEXT:    sebr %f0, %f1
 ; CHECK-NEXT:    cfebr %r2, 5, %f0
-; CHECK-NEXT:    xr %r2, %r0
+; CHECK-NEXT:    xilf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %conv = call i32 @llvm.experimental.constrained.fptoui.i32.f32(float %f,
                                                metadata !"fpexcept.strict") #0
@@ -63,18 +60,15 @@ define i32 @f2(double %f) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    larl %r1, .LCPI2_0
 ; CHECK-NEXT:    ld %f1, 0(%r1)
-; CHECK-NEXT:    kdbr %f0, %f1
+; CHECK-NEXT:    cdbr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB2_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lhi %r0, 0
-; CHECK-NEXT:    lzdr %f1
-; CHECK-NEXT:    j .LBB2_3
+; CHECK-NEXT:    cfdbr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB2_2:
-; CHECK-NEXT:    llilh %r0, 32768
-; CHECK-NEXT:  .LBB2_3:
 ; CHECK-NEXT:    sdbr %f0, %f1
 ; CHECK-NEXT:    cfdbr %r2, 5, %f0
-; CHECK-NEXT:    xr %r2, %r0
+; CHECK-NEXT:    xilf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %conv = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %f,
                                                metadata !"fpexcept.strict") #0
@@ -89,18 +83,15 @@ define i32 @f3(ptr %src) #0 {
 ; CHECK-NEXT:    ld %f2, 8(%r2)
 ; CHECK-NEXT:    larl %r1, .LCPI3_0
 ; CHECK-NEXT:    lxeb %f1, 0(%r1)
-; CHECK-NEXT:    kxbr %f0, %f1
+; CHECK-NEXT:    cxbr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB3_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lhi %r0, 0
-; CHECK-NEXT:    lzxr %f1
-; CHECK-NEXT:    j .LBB3_3
+; CHECK-NEXT:    cfxbr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB3_2:
-; CHECK-NEXT:    llilh %r0, 32768
-; CHECK-NEXT:  .LBB3_3:
 ; CHECK-NEXT:    sxbr %f0, %f1
 ; CHECK-NEXT:    cfxbr %r2, 5, %f0
-; CHECK-NEXT:    xr %r2, %r0
+; CHECK-NEXT:    xilf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %f = load fp128, ptr %src
   %conv = call i32 @llvm.experimental.constrained.fptoui.i32.f128(fp128 %f,
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-conv-12.ll b/llvm/test/CodeGen/SystemZ/fp-strict-conv-12.ll
index 76c7188641724..3dca39e5d48a1 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-conv-12.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-conv-12.ll
@@ -38,18 +38,15 @@ define i64 @f1(float %f) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    larl %r1, .LCPI1_0
 ; CHECK-NEXT:    le %f1, 0(%r1)
-; CHECK-NEXT:    kebr %f0, %f1
+; CHECK-NEXT:    cebr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB1_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lghi %r0, 0
-; CHECK-NEXT:    lzer %f1
-; CHECK-NEXT:    j .LBB1_3
+; CHECK-NEXT:    cgebr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    llihh %r0, 32768
-; CHECK-NEXT:  .LBB1_3:
 ; CHECK-NEXT:    sebr %f0, %f1
 ; CHECK-NEXT:    cgebr %r2, 5, %f0
-; CHECK-NEXT:    xgr %r2, %r0
+; CHECK-NEXT:    xihf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %conv = call i64 @llvm.experimental.constrained.fptoui.i64.f32(float %f,
                                                metadata !"fpexcept.strict") #0
@@ -62,18 +59,15 @@ define i64 @f2(double %f) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    larl %r1, .LCPI2_0
 ; CHECK-NEXT:    ld %f1, 0(%r1)
-; CHECK-NEXT:    kdbr %f0, %f1
+; CHECK-NEXT:    cdbr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB2_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lghi %r0, 0
-; CHECK-NEXT:    lzdr %f1
-; CHECK-NEXT:    j .LBB2_3
+; CHECK-NEXT:    cgdbr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB2_2:
-; CHECK-NEXT:    llihh %r0, 32768
-; CHECK-NEXT:  .LBB2_3:
 ; CHECK-NEXT:    sdbr %f0, %f1
 ; CHECK-NEXT:    cgdbr %r2, 5, %f0
-; CHECK-NEXT:    xgr %r2, %r0
+; CHECK-NEXT:    xihf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %conv = call i64 @llvm.experimental.constrained.fptoui.i64.f64(double %f,
                                                metadata !"fpexcept.strict") #0
@@ -88,18 +82,15 @@ define i64 @f3(ptr %src) #0 {
 ; CHECK-NEXT:    ld %f2, 8(%r2)
 ; CHECK-NEXT:    larl %r1, .LCPI3_0
 ; CHECK-NEXT:    lxeb %f1, 0(%r1)
-; CHECK-NEXT:    kxbr %f0, %f1
+; CHECK-NEXT:    cxbr %f0, %f1
 ; CHECK-NEXT:    jnl .LBB3_2
 ; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    lghi %r0, 0
-; CHECK-NEXT:    lzxr %f1
-; CHECK-NEXT:    j .LBB3_3
+; CHECK-NEXT:    cgxbr %r2, 5, %f0
+; CHECK-NEXT:    br %r14
 ; CHECK-NEXT:  .LBB3_2:
-; CHECK-NEXT:    llihh %r0, 32768
-; CHECK-NEXT:  .LBB3_3:
 ; CHECK-NEXT:    sxbr %f0, %f1
 ; CHECK-NEXT:    cgxbr %r2, 5, %f0
-; CHECK-NEXT:    xgr %r2, %r0
+; CHECK-NEXT:    xihf %r2, 2147483648
 ; CHECK-NEXT:    br %r14
   %f = load fp128, ptr %src
   %conv = call i64 @llvm.experimental.constrained.fptoui.i64.f128(fp128 %f,
diff --git a/llvm/test/CodeGen/SystemZ/fp-strict-mul-06.ll b/llvm/test/CodeGen/SystemZ/fp-strict-mul-06.ll
index 05ce53c98db13..b3739803b0420 100644
--- a/llvm/test/CodeGen/SystemZ/fp-strict-mul-06.ll
+++ b/llvm/test/CodeGen/SystemZ/fp-strict-mul-06.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 \
 ; RUN:   | FileCheck -check-prefix=CHECK -check-prefix=CHECK-SCALAR %s
 ; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 \
@@ -7,15 +8,67 @@ declare half @llvm.experimental.constrained.fma.f16(half, half, half, metadata,
 declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
 
 define half @f0(half %f1, half %f2, half %acc) #0 {
-; CHECK-LABEL: f0:
-; CHECK: brasl %r14, __extendhfdf2@PLT
-; CHECK: brasl %r14, __extendhfdf2@PLT
-; CHECK: brasl %r14, __extendhfdf2@PLT
-; CHECK-SCALAR: madbr %f10, %f0, %f8
-; CHECK-SCALAR: ldr %f0, %f10
-; CHECK-VECTOR: wfmadb %f0, %f0, %f8, %f10
-; CHECK: brasl %r14, __truncdfhf2@PLT
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f0:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-SCALAR-NEXT:    .cfi_offset %r14, -48
+; CHECK-SCALAR-NEXT:    .cfi_offset %r15, -40
+; CHECK-SCALAR-NEXT:    aghi %r15, -184
+; CHECK-SCALAR-NEXT:    .cfi_def_cfa_offset 344
+; CHECK-SCALAR-NEXT:    std %f8, 176(%r15) # 8-byte Spill
+; CHECK-SCALAR-NEXT:    std %f9, 168(%r15) # 8-byte Spill
+; CHECK-SCALAR-NEXT:    std %f10, 160(%r15) # 8-byte Spill
+; CHECK-SCALAR-NEXT:    .cfi_offset %f8, -168
+; CHECK-SCALAR-NEXT:    .cfi_offset %f9, -176
+; CHECK-SCALAR-NEXT:    .cfi_offset %f10, -184
+; CHECK-SCALAR-NEXT:    ler %f8, %f4
+; CHECK-SCALAR-NEXT:    ler %f9, %f0
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-SCALAR-NEXT:    ldr %f10, %f0
+; CHECK-SCALAR-NEXT:    ler %f0, %f9
+; CHECK-SCALAR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-SCALAR-NEXT:    ldr %f9, %f0
+; CHECK-SCALAR-NEXT:    ler %f0, %f8
+; CHECK-SCALAR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-SCALAR-NEXT:    madbr %f0, %f9, %f10
+; CHECK-SCALAR-NEXT:    brasl %r14, __truncdfhf2@PLT
+; CHECK-SCALAR-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
+; CHECK-SCALAR-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
+; CHECK-SCALAR-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
+; CHECK-SCALAR-NEXT:    lmg %r14, %r15, 296(%r15)
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f0:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-VECTOR-NEXT:    .cfi_offset %r14, -48
+; CHECK-VECTOR-NEXT:    .cfi_offset %r15, -40
+; CHECK-VECTOR-NEXT:    aghi %r15, -184
+; CHECK-VECTOR-NEXT:    .cfi_def_cfa_offset 344
+; CHECK-VECTOR-NEXT:    std %f8, 176(%r15) # 8-byte Spill
+; CHECK-VECTOR-NEXT:    std %f9, 168(%r15) # 8-byte Spill
+; CHECK-VECTOR-NEXT:    std %f10, 160(%r15) # 8-byte Spill
+; CHECK-VECTOR-NEXT:    .cfi_offset %f8, -168
+; CHECK-VECTOR-NEXT:    .cfi_offset %f9, -176
+; CHECK-VECTOR-NEXT:    .cfi_offset %f10, -184
+; CHECK-VECTOR-NEXT:    ldr %f9, %f0
+; CHECK-VECTOR-NEXT:    ldr %f0, %f4
+; CHECK-VECTOR-NEXT:    ldr %f8, %f2
+; CHECK-VECTOR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-VECTOR-NEXT:    ldr %f10, %f0
+; CHECK-VECTOR-NEXT:    ldr %f0, %f8
+; CHECK-VECTOR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-VECTOR-NEXT:    ldr %f8, %f0
+; CHECK-VECTOR-NEXT:    ldr %f0, %f9
+; CHECK-VECTOR-NEXT:    brasl %r14, __extendhfdf2@PLT
+; CHECK-VECTOR-NEXT:    wfmadb %f0, %f0, %f8, %f10
+; CHECK-VECTOR-NEXT:    brasl %r14, __truncdfhf2@PLT
+; CHECK-VECTOR-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
+; CHECK-VECTOR-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
+; CHECK-VECTOR-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
+; CHECK-VECTOR-NEXT:    lmg %r14, %r15, 296(%r15)
+; CHECK-VECTOR-NEXT:    br %r14
   %res = call half @llvm.experimental.constrained.fma.f16 (
                         half %f1, half %f2, half %acc,
                         metadata !"round.dynamic",
@@ -24,11 +77,16 @@ define half @f0(half %f1, half %f2, half %acc) #0 {
 }
 
 define float @f1(float %f1, float %f2, float %acc) #0 {
-; CHECK-LABEL: f1:
-; CHECK-SCALAR: maebr %f4, %f0, %f2
-; CHECK-SCALAR: ler %f0, %f4
-; CHECK-VECTOR: wfmasb %f0, %f0, %f2, %f4
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f1:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    maebr %f4, %f0, %f2
+; CHECK-SCALAR-NEXT:    ler %f0, %f4
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f1:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    wfmasb %f0, %f0, %f2, %f4
+; CHECK-VECTOR-NEXT:    br %r14
   %res = call float @llvm.experimental.constrained.fma.f32 (
                         float %f1, float %f2, float %acc,
                         metadata !"round.dynamic",
@@ -37,11 +95,17 @@ define float @f1(float %f1, float %f2, float %acc) #0 {
 }
 
 define float @f2(float %f1, ptr %ptr, float %acc) #0 {
-; CHECK-LABEL: f2:
-; CHECK: maeb %f2, %f0, 0(%r2)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f2:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f2:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %f2 = load float, ptr %ptr
   %res = call float @llvm.experimental.constrained.fma.f32 (
                         float %f1, float %f2, float %acc,
@@ -51,11 +115,17 @@ define float @f2(float %f1, ptr %ptr, float %acc) #0 {
 }
 
 define float @f3(float %f1, ptr %base, float %acc) #0 {
-; CHECK-LABEL: f3:
-; CHECK: maeb %f2, %f0, 4092(%r2)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f3:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 4092(%r2)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f3:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 4092(%r2)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %ptr = getelementptr float, ptr %base, i64 1023
   %f2 = load float, ptr %ptr
   %res = call float @llvm.experimental.constrained.fma.f32 (
@@ -69,12 +139,19 @@ define float @f4(float %f1, ptr %base, float %acc) #0 {
 ; The important thing here is that we don't generate an out-of-range
 ; displacement.  Other sequences besides this one would be OK.
 ;
-; CHECK-LABEL: f4:
-; CHECK: aghi %r2, 4096
-; CHECK: maeb %f2, %f0, 0(%r2)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f4:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    aghi %r2, 4096
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f4:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    aghi %r2, 4096
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %ptr = getelementptr float, ptr %base, i64 1024
   %f2 = load float, ptr %ptr
   %res = call float @llvm.experimental.constrained.fma.f32 (
@@ -88,12 +165,19 @@ define float @f5(float %f1, ptr %base, float %acc) #0 {
 ; Here too the important thing is that we don't generate an out-of-range
 ; displacement.  Other sequences besides this one would be OK.
 ;
-; CHECK-LABEL: f5:
-; CHECK: aghi %r2, -4
-; CHECK: maeb %f2, %f0, 0(%r2)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f5:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    aghi %r2, -4
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f5:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    aghi %r2, -4
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 0(%r2)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %ptr = getelementptr float, ptr %base, i64 -1
   %f2 = load float, ptr %ptr
   %res = call float @llvm.experimental.constrained.fma.f32 (
@@ -104,12 +188,19 @@ define float @f5(float %f1, ptr %base, float %acc) #0 {
 }
 
 define float @f6(float %f1, ptr %base, i64 %index, float %acc) #0 {
-; CHECK-LABEL: f6:
-; CHECK: sllg %r1, %r3, 2
-; CHECK: maeb %f2, %f0, 0(%r1,%r2)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f6:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    sllg %r1, %r3, 2
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 0(%r1,%r2)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f6:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    sllg %r1, %r3, 2
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 0(%r1,%r2)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %ptr = getelementptr float, ptr %base, i64 %index
   %f2 = load float, ptr %ptr
   %res = call float @llvm.experimental.constrained.fma.f32 (
@@ -120,12 +211,19 @@ define float @f6(float %f1, ptr %base, i64 %index, float %acc) #0 {
 }
 
 define float @f7(float %f1, ptr %base, i64 %index, float %acc) #0 {
-; CHECK-LABEL: f7:
-; CHECK: sllg %r1, %r3, 2
-; CHECK: maeb %f2, %f0, 4092({{%r1,%r2|%r2,%r1}})
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f7:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    sllg %r1, %r3, 2
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 4092(%r2,%r1)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f7:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    sllg %r1, %r3, 2
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 4092(%r2,%r1)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %index2 = add i64 %index, 1023
   %ptr = getelementptr float, ptr %base, i64 %index2
   %f2 = load float, ptr %ptr
@@ -137,13 +235,21 @@ define float @f7(float %f1, ptr %base, i64 %index, float %acc) #0 {
 }
 
 define float @f8(float %f1, ptr %base, i64 %index, float %acc) #0 {
-; CHECK-LABEL: f8:
-; CHECK: sllg %r1, %r3, 2
-; CHECK: lay %r1, 4096({{%r1,%r2|%r2,%r1}})
-; CHECK: maeb %f2, %f0, 0(%r1)
-; CHECK-SCALAR: ler %f0, %f2
-; CHECK-VECTOR: ldr %f0, %f2
-; CHECK: br %r14
+; CHECK-SCALAR-LABEL: f8:
+; CHECK-SCALAR:       # %bb.0:
+; CHECK-SCALAR-NEXT:    sllg %r1, %r3, 2
+; CHECK-SCALAR-NEXT:    lay %r1, 4096(%r2,%r1)
+; CHECK-SCALAR-NEXT:    maeb %f2, %f0, 0(%r1)
+; CHECK-SCALAR-NEXT:    ler %f0, %f2
+; CHECK-SCALAR-NEXT:    br %r14
+;
+; CHECK-VECTOR-LABEL: f8:
+; CHECK-VECTOR:       # %bb.0:
+; CHECK-VECTOR-NEXT:    sllg %r1, %r3, 2
+; CHECK-VECTOR-NEXT:    lay %r1, 4096(%r2,%r1)
+; CHECK-VECTOR-NEXT:    maeb %f2, %f0, 0(%r1)
+; CHECK-VECTOR-NEXT:    ldr %f0, %f2
+; CHECK-VECTOR-NEXT:    br %r14
   %index2 = add i64 %index, 1024
   %ptr = getelementptr float, ptr %base, i64 %index2
   %f2 = load float, ptr %ptr
@@ -155,3 +261,5 @@ define float @f8(float %f1, ptr %base, i64 %index, float %acc) #0 {
 }
 
 attributes #0 = { strictfp }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
index bde3635f48446..1e3ce7b138093 100644
--- a/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
+++ b/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
@@ -7,16 +7,12 @@ define <1 x float> @constrained_vector_fdiv_v1f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI0_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI0_1
-; S390X-NEXT:    deb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fdiv_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI0_0
-; SZ13-NEXT:    vgmf %v0, 2, 8
-; SZ13-NEXT:    deb %f0, 0(%r1)
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vlrepf %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
 entry:
   %div = call <1 x float> @llvm.experimental.constrained.fdiv.v1f32(
@@ -31,22 +27,15 @@ define <2 x double> @constrained_vector_fdiv_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_fdiv_v2f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI1_0
-; S390X-NEXT:    ld %f1, 0(%r1)
+; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI1_1
 ; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI1_2
-; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    ddbr %f2, %f1
-; S390X-NEXT:    ddbr %f0, %f1
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fdiv_v2f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI1_0
-; SZ13-NEXT:    larl %r2, .LCPI1_1
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vfddb %v24, %v1, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %div = call <2 x double> @llvm.experimental.constrained.fdiv.v2f64(
@@ -61,32 +50,17 @@ define <3 x float> @constrained_vector_fdiv_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_fdiv_v3f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI2_0
-; S390X-NEXT:    le %f1, 0(%r1)
+; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI2_1
-; S390X-NEXT:    le %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI2_2
 ; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI2_3
-; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    debr %f4, %f1
-; S390X-NEXT:    debr %f2, %f1
-; S390X-NEXT:    debr %f0, %f1
+; S390X-NEXT:    larl %r1, .LCPI2_2
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fdiv_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI2_0
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    larl %r1, .LCPI2_1
-; SZ13-NEXT:    lde %f1, 0(%r1)
-; SZ13-NEXT:    debr %f1, %f0
-; SZ13-NEXT:    vgmf %v2, 2, 8
-; SZ13-NEXT:    vgmf %v3, 1, 1
-; SZ13-NEXT:    debr %f2, %f0
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    debr %f3, %f0
-; SZ13-NEXT:    vmrhf %v0, %v2, %v3
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %div = call <3 x float> @llvm.experimental.constrained.fdiv.v3f32(
@@ -117,13 +91,14 @@ define void @constrained_vector_fdiv_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_fdiv_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI3_0
-; SZ13-NEXT:    ld %f1, 0(%r1)
-; SZ13-NEXT:    ddb %f1, 16(%r2)
-; SZ13-NEXT:    larl %r1, .LCPI3_1
+; SZ13-NEXT:    larl %r3, .LCPI3_1
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vl %v2, 0(%r1), 3
-; SZ13-NEXT:    std %f1, 16(%r2)
-; SZ13-NEXT:    vfddb %v0, %v2, %v0
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v2, 0(%r1)
+; SZ13-NEXT:    vfddb %v1, %v2, %v1
+; SZ13-NEXT:    vl %v3, 0(%r3), 3
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
+; SZ13-NEXT:    vfddb %v0, %v3, %v0
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
 ; SZ13-NEXT:    br %r14
 entry:
@@ -141,31 +116,21 @@ define <4 x double> @constrained_vector_fdiv_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_fdiv_v4f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI4_0
-; S390X-NEXT:    ld %f1, 0(%r1)
+; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI4_1
-; S390X-NEXT:    ld %f6, 0(%r1)
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI4_2
 ; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI4_3
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI4_4
-; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    ddbr %f6, %f1
-; S390X-NEXT:    ddbr %f4, %f1
-; S390X-NEXT:    ddbr %f2, %f1
-; S390X-NEXT:    ddbr %f0, %f1
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fdiv_v4f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI4_0
 ; SZ13-NEXT:    larl %r2, .LCPI4_1
-; SZ13-NEXT:    larl %r3, .LCPI4_2
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vfddb %v26, %v1, %v0
-; SZ13-NEXT:    vl %v2, 0(%r3), 3
-; SZ13-NEXT:    vfddb %v24, %v2, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %div = call <4 x double> @llvm.experimental.constrained.fdiv.v4f64(
@@ -181,33 +146,13 @@ entry:
 define <1 x float> @constrained_vector_frem_v1f32() #0 {
 ; S390X-LABEL: constrained_vector_frem_v1f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -160
-; S390X-NEXT:    .cfi_def_cfa_offset 320
 ; S390X-NEXT:    larl %r1, .LCPI5_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI5_1
-; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmodf@PLT
-; S390X-NEXT:    lmg %r14, %r15, 272(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_frem_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -160
-; SZ13-NEXT:    .cfi_def_cfa_offset 320
-; SZ13-NEXT:    larl %r1, .LCPI5_0
-; SZ13-NEXT:    lde %f2, 0(%r1)
-; SZ13-NEXT:    vgmf %v0, 2, 8
-; SZ13-NEXT:    brasl %r14, fmodf@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vlr %v24, %v0
-; SZ13-NEXT:    lmg %r14, %r15, 272(%r15)
+; SZ13-NEXT:    vgmf %v24, 2, 8
 ; SZ13-NEXT:    br %r14
 entry:
   %rem = call <1 x float> @llvm.experimental.constrained.frem.v1f32(
@@ -221,57 +166,16 @@ entry:
 define <2 x double> @constrained_vector_frem_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_frem_v2f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -176
-; S390X-NEXT:    .cfi_def_cfa_offset 336
-; S390X-NEXT:    std %f8, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    larl %r1, .LCPI6_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI6_1
-; S390X-NEXT:    ld %f8, 0(%r1)
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
-; S390X-NEXT:    larl %r1, .LCPI6_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
-; S390X-NEXT:    ldr %f2, %f9
-; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_frem_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -184
-; SZ13-NEXT:    .cfi_def_cfa_offset 344
-; SZ13-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; SZ13-NEXT:    .cfi_offset %f8, -168
 ; SZ13-NEXT:    larl %r1, .LCPI6_0
-; SZ13-NEXT:    ld %f8, 0(%r1)
-; SZ13-NEXT:    vgmg %v0, 1, 1
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    vgmg %v0, 2, 11
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 296(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %rem = call <2 x double> @llvm.experimental.constrained.frem.v2f64(
@@ -285,76 +189,18 @@ entry:
 define <3 x float> @constrained_vector_frem_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_frem_v3f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI7_0
 ; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI7_1
-; S390X-NEXT:    le %f8, 0(%r1)
-; S390X-NEXT:    ler %f2, %f8
-; S390X-NEXT:    brasl %r14, fmodf@PLT
+; S390X-NEXT:    le %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI7_2
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    ler %f9, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    ler %f2, %f8
-; S390X-NEXT:    brasl %r14, fmodf@PLT
-; S390X-NEXT:    larl %r1, .LCPI7_3
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    ler %f10, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    ler %f2, %f8
-; S390X-NEXT:    brasl %r14, fmodf@PLT
-; S390X-NEXT:    ler %f2, %f10
-; S390X-NEXT:    ler %f4, %f9
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_frem_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -200
-; SZ13-NEXT:    .cfi_def_cfa_offset 360
-; SZ13-NEXT:    std %f8, 192(%r15) # 8-byte Spill
-; SZ13-NEXT:    .cfi_offset %f8, -168
-; SZ13-NEXT:    larl %r2, .LCPI7_1
 ; SZ13-NEXT:    larl %r1, .LCPI7_0
-; SZ13-NEXT:    lde %f8, 0(%r2)
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmodf@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    vgmf %v0, 2, 8
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmodf@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    vgmf %v0, 1, 1
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmodf@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 312(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %rem = call <3 x float> @llvm.experimental.constrained.frem.v3f32(
@@ -459,92 +305,22 @@ entry:
 define <4 x double> @constrained_vector_frem_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_frem_v4f64:
 ; S390X:       # %bb.0:
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -192
-; S390X-NEXT:    .cfi_def_cfa_offset 352
-; S390X-NEXT:    std %f8, 184(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f11, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
-; S390X-NEXT:    .cfi_offset %f11, -192
 ; S390X-NEXT:    larl %r1, .LCPI9_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI9_1
-; S390X-NEXT:    ld %f8, 0(%r1)
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI9_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
+; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI9_3
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    ldr %f10, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
-; S390X-NEXT:    larl %r1, .LCPI9_4
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    ldr %f11, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    brasl %r14, fmod@PLT
-; S390X-NEXT:    ldr %f2, %f11
-; S390X-NEXT:    ldr %f4, %f10
-; S390X-NEXT:    ldr %f6, %f9
-; S390X-NEXT:    ld %f8, 184(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f11, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 304(%r15)
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_frem_v4f64:
 ; SZ13:       # %bb.0:
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -200
-; SZ13-NEXT:    .cfi_def_cfa_offset 360
-; SZ13-NEXT:    std %f8, 192(%r15) # 8-byte Spill
-; SZ13-NEXT:    .cfi_offset %f8, -168
 ; SZ13-NEXT:    larl %r1, .LCPI9_0
-; SZ13-NEXT:    ld %f8, 0(%r1)
-; SZ13-NEXT:    vgmg %v0, 1, 1
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    vgmg %v0, 2, 11
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v0, %v0, %v1
-; SZ13-NEXT:    larl %r1, .LCPI9_1
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    larl %r1, .LCPI9_2
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vl %v24, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v26, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 312(%r15)
+; SZ13-NEXT:    larl %r2, .LCPI9_1
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
   %rem = call <4 x double> @llvm.experimental.constrained.frem.v4f64(
            <4 x double> <double 1.000000e+00, double 2.000000e+00,
@@ -561,16 +337,11 @@ define <1 x float> @constrained_vector_fmul_v1f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI10_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI10_1
-; S390X-NEXT:    meeb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fmul_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgmf %v0, 1, 1
-; SZ13-NEXT:    vgmf %v1, 1, 8
-; SZ13-NEXT:    meebr %f1, %f0
-; SZ13-NEXT:    vlr %v24, %v1
+; SZ13-NEXT:    vgmf %v24, 1, 8
 ; SZ13-NEXT:    br %r14
 entry:
   %mul = call <1 x float> @llvm.experimental.constrained.fmul.v1f32(
@@ -586,20 +357,12 @@ define <2 x double> @constrained_vector_fmul_v2f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI11_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI11_1
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    mdb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI11_2
-; S390X-NEXT:    mdb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fmul_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    larl %r1, .LCPI11_0
-; SZ13-NEXT:    larl %r2, .LCPI11_1
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vfmdb %v24, %v1, %v0
+; SZ13-NEXT:    vgmg %v24, 1, 11
 ; SZ13-NEXT:    br %r14
 entry:
   %mul = call <2 x double> @llvm.experimental.constrained.fmul.v2f64(
@@ -615,29 +378,13 @@ define <3 x float> @constrained_vector_fmul_v3f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI12_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI12_1
-; S390X-NEXT:    ler %f4, %f0
-; S390X-NEXT:    meeb %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI12_2
 ; S390X-NEXT:    ler %f2, %f0
-; S390X-NEXT:    meeb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI12_3
-; S390X-NEXT:    meeb %f0, 0(%r1)
+; S390X-NEXT:    ler %f4, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fmul_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgmf %v0, 1, 8
-; SZ13-NEXT:    larl %r1, .LCPI12_0
-; SZ13-NEXT:    vgmf %v2, 2, 8
-; SZ13-NEXT:    vgmf %v1, 1, 8
-; SZ13-NEXT:    meeb %f1, 0(%r1)
-; SZ13-NEXT:    larl %r1, .LCPI12_1
-; SZ13-NEXT:    meebr %f2, %f0
-; SZ13-NEXT:    meeb %f0, 0(%r1)
-; SZ13-NEXT:    vmrhf %v0, %v2, %v0
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
+; SZ13-NEXT:    vgmf %v24, 1, 8
 ; SZ13-NEXT:    br %r14
 entry:
   %mul = call <3 x float> @llvm.experimental.constrained.fmul.v3f32(
@@ -667,14 +414,15 @@ define void @constrained_vector_fmul_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_fmul_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI13_0
-; SZ13-NEXT:    ld %f1, 0(%r1)
-; SZ13-NEXT:    larl %r1, .LCPI13_1
+; SZ13-NEXT:    larl %r3, .LCPI13_1
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vl %v2, 0(%r1), 3
-; SZ13-NEXT:    mdb %f1, 16(%r2)
-; SZ13-NEXT:    vfmdb %v0, %v2, %v0
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v2, 0(%r1)
+; SZ13-NEXT:    vl %v3, 0(%r3), 3
+; SZ13-NEXT:    vfmdb %v1, %v1, %v2
+; SZ13-NEXT:    vfmdb %v0, %v0, %v3
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -693,29 +441,15 @@ define <4 x double> @constrained_vector_fmul_v4f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI14_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI14_1
-; S390X-NEXT:    ldr %f6, %f0
-; S390X-NEXT:    mdb %f6, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI14_2
-; S390X-NEXT:    ldr %f4, %f0
-; S390X-NEXT:    mdb %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI14_3
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    mdb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI14_4
-; S390X-NEXT:    mdb %f0, 0(%r1)
+; S390X-NEXT:    ldr %f4, %f0
+; S390X-NEXT:    ldr %f6, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fmul_v4f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    larl %r1, .LCPI14_0
-; SZ13-NEXT:    larl %r2, .LCPI14_1
-; SZ13-NEXT:    larl %r3, .LCPI14_2
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vl %v2, 0(%r3), 3
-; SZ13-NEXT:    vfmdb %v26, %v1, %v0
-; SZ13-NEXT:    vfmdb %v24, %v1, %v2
+; SZ13-NEXT:    vgmg %v24, 1, 11
+; SZ13-NEXT:    vgmg %v26, 1, 11
 ; SZ13-NEXT:    br %r14
 entry:
   %mul = call <4 x double> @llvm.experimental.constrained.fmul.v4f64(
@@ -733,16 +467,11 @@ define <1 x float> @constrained_vector_fadd_v1f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI15_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI15_1
-; S390X-NEXT:    aeb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fadd_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgmf %v0, 2, 8
-; SZ13-NEXT:    vgmf %v1, 1, 8
-; SZ13-NEXT:    aebr %f1, %f0
-; SZ13-NEXT:    vlr %v24, %v1
+; SZ13-NEXT:    vgmf %v24, 1, 8
 ; SZ13-NEXT:    br %r14
 entry:
   %add = call <1 x float> @llvm.experimental.constrained.fadd.v1f32(
@@ -758,20 +487,13 @@ define <2 x double> @constrained_vector_fadd_v2f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI16_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI16_1
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    adb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI16_2
-; S390X-NEXT:    adb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fadd_v2f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI16_0
-; SZ13-NEXT:    larl %r2, .LCPI16_1
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vfadb %v24, %v1, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %add = call <2 x double> @llvm.experimental.constrained.fadd.v2f64(
@@ -787,27 +509,13 @@ define <3 x float> @constrained_vector_fadd_v3f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI17_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    lzer %f4
-; S390X-NEXT:    aebr %f4, %f0
-; S390X-NEXT:    larl %r1, .LCPI17_1
 ; S390X-NEXT:    ler %f2, %f0
-; S390X-NEXT:    aeb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI17_2
-; S390X-NEXT:    aeb %f0, 0(%r1)
+; S390X-NEXT:    ler %f4, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fadd_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgbm %v0, 61440
-; SZ13-NEXT:    vgmf %v2, 1, 1
-; SZ13-NEXT:    vgmf %v3, 2, 8
-; SZ13-NEXT:    lzer %f1
-; SZ13-NEXT:    aebr %f1, %f0
-; SZ13-NEXT:    aebr %f2, %f0
-; SZ13-NEXT:    aebr %f3, %f0
-; SZ13-NEXT:    vmrhf %v0, %v2, %v3
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
+; SZ13-NEXT:    vgbm %v24, 65520
 ; SZ13-NEXT:    br %r14
 entry:
   %add = call <3 x float> @llvm.experimental.constrained.fadd.v3f32(
@@ -837,14 +545,15 @@ define void @constrained_vector_fadd_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_fadd_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI18_0
-; SZ13-NEXT:    ld %f1, 0(%r1)
-; SZ13-NEXT:    larl %r1, .LCPI18_1
+; SZ13-NEXT:    larl %r3, .LCPI18_1
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vl %v2, 0(%r1), 3
-; SZ13-NEXT:    adb %f1, 16(%r2)
-; SZ13-NEXT:    vfadb %v0, %v2, %v0
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v2, 0(%r1)
+; SZ13-NEXT:    vl %v3, 0(%r3), 3
+; SZ13-NEXT:    vfadb %v1, %v1, %v2
+; SZ13-NEXT:    vfadb %v0, %v0, %v3
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -863,29 +572,16 @@ define <4 x double> @constrained_vector_fadd_v4f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI19_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI19_1
-; S390X-NEXT:    ldr %f6, %f0
-; S390X-NEXT:    adb %f6, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI19_2
-; S390X-NEXT:    ldr %f4, %f0
-; S390X-NEXT:    adb %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI19_3
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    adb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI19_4
-; S390X-NEXT:    adb %f0, 0(%r1)
+; S390X-NEXT:    ldr %f4, %f0
+; S390X-NEXT:    ldr %f6, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fadd_v4f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI19_0
-; SZ13-NEXT:    larl %r2, .LCPI19_1
-; SZ13-NEXT:    larl %r3, .LCPI19_2
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vl %v2, 0(%r3), 3
-; SZ13-NEXT:    vfadb %v26, %v1, %v0
-; SZ13-NEXT:    vfadb %v24, %v1, %v2
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vlr %v26, %v24
 ; SZ13-NEXT:    br %r14
 entry:
   %add = call <4 x double> @llvm.experimental.constrained.fadd.v4f64(
@@ -903,16 +599,11 @@ define <1 x float> @constrained_vector_fsub_v1f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI20_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI20_1
-; S390X-NEXT:    seb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fsub_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgmf %v0, 2, 8
-; SZ13-NEXT:    vgmf %v1, 1, 8
-; SZ13-NEXT:    sebr %f1, %f0
-; SZ13-NEXT:    vlr %v24, %v1
+; SZ13-NEXT:    vgmf %v24, 1, 8
 ; SZ13-NEXT:    br %r14
 entry:
   %sub = call <1 x float> @llvm.experimental.constrained.fsub.v1f32(
@@ -928,19 +619,12 @@ define <2 x double> @constrained_vector_fsub_v2f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI21_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI21_1
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    sdb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI21_2
-; S390X-NEXT:    sdb %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fsub_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    larl %r1, .LCPI21_0
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vgmg %v1, 12, 10
-; SZ13-NEXT:    vfsdb %v24, %v1, %v0
+; SZ13-NEXT:    vgmg %v24, 12, 10
 ; SZ13-NEXT:    br %r14
 entry:
   %sub = call <2 x double> @llvm.experimental.constrained.fsub.v2f64(
@@ -956,30 +640,13 @@ define <3 x float> @constrained_vector_fsub_v3f32() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI22_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    ler %f4, %f0
-; S390X-NEXT:    larl %r1, .LCPI22_1
 ; S390X-NEXT:    ler %f2, %f0
-; S390X-NEXT:    seb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI22_2
-; S390X-NEXT:    seb %f0, 0(%r1)
-; S390X-NEXT:    lzer %f1
-; S390X-NEXT:    sebr %f4, %f1
+; S390X-NEXT:    ler %f4, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fsub_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgbm %v2, 61440
-; SZ13-NEXT:    lzer %f1
-; SZ13-NEXT:    sebr %f2, %f1
-; SZ13-NEXT:    vgmf %v1, 1, 1
-; SZ13-NEXT:    vgbm %v3, 61440
-; SZ13-NEXT:    vgbm %v0, 61440
-; SZ13-NEXT:    sebr %f3, %f1
-; SZ13-NEXT:    vgmf %v1, 2, 8
-; SZ13-NEXT:    sebr %f0, %f1
-; SZ13-NEXT:    vmrhf %v0, %v3, %v0
-; SZ13-NEXT:    vrepf %v1, %v2, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
+; SZ13-NEXT:    vgbm %v24, 65520
 ; SZ13-NEXT:    br %r14
 entry:
   %sub = call <3 x float> @llvm.experimental.constrained.fsub.v3f32(
@@ -1009,12 +676,12 @@ define void @constrained_vector_fsub_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_fsub_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
 ; SZ13-NEXT:    vgmg %v2, 12, 10
-; SZ13-NEXT:    sdb %f2, 16(%r2)
-; SZ13-NEXT:    vgmg %v1, 12, 10
-; SZ13-NEXT:    vfsdb %v0, %v1, %v0
+; SZ13-NEXT:    vfsdb %v1, %v2, %v1
+; SZ13-NEXT:    vfsdb %v0, %v2, %v0
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f2, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -1033,28 +700,15 @@ define <4 x double> @constrained_vector_fsub_v4f64() #0 {
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI24_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI24_1
-; S390X-NEXT:    ldr %f6, %f0
-; S390X-NEXT:    sdb %f6, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI24_2
-; S390X-NEXT:    ldr %f4, %f0
-; S390X-NEXT:    sdb %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI24_3
 ; S390X-NEXT:    ldr %f2, %f0
-; S390X-NEXT:    sdb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI24_4
-; S390X-NEXT:    sdb %f0, 0(%r1)
+; S390X-NEXT:    ldr %f4, %f0
+; S390X-NEXT:    ldr %f6, %f0
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fsub_v4f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    larl %r1, .LCPI24_0
-; SZ13-NEXT:    larl %r2, .LCPI24_1
-; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vgmg %v2, 12, 10
-; SZ13-NEXT:    vfsdb %v26, %v2, %v0
-; SZ13-NEXT:    vfsdb %v24, %v2, %v1
+; SZ13-NEXT:    vgmg %v24, 12, 10
+; SZ13-NEXT:    vgmg %v26, 12, 10
 ; SZ13-NEXT:    br %r14
 entry:
   %sub = call <4 x double> @llvm.experimental.constrained.fsub.v4f64(
@@ -1076,9 +730,12 @@ define <1 x float> @constrained_vector_sqrt_v1f32() #0 {
 ;
 ; SZ13-LABEL: constrained_vector_sqrt_v1f32:
 ; SZ13:       # %bb.0: # %entry
+; SZ13-NEXT:    sqebr %f0, %f0
 ; SZ13-NEXT:    larl %r1, .LCPI25_0
-; SZ13-NEXT:    sqeb %f0, 0(%r1)
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    sqeb %f1, 0(%r1)
+; SZ13-NEXT:    vmrhf %v1, %v1, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
 ; SZ13-NEXT:    br %r14
 entry:
   %sqrt = call <1 x float> @llvm.experimental.constrained.sqrt.v1f32(
@@ -1092,9 +749,9 @@ define <2 x double> @constrained_vector_sqrt_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_sqrt_v2f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI26_0
-; S390X-NEXT:    sqdb %f2, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI26_1
 ; S390X-NEXT:    sqdb %f0, 0(%r1)
+; S390X-NEXT:    larl %r1, .LCPI26_1
+; S390X-NEXT:    sqdb %f2, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_sqrt_v2f64:
@@ -1115,23 +772,24 @@ define <3 x float> @constrained_vector_sqrt_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_sqrt_v3f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI27_0
-; S390X-NEXT:    sqeb %f4, 0(%r1)
+; S390X-NEXT:    sqeb %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI27_1
 ; S390X-NEXT:    sqeb %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI27_2
-; S390X-NEXT:    sqeb %f0, 0(%r1)
+; S390X-NEXT:    sqeb %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_sqrt_v3f32:
 ; SZ13:       # %bb.0: # %entry
+; SZ13-NEXT:    sqebr %f0, %f0
 ; SZ13-NEXT:    larl %r1, .LCPI27_0
-; SZ13-NEXT:    sqeb %f0, 0(%r1)
 ; SZ13-NEXT:    larl %r2, .LCPI27_1
+; SZ13-NEXT:    sqeb %f1, 0(%r1)
 ; SZ13-NEXT:    larl %r3, .LCPI27_2
-; SZ13-NEXT:    sqeb %f1, 0(%r2)
-; SZ13-NEXT:    vrepf %v0, %v0, 0
-; SZ13-NEXT:    sqeb %f2, 0(%r3)
-; SZ13-NEXT:    vmrhf %v1, %v1, %v2
+; SZ13-NEXT:    vmrhf %v0, %v1, %v0
+; SZ13-NEXT:    sqeb %f2, 0(%r2)
+; SZ13-NEXT:    sqeb %f3, 0(%r3)
+; SZ13-NEXT:    vmrhf %v1, %v3, %v2
 ; SZ13-NEXT:    vmrhg %v24, %v1, %v0
 ; SZ13-NEXT:    br %r14
 entry:
@@ -1155,11 +813,12 @@ define void @constrained_vector_sqrt_v3f64(ptr %a) #0 {
 ;
 ; SZ13-LABEL: constrained_vector_sqrt_v3f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    sqdb %f1, 16(%r2)
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
 ; SZ13-NEXT:    vfsqdb %v0, %v0
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
+; SZ13-NEXT:    vfsqdb %v1, %v1
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -1175,13 +834,13 @@ define <4 x double> @constrained_vector_sqrt_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_sqrt_v4f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI29_0
-; S390X-NEXT:    sqdb %f6, 0(%r1)
+; S390X-NEXT:    sqdb %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI29_1
-; S390X-NEXT:    sqdb %f4, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI29_2
 ; S390X-NEXT:    sqdb %f2, 0(%r1)
+; S390X-NEXT:    larl %r1, .LCPI29_2
+; S390X-NEXT:    sqdb %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI29_3
-; S390X-NEXT:    sqdb %f0, 0(%r1)
+; S390X-NEXT:    sqdb %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_sqrt_v4f64:
@@ -1189,9 +848,9 @@ define <4 x double> @constrained_vector_sqrt_v4f64() #0 {
 ; SZ13-NEXT:    larl %r1, .LCPI29_0
 ; SZ13-NEXT:    larl %r2, .LCPI29_1
 ; SZ13-NEXT:    vl %v0, 0(%r1), 3
-; SZ13-NEXT:    vfsqdb %v26, %v0
+; SZ13-NEXT:    vfsqdb %v24, %v0
 ; SZ13-NEXT:    vl %v1, 0(%r2), 3
-; SZ13-NEXT:    vfsqdb %v24, %v1
+; SZ13-NEXT:    vfsqdb %v26, %v1
 ; SZ13-NEXT:    br %r14
  entry:
   %sqrt = call <4 x double> @llvm.experimental.constrained.sqrt.v4f64(
@@ -1267,7 +926,8 @@ define <2 x double> @constrained_vector_pow_v2f64() #0 {
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    ldr %f2, %f8
 ; S390X-NEXT:    brasl %r14, pow@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -1341,8 +1001,9 @@ define <3 x float> @constrained_vector_pow_v3f32() #0 {
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    ler %f2, %f8
 ; S390X-NEXT:    brasl %r14, powf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f9
 ; S390X-NEXT:    ler %f2, %f10
-; S390X-NEXT:    ler %f4, %f9
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -1354,9 +1015,9 @@ define <3 x float> @constrained_vector_pow_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -200
-; SZ13-NEXT:    .cfi_def_cfa_offset 360
-; SZ13-NEXT:    std %f8, 192(%r15) # 8-byte Spill
+; SZ13-NEXT:    aghi %r15, -184
+; SZ13-NEXT:    .cfi_def_cfa_offset 344
+; SZ13-NEXT:    std %f8, 176(%r15) # 8-byte Spill
 ; SZ13-NEXT:    .cfi_offset %f8, -168
 ; SZ13-NEXT:    larl %r2, .LCPI32_1
 ; SZ13-NEXT:    larl %r1, .LCPI32_0
@@ -1366,24 +1027,24 @@ define <3 x float> @constrained_vector_pow_v3f32() #0 {
 ; SZ13-NEXT:    brasl %r14, powf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI32_2
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    ldr %f2, %f8
 ; SZ13-NEXT:    brasl %r14, powf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI32_3
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI32_3
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    ldr %f2, %f8
 ; SZ13-NEXT:    brasl %r14, powf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
+; SZ13-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 312(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 296(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %pow = call <3 x float> @llvm.experimental.constrained.pow.v3f32(
@@ -1529,9 +1190,10 @@ define <4 x double> @constrained_vector_pow_v4f64() #0 {
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    ldr %f2, %f8
 ; S390X-NEXT:    brasl %r14, pow@PLT
-; S390X-NEXT:    ldr %f2, %f11
-; S390X-NEXT:    ldr %f4, %f10
-; S390X-NEXT:    ldr %f6, %f9
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f9
+; S390X-NEXT:    ldr %f2, %f10
+; S390X-NEXT:    ldr %f4, %f11
 ; S390X-NEXT:    ld %f8, 184(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 168(%r15) # 8-byte Reload
@@ -1595,32 +1257,14 @@ entry:
 define <1 x float> @constrained_vector_powi_v1f32() #0 {
 ; S390X-LABEL: constrained_vector_powi_v1f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -160
-; S390X-NEXT:    .cfi_def_cfa_offset 320
 ; S390X-NEXT:    larl %r1, .LCPI35_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    brasl %r14, __powisf2@PLT
-; S390X-NEXT:    lmg %r14, %r15, 272(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_powi_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -160
-; SZ13-NEXT:    .cfi_def_cfa_offset 320
 ; SZ13-NEXT:    larl %r1, .LCPI35_0
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powisf2@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vlr %v24, %v0
-; SZ13-NEXT:    lmg %r14, %r15, 272(%r15)
+; SZ13-NEXT:    vlrepf %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
 entry:
   %powi = call <1 x float> @llvm.experimental.constrained.powi.v1f32(
@@ -1634,49 +1278,16 @@ entry:
 define <2 x double> @constrained_vector_powi_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_powi_v2f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -168
-; S390X-NEXT:    .cfi_def_cfa_offset 328
-; S390X-NEXT:    std %f8, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    larl %r1, .LCPI36_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
 ; S390X-NEXT:    larl %r1, .LCPI36_1
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_powi_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -176
-; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI36_0
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI36_1
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %powi = call <2 x double> @llvm.experimental.constrained.powi.v2f64(
@@ -1690,68 +1301,18 @@ entry:
 define <3 x float> @constrained_vector_powi_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_powi_v3f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -176
-; S390X-NEXT:    .cfi_def_cfa_offset 336
-; S390X-NEXT:    std %f8, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    larl %r1, .LCPI37_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    brasl %r14, __powisf2@PLT
 ; S390X-NEXT:    larl %r1, .LCPI37_1
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ler %f8, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    brasl %r14, __powisf2@PLT
+; S390X-NEXT:    le %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI37_2
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ler %f9, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    brasl %r14, __powisf2@PLT
-; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
-; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_powi_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
 ; SZ13-NEXT:    larl %r1, .LCPI37_0
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powisf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI37_1
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powisf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI37_2
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powisf2@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %powi = call <3 x float> @llvm.experimental.constrained.powi.v3f32(
@@ -1765,71 +1326,24 @@ entry:
 define void @constrained_vector_powi_v3f64(ptr %a) #0 {
 ; S390X-LABEL: constrained_vector_powi_v3f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r13, %r15, 104(%r15)
-; S390X-NEXT:    .cfi_offset %r13, -56
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -176
-; S390X-NEXT:    .cfi_def_cfa_offset 336
-; S390X-NEXT:    std %f8, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    larl %r1, .LCPI38_0
-; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    lgr %r13, %r2
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
-; S390X-NEXT:    larl %r1, .LCPI38_1
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
-; S390X-NEXT:    larl %r1, .LCPI38_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
-; S390X-NEXT:    std %f0, 16(%r13)
-; S390X-NEXT:    std %f9, 8(%r13)
-; S390X-NEXT:    std %f8, 0(%r13)
-; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r13, %r15, 280(%r15)
+; S390X-NEXT:    llihf %r0, 1089607296
+; S390X-NEXT:    stg %r0, 0(%r2)
+; S390X-NEXT:    llihf %r0, 1089624311
+; S390X-NEXT:    oilf %r0, 721554506
+; S390X-NEXT:    stg %r0, 16(%r2)
+; S390X-NEXT:    llihf %r0, 1089615783
+; S390X-NEXT:    oilf %r0, 1614907704
+; S390X-NEXT:    stg %r0, 8(%r2)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_powi_v3f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r13, %r15, 104(%r15)
-; SZ13-NEXT:    .cfi_offset %r13, -56
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -176
-; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI38_0
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lgr %r13, %r2
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI38_1
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v0, %v0, %v1
-; SZ13-NEXT:    larl %r1, .LCPI38_2
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    lmg %r13, %r15, 280(%r15)
+; SZ13-NEXT:    vl %v0, 0(%r1), 3
+; SZ13-NEXT:    llihf %r0, 1089624311
+; SZ13-NEXT:    oilf %r0, 721554506
+; SZ13-NEXT:    stg %r0, 16(%r2)
+; SZ13-NEXT:    vst %v0, 0(%r2), 4
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -1845,84 +1359,22 @@ entry:
 define <4 x double> @constrained_vector_powi_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_powi_v4f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI39_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
 ; S390X-NEXT:    larl %r1, .LCPI39_1
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI39_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
+; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI39_3
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    lghi %r2, 3
-; S390X-NEXT:    ldr %f10, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, __powidf2@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_powi_v4f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
 ; SZ13-NEXT:    larl %r1, .LCPI39_0
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI39_1
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v0, %v0, %v1
-; SZ13-NEXT:    larl %r1, .LCPI39_2
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    larl %r1, .LCPI39_3
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    lghi %r2, 3
-; SZ13-NEXT:    brasl %r14, __powidf2@PLT
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vl %v24, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v26, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    larl %r2, .LCPI39_1
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %powi = call <4 x double> @llvm.experimental.constrained.powi.v4f64(
@@ -1988,7 +1440,8 @@ define <2 x double> @constrained_vector_sin_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, sin@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -2046,8 +1499,9 @@ define <3 x float> @constrained_vector_sin_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, sinf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -2058,28 +1512,28 @@ define <3 x float> @constrained_vector_sin_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI42_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, sinf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI42_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, sinf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI42_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI42_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, sinf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %sin = call <3 x float> @llvm.experimental.constrained.sin.v3f32(
@@ -2199,9 +1653,10 @@ define <4 x double> @constrained_vector_sin_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, sin@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -2304,7 +1759,8 @@ define <2 x double> @constrained_vector_cos_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, cos@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -2362,8 +1818,9 @@ define <3 x float> @constrained_vector_cos_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, cosf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -2374,28 +1831,28 @@ define <3 x float> @constrained_vector_cos_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI47_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, cosf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI47_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, cosf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI47_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI47_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, cosf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %cos = call <3 x float> @llvm.experimental.constrained.cos.v3f32(
@@ -2515,9 +1972,10 @@ define <4 x double> @constrained_vector_cos_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, cos@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -2620,7 +2078,8 @@ define <2 x double> @constrained_vector_exp_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, exp@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -2678,8 +2137,9 @@ define <3 x float> @constrained_vector_exp_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, expf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -2690,28 +2150,28 @@ define <3 x float> @constrained_vector_exp_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI52_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, expf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI52_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, expf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI52_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI52_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, expf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %exp = call <3 x float> @llvm.experimental.constrained.exp.v3f32(
@@ -2831,9 +2291,10 @@ define <4 x double> @constrained_vector_exp_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, exp@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -2936,7 +2397,8 @@ define <2 x double> @constrained_vector_exp2_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, exp2@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -2994,8 +2456,9 @@ define <3 x float> @constrained_vector_exp2_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, exp2f@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -3006,28 +2469,28 @@ define <3 x float> @constrained_vector_exp2_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI57_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, exp2f@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI57_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, exp2f@PLT
-; SZ13-NEXT:    larl %r1, .LCPI57_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI57_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, exp2f@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %exp2 = call <3 x float> @llvm.experimental.constrained.exp2.v3f32(
@@ -3147,9 +2610,10 @@ define <4 x double> @constrained_vector_exp2_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, exp2@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -3252,7 +2716,8 @@ define <2 x double> @constrained_vector_log_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -3310,8 +2775,9 @@ define <3 x float> @constrained_vector_log_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, logf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -3322,28 +2788,28 @@ define <3 x float> @constrained_vector_log_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI62_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, logf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI62_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, logf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI62_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI62_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, logf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %log = call <3 x float> @llvm.experimental.constrained.log.v3f32(
@@ -3463,9 +2929,10 @@ define <4 x double> @constrained_vector_log_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -3568,7 +3035,8 @@ define <2 x double> @constrained_vector_log10_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log10@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -3626,8 +3094,9 @@ define <3 x float> @constrained_vector_log10_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, log10f@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -3638,28 +3107,28 @@ define <3 x float> @constrained_vector_log10_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI67_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log10f@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI67_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log10f@PLT
-; SZ13-NEXT:    larl %r1, .LCPI67_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI67_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log10f@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %log10 = call <3 x float> @llvm.experimental.constrained.log10.v3f32(
@@ -3779,9 +3248,10 @@ define <4 x double> @constrained_vector_log10_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log10@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -3884,7 +3354,8 @@ define <2 x double> @constrained_vector_log2_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log2@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -3942,8 +3413,9 @@ define <3 x float> @constrained_vector_log2_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, log2f@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -3954,28 +3426,28 @@ define <3 x float> @constrained_vector_log2_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI72_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log2f@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI72_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log2f@PLT
-; SZ13-NEXT:    larl %r1, .LCPI72_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI72_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, log2f@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %log2 = call <3 x float> @llvm.experimental.constrained.log2.v3f32(
@@ -4095,9 +3567,10 @@ define <4 x double> @constrained_vector_log2_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, log2@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -4156,8 +3629,11 @@ define <1 x float> @constrained_vector_rint_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_rint_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebr %f1, 0, %f0
 ; SZ13-NEXT:    fiebr %f0, 0, %f0
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -4171,10 +3647,10 @@ entry:
 define <2 x double> @constrained_vector_rint_v2f64(ptr %a) #0 {
 ; S390X-LABEL: constrained_vector_rint_v2f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f1, 0(%r2)
-; S390X-NEXT:    fidbr %f2, 0, %f0
-; S390X-NEXT:    fidbr %f0, 0, %f1
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f1, 8(%r2)
+; S390X-NEXT:    fidbr %f0, 0, %f0
+; S390X-NEXT:    fidbr %f2, 0, %f1
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_rint_v2f64:
@@ -4195,26 +3671,28 @@ define <3 x float> @constrained_vector_rint_v3f32(ptr %a) #0 {
 ; S390X-LABEL: constrained_vector_rint_v3f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
+; S390X-NEXT:    sllg %r1, %r0, 32
 ; S390X-NEXT:    le %f0, 8(%r2)
 ; S390X-NEXT:    ldgr %f1, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
 ; S390X-NEXT:    ldgr %f2, %r0
 ; S390X-NEXT:    fiebr %f4, 0, %f0
-; S390X-NEXT:    fiebr %f2, 0, %f2
-; S390X-NEXT:    fiebr %f0, 0, %f1
+; S390X-NEXT:    fiebr %f0, 0, %f2
+; S390X-NEXT:    fiebr %f2, 0, %f1
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_rint_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebr %f1, 0, %f1
 ; SZ13-NEXT:    fiebr %f2, 0, %f2
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebr %f2, 0, %f0
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebr %f0, 0, %f0
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
  entry:
@@ -4243,11 +3721,11 @@ define void @constrained_vector_rint_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_rint_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 0, 0
 ; SZ13-NEXT:    vfidb %v0, %v0, 0, 0
-; SZ13-NEXT:    fidbra %f1, 0, %f1, 0
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -4262,14 +3740,14 @@ entry:
 define <4 x double> @constrained_vector_rint_v4f64(ptr %a) #0 {
 ; S390X-LABEL: constrained_vector_rint_v4f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    ld %f0, 24(%r2)
-; S390X-NEXT:    ld %f1, 16(%r2)
-; S390X-NEXT:    ld %f2, 8(%r2)
-; S390X-NEXT:    ld %f3, 0(%r2)
-; S390X-NEXT:    fidbr %f6, 0, %f0
-; S390X-NEXT:    fidbr %f4, 0, %f1
-; S390X-NEXT:    fidbr %f2, 0, %f2
-; S390X-NEXT:    fidbr %f0, 0, %f3
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f1, 8(%r2)
+; S390X-NEXT:    ld %f3, 16(%r2)
+; S390X-NEXT:    ld %f5, 24(%r2)
+; S390X-NEXT:    fidbr %f0, 0, %f0
+; S390X-NEXT:    fidbr %f2, 0, %f1
+; S390X-NEXT:    fidbr %f4, 0, %f3
+; S390X-NEXT:    fidbr %f6, 0, %f5
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_rint_v4f64:
@@ -4304,8 +3782,11 @@ define <1 x float> @constrained_vector_nearbyint_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_nearbyint_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebra %f1, 0, %f0, 4
 ; SZ13-NEXT:    fiebra %f0, 0, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -4328,13 +3809,14 @@ define <2 x double> @constrained_vector_nearbyint_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, nearbyint@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, nearbyint@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -4369,11 +3851,12 @@ define <3 x float> @constrained_vector_nearbyint_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, nearbyintf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -4381,8 +3864,9 @@ define <3 x float> @constrained_vector_nearbyint_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, nearbyintf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -4392,13 +3876,15 @@ define <3 x float> @constrained_vector_nearbyint_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_nearbyint_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 0, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 0, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 0, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 0, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -4448,11 +3934,11 @@ define void @constrained_vector_nearbyint_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_nearbyint_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 0
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 0
-; SZ13-NEXT:    fidbra %f1, 0, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -4480,10 +3966,10 @@ define <4 x double> @constrained_vector_nearbyint_v4f64(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    .cfi_offset %f11, -192
-; S390X-NEXT:    ld %f8, 0(%r2)
-; S390X-NEXT:    ld %f9, 8(%r2)
-; S390X-NEXT:    ld %f0, 24(%r2)
-; S390X-NEXT:    ld %f10, 16(%r2)
+; S390X-NEXT:    ld %f8, 24(%r2)
+; S390X-NEXT:    ld %f9, 16(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f10, 8(%r2)
 ; S390X-NEXT:    brasl %r14, nearbyint@PLT
 ; S390X-NEXT:    ldr %f11, %f0
 ; S390X-NEXT:    ldr %f0, %f10
@@ -4494,9 +3980,10 @@ define <4 x double> @constrained_vector_nearbyint_v4f64(ptr %a) #0 {
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, nearbyint@PLT
-; S390X-NEXT:    ldr %f2, %f9
-; S390X-NEXT:    ldr %f4, %f10
-; S390X-NEXT:    ldr %f6, %f11
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f11
+; S390X-NEXT:    ldr %f2, %f10
+; S390X-NEXT:    ldr %f4, %f9
 ; S390X-NEXT:    ld %f8, 184(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 168(%r15) # 8-byte Reload
@@ -4523,34 +4010,14 @@ entry:
 define <1 x float> @constrained_vector_maxnum_v1f32() #0 {
 ; S390X-LABEL: constrained_vector_maxnum_v1f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -160
-; S390X-NEXT:    .cfi_def_cfa_offset 320
 ; S390X-NEXT:    larl %r1, .LCPI85_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI85_1
-; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmaxf@PLT
-; S390X-NEXT:    lmg %r14, %r15, 272(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_maxnum_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -160
-; SZ13-NEXT:    .cfi_def_cfa_offset 320
 ; SZ13-NEXT:    larl %r1, .LCPI85_0
-; SZ13-NEXT:    larl %r2, .LCPI85_1
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lde %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmaxf@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vlr %v24, %v0
-; SZ13-NEXT:    lmg %r14, %r15, 272(%r15)
+; SZ13-NEXT:    vlrepf %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
 entry:
   %max = call <1 x float> @llvm.experimental.constrained.maxnum.v1f32(
@@ -4562,53 +4029,16 @@ entry:
 define <2 x double> @constrained_vector_maxnum_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_maxnum_v2f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -168
-; S390X-NEXT:    .cfi_def_cfa_offset 328
-; S390X-NEXT:    std %f8, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    larl %r1, .LCPI86_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI86_1
 ; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmax@PLT
-; S390X-NEXT:    larl %r1, .LCPI86_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI86_3
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmax@PLT
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_maxnum_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -176
-; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI86_0
-; SZ13-NEXT:    larl %r2, .LCPI86_1
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    larl %r1, .LCPI86_2
-; SZ13-NEXT:    larl %r2, .LCPI86_3
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %max = call <2 x double> @llvm.experimental.constrained.maxnum.v2f64(
@@ -4621,79 +4051,18 @@ entry:
 define <3 x float> @constrained_vector_maxnum_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_maxnum_v3f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI87_0
 ; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI87_1
-; S390X-NEXT:    le %f8, 0(%r1)
-; S390X-NEXT:    ler %f2, %f8
-; S390X-NEXT:    brasl %r14, fmaxf@PLT
-; S390X-NEXT:    larl %r1, .LCPI87_2
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI87_3
-; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    ler %f9, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    brasl %r14, fmaxf@PLT
-; S390X-NEXT:    larl %r1, .LCPI87_4
 ; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    ler %f10, %f0
-; S390X-NEXT:    ler %f0, %f8
-; S390X-NEXT:    brasl %r14, fmaxf@PLT
-; S390X-NEXT:    ler %f2, %f10
-; S390X-NEXT:    ler %f4, %f9
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    larl %r1, .LCPI87_2
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_maxnum_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -200
-; SZ13-NEXT:    .cfi_def_cfa_offset 360
-; SZ13-NEXT:    std %f8, 192(%r15) # 8-byte Spill
-; SZ13-NEXT:    .cfi_offset %f8, -168
-; SZ13-NEXT:    larl %r2, .LCPI87_1
 ; SZ13-NEXT:    larl %r1, .LCPI87_0
-; SZ13-NEXT:    lde %f8, 0(%r2)
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fmaxf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI87_2
-; SZ13-NEXT:    lde %f2, 0(%r1)
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ldr %f0, %f8
-; SZ13-NEXT:    brasl %r14, fmaxf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI87_3
-; SZ13-NEXT:    larl %r2, .LCPI87_4
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lde %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmaxf@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 312(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %max = call <3 x float> @llvm.experimental.constrained.maxnum.v3f32(
@@ -4796,93 +4165,22 @@ entry:
 define <4 x double> @constrained_vector_maxnum_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_maxnum_v4f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI89_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI89_1
 ; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmax@PLT
 ; S390X-NEXT:    larl %r1, .LCPI89_2
-; S390X-NEXT:    ld %f1, 0(%r1)
+; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI89_3
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmax@PLT
-; S390X-NEXT:    larl %r1, .LCPI89_4
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI89_5
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmax@PLT
-; S390X-NEXT:    larl %r1, .LCPI89_6
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI89_7
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f10, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmax@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_maxnum_v4f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
 ; SZ13-NEXT:    larl %r1, .LCPI89_0
 ; SZ13-NEXT:    larl %r2, .LCPI89_1
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    larl %r1, .LCPI89_2
-; SZ13-NEXT:    larl %r2, .LCPI89_3
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    vl %v3, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    larl %r1, .LCPI89_4
-; SZ13-NEXT:    larl %r2, .LCPI89_5
-; SZ13-NEXT:    ld %f1, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v0, %v0, %v3
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ldr %f0, %f1
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    larl %r1, .LCPI89_6
-; SZ13-NEXT:    larl %r2, .LCPI89_7
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vl %v24, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v26, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %max = call <4 x double> @llvm.experimental.constrained.maxnum.v4f64(
@@ -4897,34 +4195,14 @@ entry:
 define <1 x float> @constrained_vector_minnum_v1f32() #0 {
 ; S390X-LABEL: constrained_vector_minnum_v1f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -160
-; S390X-NEXT:    .cfi_def_cfa_offset 320
 ; S390X-NEXT:    larl %r1, .LCPI90_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI90_1
-; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fminf@PLT
-; S390X-NEXT:    lmg %r14, %r15, 272(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_minnum_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -160
-; SZ13-NEXT:    .cfi_def_cfa_offset 320
 ; SZ13-NEXT:    larl %r1, .LCPI90_0
-; SZ13-NEXT:    larl %r2, .LCPI90_1
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lde %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fminf@PLT
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vlr %v24, %v0
-; SZ13-NEXT:    lmg %r14, %r15, 272(%r15)
+; SZ13-NEXT:    vlrepf %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
  entry:
   %min = call <1 x float> @llvm.experimental.constrained.minnum.v1f32(
@@ -4936,53 +4214,16 @@ define <1 x float> @constrained_vector_minnum_v1f32() #0 {
 define <2 x double> @constrained_vector_minnum_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_minnum_v2f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -168
-; S390X-NEXT:    .cfi_def_cfa_offset 328
-; S390X-NEXT:    std %f8, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    larl %r1, .LCPI91_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI91_1
 ; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmin@PLT
-; S390X-NEXT:    larl %r1, .LCPI91_2
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI91_3
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmin@PLT
-; S390X-NEXT:    ldr %f2, %f8
-; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_minnum_v2f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -176
-; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI91_0
-; SZ13-NEXT:    larl %r2, .LCPI91_1
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    larl %r1, .LCPI91_2
-; SZ13-NEXT:    larl %r2, .LCPI91_3
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %min = call <2 x double> @llvm.experimental.constrained.minnum.v2f64(
@@ -4995,79 +4236,18 @@ entry:
 define <3 x float> @constrained_vector_minnum_v3f32() #0 {
 ; S390X-LABEL: constrained_vector_minnum_v3f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI92_0
 ; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI92_1
-; S390X-NEXT:    le %f8, 0(%r1)
-; S390X-NEXT:    ler %f2, %f8
-; S390X-NEXT:    brasl %r14, fminf@PLT
-; S390X-NEXT:    larl %r1, .LCPI92_2
-; S390X-NEXT:    le %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI92_3
-; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    ler %f9, %f0
-; S390X-NEXT:    ler %f0, %f1
-; S390X-NEXT:    brasl %r14, fminf@PLT
-; S390X-NEXT:    larl %r1, .LCPI92_4
 ; S390X-NEXT:    le %f2, 0(%r1)
-; S390X-NEXT:    ler %f10, %f0
-; S390X-NEXT:    ler %f0, %f8
-; S390X-NEXT:    brasl %r14, fminf@PLT
-; S390X-NEXT:    ler %f2, %f10
-; S390X-NEXT:    ler %f4, %f9
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    larl %r1, .LCPI92_2
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_minnum_v3f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -200
-; SZ13-NEXT:    .cfi_def_cfa_offset 360
-; SZ13-NEXT:    std %f8, 192(%r15) # 8-byte Spill
-; SZ13-NEXT:    .cfi_offset %f8, -168
-; SZ13-NEXT:    larl %r2, .LCPI92_1
 ; SZ13-NEXT:    larl %r1, .LCPI92_0
-; SZ13-NEXT:    lde %f8, 0(%r2)
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    ldr %f2, %f8
-; SZ13-NEXT:    brasl %r14, fminf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI92_2
-; SZ13-NEXT:    lde %f2, 0(%r1)
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ldr %f0, %f8
-; SZ13-NEXT:    brasl %r14, fminf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI92_3
-; SZ13-NEXT:    larl %r2, .LCPI92_4
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lde %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fminf@PLT
-; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 312(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %min = call <3 x float> @llvm.experimental.constrained.minnum.v3f32(
@@ -5174,93 +4354,22 @@ entry:
 define <4 x double> @constrained_vector_minnum_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_minnum_v4f64:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -184
-; S390X-NEXT:    .cfi_def_cfa_offset 344
-; S390X-NEXT:    std %f8, 176(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f9, 168(%r15) # 8-byte Spill
-; S390X-NEXT:    std %f10, 160(%r15) # 8-byte Spill
-; S390X-NEXT:    .cfi_offset %f8, -168
-; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    larl %r1, .LCPI94_0
 ; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI94_1
 ; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    brasl %r14, fmin@PLT
 ; S390X-NEXT:    larl %r1, .LCPI94_2
-; S390X-NEXT:    ld %f1, 0(%r1)
+; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI94_3
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f8, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmin@PLT
-; S390X-NEXT:    larl %r1, .LCPI94_4
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI94_5
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f9, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmin@PLT
-; S390X-NEXT:    larl %r1, .LCPI94_6
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    larl %r1, .LCPI94_7
-; S390X-NEXT:    ld %f2, 0(%r1)
-; S390X-NEXT:    ldr %f10, %f0
-; S390X-NEXT:    ldr %f0, %f1
-; S390X-NEXT:    brasl %r14, fmin@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
-; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
-; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
-; S390X-NEXT:    lmg %r14, %r15, 296(%r15)
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_minnum_v4f64:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
-; SZ13-NEXT:    .cfi_offset %r14, -48
-; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
 ; SZ13-NEXT:    larl %r1, .LCPI94_0
 ; SZ13-NEXT:    larl %r2, .LCPI94_1
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    larl %r1, .LCPI94_2
-; SZ13-NEXT:    larl %r2, .LCPI94_3
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    vl %v3, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    larl %r1, .LCPI94_4
-; SZ13-NEXT:    larl %r2, .LCPI94_5
-; SZ13-NEXT:    ld %f1, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v0, %v0, %v3
-; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ldr %f0, %f1
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    larl %r1, .LCPI94_6
-; SZ13-NEXT:    larl %r2, .LCPI94_7
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f2, 0(%r2)
-; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vl %v24, 160(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    # kill: def $f0d killed $f0d def $v0
-; SZ13-NEXT:    vmrhg %v26, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %min = call <4 x double> @llvm.experimental.constrained.minnum.v4f64(
@@ -5276,15 +4385,13 @@ define <1 x float> @constrained_vector_fptrunc_v1f64() #0 {
 ; S390X-LABEL: constrained_vector_fptrunc_v1f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI95_0
-; S390X-NEXT:    ld %f0, 0(%r1)
-; S390X-NEXT:    ledbr %f0, %f0
+; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fptrunc_v1f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI95_0
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    wledb %v24, %f0, 0, 0
+; SZ13-NEXT:    vlrepf %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(
@@ -5298,23 +4405,15 @@ define <2 x float> @constrained_vector_fptrunc_v2f64() #0 {
 ; S390X-LABEL: constrained_vector_fptrunc_v2f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI96_0
-; S390X-NEXT:    ld %f0, 0(%r1)
+; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI96_1
-; S390X-NEXT:    ld %f1, 0(%r1)
-; S390X-NEXT:    ledbr %f2, %f0
-; S390X-NEXT:    ledbr %f0, %f1
+; S390X-NEXT:    le %f2, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fptrunc_v2f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI96_0
-; SZ13-NEXT:    larl %r2, .LCPI96_1
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f1, 0(%r2)
-; SZ13-NEXT:    ledbra %f0, 0, %f0, 0
-; SZ13-NEXT:    ledbra %f1, 0, %f1, 0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
@@ -5368,36 +4467,19 @@ define <4 x float> @constrained_vector_fptrunc_v4f64() #0 {
 ; S390X-LABEL: constrained_vector_fptrunc_v4f64:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI98_0
-; S390X-NEXT:    ld %f0, 0(%r1)
+; S390X-NEXT:    le %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI98_1
-; S390X-NEXT:    ld %f1, 0(%r1)
+; S390X-NEXT:    le %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI98_2
-; S390X-NEXT:    ld %f2, 0(%r1)
+; S390X-NEXT:    le %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI98_3
-; S390X-NEXT:    ld %f3, 0(%r1)
-; S390X-NEXT:    ledbr %f6, %f0
-; S390X-NEXT:    ledbr %f4, %f1
-; S390X-NEXT:    ledbr %f2, %f2
-; S390X-NEXT:    ledbr %f0, %f3
+; S390X-NEXT:    le %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fptrunc_v4f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI98_0
-; SZ13-NEXT:    larl %r2, .LCPI98_1
-; SZ13-NEXT:    larl %r3, .LCPI98_2
-; SZ13-NEXT:    larl %r4, .LCPI98_3
-; SZ13-NEXT:    ld %f0, 0(%r1)
-; SZ13-NEXT:    ld %f1, 0(%r2)
-; SZ13-NEXT:    ld %f2, 0(%r3)
-; SZ13-NEXT:    ld %f3, 0(%r4)
-; SZ13-NEXT:    ledbra %f0, 0, %f0, 0
-; SZ13-NEXT:    ledbra %f1, 0, %f1, 0
-; SZ13-NEXT:    ledbra %f2, 0, %f2, 0
-; SZ13-NEXT:    ledbra %f3, 0, %f3, 0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vmrhf %v1, %v3, %v2
-; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(
@@ -5412,14 +4494,13 @@ define <1 x double> @constrained_vector_fpext_v1f32() #0 {
 ; S390X-LABEL: constrained_vector_fpext_v1f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI99_0
-; S390X-NEXT:    ldeb %f0, 0(%r1)
+; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fpext_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI99_0
-; SZ13-NEXT:    ldeb %f0, 0(%r1)
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vlrepg %v24, 0(%r1)
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(
@@ -5432,18 +4513,15 @@ define <2 x double> @constrained_vector_fpext_v2f32() #0 {
 ; S390X-LABEL: constrained_vector_fpext_v2f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI100_0
-; S390X-NEXT:    ldeb %f2, 0(%r1)
+; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI100_1
-; S390X-NEXT:    ldeb %f0, 0(%r1)
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fpext_v2f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI100_0
-; SZ13-NEXT:    larl %r2, .LCPI100_1
-; SZ13-NEXT:    ldeb %f0, 0(%r1)
-; SZ13-NEXT:    ldeb %f1, 0(%r2)
-; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(
@@ -5491,27 +4569,21 @@ define <4 x double> @constrained_vector_fpext_v4f32() #0 {
 ; S390X-LABEL: constrained_vector_fpext_v4f32:
 ; S390X:       # %bb.0: # %entry
 ; S390X-NEXT:    larl %r1, .LCPI102_0
-; S390X-NEXT:    ldeb %f6, 0(%r1)
+; S390X-NEXT:    ld %f0, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI102_1
-; S390X-NEXT:    ldeb %f4, 0(%r1)
+; S390X-NEXT:    ld %f2, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI102_2
-; S390X-NEXT:    ldeb %f2, 0(%r1)
+; S390X-NEXT:    ld %f4, 0(%r1)
 ; S390X-NEXT:    larl %r1, .LCPI102_3
-; S390X-NEXT:    ldeb %f0, 0(%r1)
+; S390X-NEXT:    ld %f6, 0(%r1)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_fpext_v4f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    larl %r1, .LCPI102_0
 ; SZ13-NEXT:    larl %r2, .LCPI102_1
-; SZ13-NEXT:    larl %r3, .LCPI102_2
-; SZ13-NEXT:    larl %r4, .LCPI102_3
-; SZ13-NEXT:    ldeb %f0, 0(%r1)
-; SZ13-NEXT:    ldeb %f1, 0(%r2)
-; SZ13-NEXT:    ldeb %f2, 0(%r3)
-; SZ13-NEXT:    ldeb %f3, 0(%r4)
-; SZ13-NEXT:    vmrhg %v24, %v1, %v0
-; SZ13-NEXT:    vmrhg %v26, %v3, %v2
+; SZ13-NEXT:    vl %v24, 0(%r1), 3
+; SZ13-NEXT:    vl %v26, 0(%r2), 3
 ; SZ13-NEXT:    br %r14
 entry:
   %result = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(
@@ -5524,22 +4596,13 @@ entry:
 define <1 x float> @constrained_vector_ceil_v1f32(ptr %a) #0 {
 ; S390X-LABEL: constrained_vector_ceil_v1f32:
 ; S390X:       # %bb.0: # %entry
-; S390X-NEXT:    stmg %r14, %r15, 112(%r15)
-; S390X-NEXT:    .cfi_offset %r14, -48
-; S390X-NEXT:    .cfi_offset %r15, -40
-; S390X-NEXT:    aghi %r15, -160
-; S390X-NEXT:    .cfi_def_cfa_offset 320
 ; S390X-NEXT:    larl %r1, .LCPI103_0
 ; S390X-NEXT:    le %f0, 0(%r1)
-; S390X-NEXT:    brasl %r14, ceilf@PLT
-; S390X-NEXT:    lmg %r14, %r15, 272(%r15)
 ; S390X-NEXT:    br %r14
 ;
 ; SZ13-LABEL: constrained_vector_ceil_v1f32:
 ; SZ13:       # %bb.0: # %entry
-; SZ13-NEXT:    vgmf %v0, 2, 9
-; SZ13-NEXT:    fiebra %f0, 6, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vgmf %v24, 1, 1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -5561,13 +4624,14 @@ define <2 x double> @constrained_vector_ceil_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, ceil@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, ceil@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -5601,11 +4665,12 @@ define <3 x float> @constrained_vector_ceil_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, ceilf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -5613,8 +4678,9 @@ define <3 x float> @constrained_vector_ceil_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, ceilf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -5624,13 +4690,15 @@ define <3 x float> @constrained_vector_ceil_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_ceil_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 6, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 6, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 6, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 6, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -5679,11 +4747,11 @@ define void @constrained_vector_ceil_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_ceil_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 6
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 6
-; SZ13-NEXT:    fidbra %f1, 6, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -5710,8 +4778,11 @@ define <1 x float> @constrained_vector_floor_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_floor_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebra %f1, 7, %f0, 4
 ; SZ13-NEXT:    fiebra %f0, 7, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -5734,13 +4805,14 @@ define <2 x double> @constrained_vector_floor_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, floor@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, floor@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -5774,11 +4846,12 @@ define <3 x float> @constrained_vector_floor_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, floorf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -5786,8 +4859,9 @@ define <3 x float> @constrained_vector_floor_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, floorf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -5797,13 +4871,15 @@ define <3 x float> @constrained_vector_floor_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_floor_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 7, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 7, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 7, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 7, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -5852,11 +4928,11 @@ define void @constrained_vector_floor_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_floor_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 7
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 7
-; SZ13-NEXT:    fidbra %f1, 7, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -5883,8 +4959,11 @@ define <1 x float> @constrained_vector_round_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_round_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebra %f1, 1, %f0, 4
 ; SZ13-NEXT:    fiebra %f0, 1, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -5906,13 +4985,14 @@ define <2 x double> @constrained_vector_round_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, round@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, round@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -5946,11 +5026,12 @@ define <3 x float> @constrained_vector_round_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, roundf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -5958,8 +5039,9 @@ define <3 x float> @constrained_vector_round_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, roundf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -5969,13 +5051,15 @@ define <3 x float> @constrained_vector_round_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_round_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 1, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 1, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 1, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 1, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -6025,11 +5109,11 @@ define void @constrained_vector_round_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_round_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 1
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 1
-; SZ13-NEXT:    fidbra %f1, 1, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -6056,8 +5140,11 @@ define <1 x float> @constrained_vector_roundeven_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_roundeven_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebra %f1, 4, %f0, 4
 ; SZ13-NEXT:    fiebra %f0, 4, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -6079,13 +5166,14 @@ define <2 x double> @constrained_vector_roundeven_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, roundeven@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, roundeven@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -6119,11 +5207,12 @@ define <3 x float> @constrained_vector_roundeven_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, roundevenf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -6131,8 +5220,9 @@ define <3 x float> @constrained_vector_roundeven_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, roundevenf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -6142,13 +5232,15 @@ define <3 x float> @constrained_vector_roundeven_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_roundeven_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 4, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 4, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 4, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 4, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -6197,11 +5289,11 @@ define void @constrained_vector_roundeven_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_roundeven_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 4
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 4
-; SZ13-NEXT:    fidbra %f1, 4, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -6228,8 +5320,11 @@ define <1 x float> @constrained_vector_trunc_v1f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_trunc_v1f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    lde %f0, 0(%r2)
+; SZ13-NEXT:    fiebra %f1, 5, %f0, 4
 ; SZ13-NEXT:    fiebra %f0, 5, %f0, 4
-; SZ13-NEXT:    vlr %v24, %v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    vmrhf %v1, %v1, %v1
+; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <1 x float>, ptr %a
@@ -6251,13 +5346,14 @@ define <2 x double> @constrained_vector_trunc_v2f64(ptr %a) #0 {
 ; S390X-NEXT:    std %f9, 160(%r15) # 8-byte Spill
 ; S390X-NEXT:    .cfi_offset %f8, -168
 ; S390X-NEXT:    .cfi_offset %f9, -176
-; S390X-NEXT:    ld %f0, 8(%r2)
-; S390X-NEXT:    ld %f8, 0(%r2)
+; S390X-NEXT:    ld %f0, 0(%r2)
+; S390X-NEXT:    ld %f8, 8(%r2)
 ; S390X-NEXT:    brasl %r14, trunc@PLT
 ; S390X-NEXT:    ldr %f9, %f0
 ; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    brasl %r14, trunc@PLT
-; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f9
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -6291,11 +5387,12 @@ define <3 x float> @constrained_vector_trunc_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    .cfi_offset %f9, -176
 ; S390X-NEXT:    .cfi_offset %f10, -184
 ; S390X-NEXT:    lg %r0, 0(%r2)
-; S390X-NEXT:    le %f0, 8(%r2)
-; S390X-NEXT:    risbg %r1, %r0, 0, 159, 0
-; S390X-NEXT:    ldgr %f8, %r1
-; S390X-NEXT:    sllg %r0, %r0, 32
-; S390X-NEXT:    ldgr %f9, %r0
+; S390X-NEXT:    le %f8, 8(%r2)
+; S390X-NEXT:    sllg %r1, %r0, 32
+; S390X-NEXT:    nilf %r0, 0
+; S390X-NEXT:    ldgr %f0, %r0
+; S390X-NEXT:    ldgr %f9, %r1
+; S390X-NEXT:    # kill: def $f0s killed $f0s killed $f0d
 ; S390X-NEXT:    brasl %r14, truncf@PLT
 ; S390X-NEXT:    ler %f10, %f0
 ; S390X-NEXT:    ler %f0, %f9
@@ -6303,8 +5400,9 @@ define <3 x float> @constrained_vector_trunc_v3f32(ptr %a) #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    brasl %r14, truncf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f10
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -6314,13 +5412,15 @@ define <3 x float> @constrained_vector_trunc_v3f32(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_trunc_v3f32:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    vrepf %v1, %v0, 2
-; SZ13-NEXT:    vrepf %v2, %v0, 1
+; SZ13-NEXT:    vrepf %v1, %v0, 3
+; SZ13-NEXT:    vrepf %v2, %v0, 2
 ; SZ13-NEXT:    fiebra %f1, 5, %f1, 4
 ; SZ13-NEXT:    fiebra %f2, 5, %f2, 4
+; SZ13-NEXT:    vmrhf %v1, %v2, %v1
+; SZ13-NEXT:    fiebra %f2, 5, %f0, 4
+; SZ13-NEXT:    vrepf %v0, %v0, 1
 ; SZ13-NEXT:    fiebra %f0, 5, %f0, 4
-; SZ13-NEXT:    vmrhf %v0, %v0, %v2
-; SZ13-NEXT:    vrepf %v1, %v1, 0
+; SZ13-NEXT:    vmrhf %v0, %v2, %v0
 ; SZ13-NEXT:    vmrhg %v24, %v0, %v1
 ; SZ13-NEXT:    br %r14
 entry:
@@ -6369,11 +5469,11 @@ define void @constrained_vector_trunc_v3f64(ptr %a) #0 {
 ; SZ13-LABEL: constrained_vector_trunc_v3f64:
 ; SZ13:       # %bb.0: # %entry
 ; SZ13-NEXT:    vl %v0, 0(%r2), 4
-; SZ13-NEXT:    ld %f1, 16(%r2)
+; SZ13-NEXT:    vlrepg %v1, 16(%r2)
+; SZ13-NEXT:    vfidb %v1, %v1, 4, 5
 ; SZ13-NEXT:    vfidb %v0, %v0, 4, 5
-; SZ13-NEXT:    fidbra %f1, 5, %f1, 4
 ; SZ13-NEXT:    vst %v0, 0(%r2), 4
-; SZ13-NEXT:    std %f1, 16(%r2)
+; SZ13-NEXT:    vsteg %v1, 16(%r2), 0
 ; SZ13-NEXT:    br %r14
 entry:
   %b = load <3 x double>, ptr %a
@@ -6438,7 +5538,8 @@ define <2 x double> @constrained_vector_tan_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, tan@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -6496,8 +5597,9 @@ define <3 x float> @constrained_vector_tan_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, tanf@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -6508,28 +5610,28 @@ define <3 x float> @constrained_vector_tan_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI125_0
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, tanf@PLT
 ; SZ13-NEXT:    larl %r1, .LCPI125_1
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, tanf@PLT
-; SZ13-NEXT:    larl %r1, .LCPI125_2
+; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v1
+; SZ13-NEXT:    larl %r1, .LCPI125_2
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    brasl %r14, tanf@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %tan = call <3 x float> @llvm.experimental.constrained.tan.v3f32(
@@ -6649,9 +5751,10 @@ define <4 x double> @constrained_vector_tan_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, tan@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
@@ -6763,7 +5866,8 @@ define <2 x double> @constrained_vector_atan2_v2f64() #0 {
 ; S390X-NEXT:    ldr %f8, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, atan2@PLT
-; S390X-NEXT:    ldr %f2, %f8
+; S390X-NEXT:    ldr %f2, %f0
+; S390X-NEXT:    ldr %f0, %f8
 ; S390X-NEXT:    ld %f8, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 280(%r15)
 ; S390X-NEXT:    br %r14
@@ -6832,8 +5936,9 @@ define <3 x float> @constrained_vector_atan2_v3f32() #0 {
 ; S390X-NEXT:    ler %f9, %f0
 ; S390X-NEXT:    ler %f0, %f1
 ; S390X-NEXT:    brasl %r14, atan2f@PLT
+; S390X-NEXT:    ler %f4, %f0
+; S390X-NEXT:    ler %f0, %f8
 ; S390X-NEXT:    ler %f2, %f9
-; S390X-NEXT:    ler %f4, %f8
 ; S390X-NEXT:    ld %f8, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 160(%r15) # 8-byte Reload
 ; S390X-NEXT:    lmg %r14, %r15, 288(%r15)
@@ -6844,8 +5949,8 @@ define <3 x float> @constrained_vector_atan2_v3f32() #0 {
 ; SZ13-NEXT:    stmg %r14, %r15, 112(%r15)
 ; SZ13-NEXT:    .cfi_offset %r14, -48
 ; SZ13-NEXT:    .cfi_offset %r15, -40
-; SZ13-NEXT:    aghi %r15, -192
-; SZ13-NEXT:    .cfi_def_cfa_offset 352
+; SZ13-NEXT:    aghi %r15, -176
+; SZ13-NEXT:    .cfi_def_cfa_offset 336
 ; SZ13-NEXT:    larl %r1, .LCPI130_0
 ; SZ13-NEXT:    larl %r2, .LCPI130_1
 ; SZ13-NEXT:    lde %f0, 0(%r1)
@@ -6854,24 +5959,25 @@ define <3 x float> @constrained_vector_atan2_v3f32() #0 {
 ; SZ13-NEXT:    larl %r1, .LCPI130_2
 ; SZ13-NEXT:    larl %r2, .LCPI130_3
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vst %v0, 176(%r15), 3 # 16-byte Spill
+; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
 ; SZ13-NEXT:    lde %f0, 0(%r1)
 ; SZ13-NEXT:    lde %f2, 0(%r2)
 ; SZ13-NEXT:    brasl %r14, atan2f@PLT
+; SZ13-NEXT:    vl %v3, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    larl %r1, .LCPI130_4
 ; SZ13-NEXT:    larl %r2, .LCPI130_5
+; SZ13-NEXT:    lde %f1, 0(%r1)
+; SZ13-NEXT:    lde %f2, 0(%r2)
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
+; SZ13-NEXT:    vmrhf %v0, %v0, %v3
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Spill
-; SZ13-NEXT:    lde %f0, 0(%r1)
-; SZ13-NEXT:    lde %f2, 0(%r2)
+; SZ13-NEXT:    ldr %f0, %f1
 ; SZ13-NEXT:    brasl %r14, atan2f@PLT
 ; SZ13-NEXT:    vl %v1, 160(%r15), 3 # 16-byte Reload
 ; SZ13-NEXT:    # kill: def $f0s killed $f0s def $v0
-; SZ13-NEXT:    vmrhf %v0, %v1, %v0
-; SZ13-NEXT:    vl %v1, 176(%r15), 3 # 16-byte Reload
-; SZ13-NEXT:    vrepf %v1, %v1, 0
-; SZ13-NEXT:    vmrhg %v24, %v0, %v1
-; SZ13-NEXT:    lmg %r14, %r15, 304(%r15)
+; SZ13-NEXT:    vrepf %v0, %v0, 0
+; SZ13-NEXT:    vmrhg %v24, %v1, %v0
+; SZ13-NEXT:    lmg %r14, %r15, 288(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
   %atan2 = call <3 x float> @llvm.experimental.constrained.atan2.v3f32(
@@ -7024,9 +6130,10 @@ define <4 x double> @constrained_vector_atan2_v4f64() #0 {
 ; S390X-NEXT:    ldr %f10, %f0
 ; S390X-NEXT:    ldr %f0, %f1
 ; S390X-NEXT:    brasl %r14, atan2@PLT
-; S390X-NEXT:    ldr %f2, %f10
-; S390X-NEXT:    ldr %f4, %f9
-; S390X-NEXT:    ldr %f6, %f8
+; S390X-NEXT:    ldr %f6, %f0
+; S390X-NEXT:    ldr %f0, %f8
+; S390X-NEXT:    ldr %f2, %f9
+; S390X-NEXT:    ldr %f4, %f10
 ; S390X-NEXT:    ld %f8, 176(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f9, 168(%r15) # 8-byte Reload
 ; S390X-NEXT:    ld %f10, 160(%r15) # 8-byte Reload
diff --git a/llvm/test/CodeGen/VE/Scalar/cast.ll b/llvm/test/CodeGen/VE/Scalar/cast.ll
index 6f6c93a1e639f..985d329b1c277 100644
--- a/llvm/test/CodeGen/VE/Scalar/cast.ll
+++ b/llvm/test/CodeGen/VE/Scalar/cast.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc < %s -mtriple=ve-unknown-unknown | FileCheck %s
 
 define signext i32 @i() {
@@ -223,9 +224,9 @@ define i64 @q2ll(fp128) {
 define i64 @q2ull(fp128) {
 ; CHECK-LABEL: q2ull:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lea %s2, .LCPI{{[0-9]+}}_0@lo
+; CHECK-NEXT:    lea %s2, .LCPI22_0@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
-; CHECK-NEXT:    lea.sl %s2, .LCPI{{[0-9]+}}_0@hi(, %s2)
+; CHECK-NEXT:    lea.sl %s2, .LCPI22_0@hi(, %s2)
 ; CHECK-NEXT:    ld %s4, 8(, %s2)
 ; CHECK-NEXT:    ld %s5, (, %s2)
 ; CHECK-NEXT:    fcmp.q %s3, %s0, %s4
@@ -580,28 +581,28 @@ define float @ull2f_nneg(i64 %x) {
 
 define float @ull2f_strict(i32 %x) {
 ; CHECK-LABEL: ull2f_strict:
-; CHECK:     # %bb.0:
-; CHECK-NEXT:	adds.l %s11, -16, %s11
-; CHECK-NEXT:		brge.l.t %s11, %s8, .LBB58_2
-; CHECK-NEXT:	# %bb.1:
-; CHECK-NEXT:		ld %s61, 24(, %s14)
-; CHECK-NEXT:		or %s62, 0, %s0
-; CHECK-NEXT:		lea %s63, 315
-; CHECK-NEXT:		shm.l %s63, (%s61)
-; CHECK-NEXT:		shm.l %s8, 8(%s61)
-; CHECK-NEXT:		shm.l %s11, 16(%s61)
-; CHECK-NEXT:		monc
-; CHECK-NEXT:		or %s0, 0, %s62
-; CHECK-NEXT:	.LBB58_2:
-; CHECK-NEXT:		lea %s1, 1127219200
-; CHECK-NEXT:		stl %s1, 12(, %s11)
-; CHECK-NEXT:		stl %s0, 8(, %s11)
-; CHECK-NEXT:		ld %s0, 8(, %s11)
-; CHECK-NEXT:		lea.sl %s1, 1127219200
-; CHECK-NEXT:		fsub.d %s0, %s0, %s1
-; CHECK-NEXT:		cvt.s.d %s0, %s0
-; CHECK-NEXT:		adds.l %s11, 16, %s11
-; CHECK-NEXT:		b.l.t (, %s10)
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    adds.l %s11, -16, %s11
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB58_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB58_2:
+; CHECK-NEXT:    lea %s1, 1127219200
+; CHECK-NEXT:    stl %s1, 12(, %s11)
+; CHECK-NEXT:    stl %s0, 8(, %s11)
+; CHECK-NEXT:    ld %s0, 8(, %s11)
+; CHECK-NEXT:    lea.sl %s1, 1127219200
+; CHECK-NEXT:    fsub.d %s0, %s0, %s1
+; CHECK-NEXT:    cvt.s.d %s0, %s0
+; CHECK-NEXT:    adds.l %s11, 16, %s11
+; CHECK-NEXT:    b.l.t (, %s10)
   %val = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %x, metadata !"round.tonearest", metadata !"fpexcept.strict")
   ret float %val
 }
@@ -629,9 +630,9 @@ define fp128 @ull2q(i64) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    srl %s1, %s0, 61
 ; CHECK-NEXT:    and %s1, 4, %s1
-; CHECK-NEXT:    lea %s2, .LCPI{{[0-9]+}}_0@lo
+; CHECK-NEXT:    lea %s2, .LCPI60_0@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
-; CHECK-NEXT:    lea.sl %s2, .LCPI{{[0-9]+}}_0@hi(, %s2)
+; CHECK-NEXT:    lea.sl %s2, .LCPI60_0@hi(, %s2)
 ; CHECK-NEXT:    ldu %s1, (%s1, %s2)
 ; CHECK-NEXT:    cvt.q.s %s2, %s1
 ; CHECK-NEXT:    cvt.d.l %s0, %s0
@@ -1411,12 +1412,30 @@ define i128 @ui1282i128(i128 returned %0) {
 ; Function Attrs: norecurse nounwind readnone
 define float @i1282f(i128) {
 ; CHECK-LABEL: i1282f:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB147_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB147_2:
 ; CHECK-NEXT:    lea %s2, __floattisf@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __floattisf@hi(, %s2)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = sitofp i128 %0 to float
   ret float %2
 }
@@ -1424,12 +1443,30 @@ define float @i1282f(i128) {
 ; Function Attrs: norecurse nounwind readnone
 define float @ui1282f(i128) {
 ; CHECK-LABEL: ui1282f:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB148_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB148_2:
 ; CHECK-NEXT:    lea %s2, __floatuntisf@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __floatuntisf@hi(, %s2)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = uitofp i128 %0 to float
   ret float %2
 }
@@ -1437,12 +1474,30 @@ define float @ui1282f(i128) {
 ; Function Attrs: norecurse nounwind readnone
 define double @i1282d(i128) {
 ; CHECK-LABEL: i1282d:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB149_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB149_2:
 ; CHECK-NEXT:    lea %s2, __floattidf@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __floattidf@hi(, %s2)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = sitofp i128 %0 to double
   ret double %2
 }
@@ -1450,12 +1505,30 @@ define double @i1282d(i128) {
 ; Function Attrs: norecurse nounwind readnone
 define double @ui1282d(i128) {
 ; CHECK-LABEL: ui1282d:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB150_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB150_2:
 ; CHECK-NEXT:    lea %s2, __floatuntidf@lo
 ; CHECK-NEXT:    and %s2, %s2, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __floatuntidf@hi(, %s2)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = uitofp i128 %0 to double
   ret double %2
 }
@@ -1463,12 +1536,30 @@ define double @ui1282d(i128) {
 ; Function Attrs: norecurse nounwind readnone
 define i128 @d2i128(double) {
 ; CHECK-LABEL: d2i128:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB151_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB151_2:
 ; CHECK-NEXT:    lea %s1, __fixdfti@lo
 ; CHECK-NEXT:    and %s1, %s1, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __fixdfti@hi(, %s1)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = fptosi double %0 to i128
   ret i128 %2
 }
@@ -1476,12 +1567,30 @@ define i128 @d2i128(double) {
 ; Function Attrs: norecurse nounwind readnone
 define i128 @d2ui128(double) {
 ; CHECK-LABEL: d2ui128:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB152_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB152_2:
 ; CHECK-NEXT:    lea %s1, __fixunsdfti@lo
 ; CHECK-NEXT:    and %s1, %s1, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __fixunsdfti@hi(, %s1)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = fptoui double %0 to i128
   ret i128 %2
 }
@@ -1489,12 +1598,30 @@ define i128 @d2ui128(double) {
 ; Function Attrs: norecurse nounwind readnone
 define i128 @f2i128(float) {
 ; CHECK-LABEL: f2i128:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB153_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB153_2:
 ; CHECK-NEXT:    lea %s1, __fixsfti@lo
 ; CHECK-NEXT:    and %s1, %s1, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __fixsfti@hi(, %s1)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = fptosi float %0 to i128
   ret i128 %2
 }
@@ -1502,12 +1629,30 @@ define i128 @f2i128(float) {
 ; Function Attrs: norecurse nounwind readnone
 define i128 @f2ui128(float) {
 ; CHECK-LABEL: f2ui128:
-; CHECK:       .LBB{{[0-9]+}}_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    st %s9, (, %s11)
+; CHECK-NEXT:    st %s10, 8(, %s11)
+; CHECK-NEXT:    or %s9, 0, %s11
+; CHECK-NEXT:    lea %s11, -240(, %s11)
+; CHECK-NEXT:    brge.l.t %s11, %s8, .LBB154_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    ld %s61, 24(, %s14)
+; CHECK-NEXT:    or %s62, 0, %s0
+; CHECK-NEXT:    lea %s63, 315
+; CHECK-NEXT:    shm.l %s63, (%s61)
+; CHECK-NEXT:    shm.l %s8, 8(%s61)
+; CHECK-NEXT:    shm.l %s11, 16(%s61)
+; CHECK-NEXT:    monc
+; CHECK-NEXT:    or %s0, 0, %s62
+; CHECK-NEXT:  .LBB154_2:
 ; CHECK-NEXT:    lea %s1, __fixunssfti@lo
 ; CHECK-NEXT:    and %s1, %s1, (32)0
 ; CHECK-NEXT:    lea.sl %s12, __fixunssfti@hi(, %s1)
 ; CHECK-NEXT:    bsic %s10, (, %s12)
 ; CHECK-NEXT:    or %s11, 0, %s9
+; CHECK-NEXT:    ld %s10, 8(, %s11)
+; CHECK-NEXT:    ld %s9, (, %s11)
+; CHECK-NEXT:    b.l.t (, %s10)
   %2 = fptoui float %0 to i128
   ret i128 %2
 }
diff --git a/llvm/test/CodeGen/X86/avx512fp16-frem.ll b/llvm/test/CodeGen/X86/avx512fp16-frem.ll
index 2164c2460f6d7..ad7bd2e6161d5 100644
--- a/llvm/test/CodeGen/X86/avx512fp16-frem.ll
+++ b/llvm/test/CodeGen/X86/avx512fp16-frem.ll
@@ -809,8 +809,8 @@ define half @frem_strict(half %x, half %y) nounwind #0 {
 ; CHECK-LABEL: frem_strict:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    popq %rax
@@ -822,23 +822,87 @@ define half @frem_strict(half %x, half %y) nounwind #0 {
 define <2 x half> @frem_strict_vec2(<2 x half> %x, <2 x half> %y) nounwind #0 {
 ; CHECK-LABEL: frem_strict_vec2:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    vmovaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    subq $88, %rsp
+; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovapd %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm0
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpermilps $255, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq $10, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
+; CHECK-NEXT:    vpunpckldq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrlq $48, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovshdup (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT:    vpunpcklwd {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    vpunpckldq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; CHECK-NEXT:    vpunpcklqdq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
+; CHECK-NEXT:    addq $88, %rsp
 ; CHECK-NEXT:    retq
   %result = call <2 x half> @llvm.experimental.constrained.frem.v2f16(<2 x half> %x, <2 x half> %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret <2 x half> %result
@@ -847,45 +911,87 @@ define <2 x half> @frem_strict_vec2(<2 x half> %x, <2 x half> %y) nounwind #0 {
 define <4 x half> @frem_strict_vec4(<4 x half> %x, <4 x half> %y) nounwind #0 {
 ; CHECK-LABEL: frem_strict_vec4:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    subq $72, %rsp
-; CHECK-NEXT:    vmovdqa %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrlq $48, %xmm1, %xmm2
-; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm1
-; CHECK-NEXT:    vpsrlq $48, %xmm0, %xmm2
+; CHECK-NEXT:    subq $88, %rsp
+; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovapd %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm0
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
-; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpermilps $255, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq $10, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
+; CHECK-NEXT:    vpunpckldq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrlq $48, (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    callq fmodf@PLT
+; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovshdup (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
-; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
-; CHECK-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    vpsrld $16, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT:    vpunpcklwd {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
-; CHECK-NEXT:    vinsertps $28, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],zero,zero
-; CHECK-NEXT:    addq $72, %rsp
+; CHECK-NEXT:    vpunpckldq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; CHECK-NEXT:    vpunpcklqdq {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
+; CHECK-NEXT:    addq $88, %rsp
 ; CHECK-NEXT:    retq
   %result = call <4 x half> @llvm.experimental.constrained.frem.v4f16(<4 x half> %x, <4 x half> %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret <4 x half> %result
@@ -895,21 +1001,21 @@ define <8 x half> @frem_strict_vec8(<8 x half> %x, <8 x half> %y) nounwind #0 {
 ; CHECK-LABEL: frem_strict_vec8:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subq $88, %rsp
-; CHECK-NEXT:    vmovapd %xmm1, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm1
+; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vmovapd %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm0
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm2 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm2, %xmm2, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -917,19 +1023,19 @@ define <8 x half> @frem_strict_vec8(<8 x half> %x, <8 x half> %y) nounwind #0 {
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -938,34 +1044,34 @@ define <8 x half> @frem_strict_vec8(<8 x half> %x, <8 x half> %y) nounwind #0 {
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
@@ -985,25 +1091,25 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subq $168, %rsp
 ; CHECK-NEXT:    vmovupd %ymm1, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
-; CHECK-NEXT:    vmovups %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
+; CHECK-NEXT:    vmovupd %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
+; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm0
+; CHECK-NEXT:    vmovapd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vextractf128 $1, %ymm1, %xmm1
 ; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq {{.*#+}} xmm1 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm0
-; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1011,19 +1117,19 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1033,19 +1139,19 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1053,19 +1159,19 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1075,35 +1181,35 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK-NEXT:    vinserti128 $1, {{[-0-9]+}}(%r{{[sb]}}p), %ymm0, %ymm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vmovdqu %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
@@ -1112,35 +1218,35 @@ define <16 x half> @frem_strict_vec16(<16 x half> %x, <16 x half> %y) nounwind #
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %ymm0 # 32-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %ymm0 # 32-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %ymm1 # 32-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa (%rsp), %xmm1 # 16-byte Reload
@@ -1161,25 +1267,25 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subq $360, %rsp # imm = 0x168
 ; CHECK-NEXT:    vmovupd %zmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
-; CHECK-NEXT:    vmovups %zmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
+; CHECK-NEXT:    vmovupd %zmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
+; CHECK-NEXT:    vextractf32x4 $3, %zmm0, %xmm0
+; CHECK-NEXT:    vmovapd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vextractf32x4 $3, %zmm1, %xmm1
 ; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq {{.*#+}} xmm1 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vextractf32x4 $3, %zmm0, %xmm0
-; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1187,19 +1293,19 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1211,22 +1317,22 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vextractf32x4 $2, %zmm0, %xmm0
 ; CHECK-NEXT:    vmovapd %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovupd {{[-0-9]+}}(%r{{[sb]}}p), %zmm0 # 64-byte Reload
-; CHECK-NEXT:    vextractf32x4 $2, %zmm0, %xmm0
-; CHECK-NEXT:    vmovapd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovupd {{[-0-9]+}}(%r{{[sb]}}p), %zmm1 # 64-byte Reload
+; CHECK-NEXT:    vextractf32x4 $2, %zmm1, %xmm1
+; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm1 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1234,19 +1340,19 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1259,22 +1365,22 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm0
 ; CHECK-NEXT:    vmovapd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovupd {{[-0-9]+}}(%r{{[sb]}}p), %zmm0 # 64-byte Reload
-; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm0
-; CHECK-NEXT:    vmovapd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    vpsrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovupd {{[-0-9]+}}(%r{{[sb]}}p), %zmm1 # 64-byte Reload
+; CHECK-NEXT:    vextractf128 $1, %ymm1, %xmm1
+; CHECK-NEXT:    vmovapd %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    vpsrldq {{.*#+}} xmm1 = xmm1[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1282,19 +1388,19 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1304,19 +1410,19 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $14, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[3,3,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilps $255, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[3,3,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1324,19 +1430,19 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrldq $10, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,0]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,0]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpermilpd $1, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,0]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
@@ -1347,35 +1453,35 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vinserti64x4 $1, {{[-0-9]+}}(%r{{[sb]}}p), %zmm0, %zmm0 # 32-byte Folded Reload
 ; CHECK-NEXT:    vmovdqu64 %zmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 64-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
@@ -1384,34 +1490,34 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup (%rsp), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
@@ -1421,35 +1527,35 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    vinserti128 $1, {{[-0-9]+}}(%r{{[sb]}}p), %ymm0, %ymm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vmovdqu %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa (%rsp), %xmm1 # 16-byte Reload
@@ -1458,35 +1564,35 @@ define <32 x half> @frem_strict_vec32(<32 x half> %x, <32 x half> %y) nounwind #
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    vmovdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrlq $48, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = mem[1,1,3,3]
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovshdup {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm1 = mem[1,1,3,3]
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vpunpcklwd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1],xmm0[2],mem[2],xmm0[3],mem[3]
 ; CHECK-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %zmm0 # 64-byte Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %zmm0 # 64-byte Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %zmm1 # 64-byte Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vzeroupper
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm1
-; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
 ; CHECK-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpsrld $16, {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
+; CHECK-NEXT:    vcvtsh2ss %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    callq fmodf@PLT
 ; CHECK-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
 ; CHECK-NEXT:    vmovdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
diff --git a/llvm/test/CodeGen/X86/bfloat-constrained.ll b/llvm/test/CodeGen/X86/bfloat-constrained.ll
index 081b1cebfc43d..394b3f9e6ae67 100644
--- a/llvm/test/CodeGen/X86/bfloat-constrained.ll
+++ b/llvm/test/CodeGen/X86/bfloat-constrained.ll
@@ -10,24 +10,23 @@
 define float @bfloat_to_float() strictfp {
 ; X86-LABEL: bfloat_to_float:
 ; X86:       # %bb.0:
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    .cfi_def_cfa_offset 16
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    .cfi_def_cfa_offset 8
 ; X86-NEXT:    movzwl a, %eax
+; X86-NEXT:    shll $16, %eax
 ; X86-NEXT:    movl %eax, (%esp)
-; X86-NEXT:    calll __extendbfsf2
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    flds (%esp)
+; X86-NEXT:    wait
+; X86-NEXT:    popl %eax
 ; X86-NEXT:    .cfi_def_cfa_offset 4
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: bfloat_to_float:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    .cfi_def_cfa_offset 16
 ; X64-NEXT:    movq a@GOTPCREL(%rip), %rax
-; X64-NEXT:    movzwl (%rax), %edi
-; X64-NEXT:    callq __extendbfsf2@PLT
-; X64-NEXT:    popq %rax
-; X64-NEXT:    .cfi_def_cfa_offset 8
+; X64-NEXT:    movzwl (%rax), %eax
+; X64-NEXT:    shll $16, %eax
+; X64-NEXT:    vmovd %eax, %xmm0
 ; X64-NEXT:    retq
   %1 = load bfloat, ptr @a, align 2
   %2 = tail call float @llvm.experimental.constrained.fpext.f32.bfloat(bfloat %1, metadata !"fpexcept.strict") #0
@@ -37,25 +36,24 @@ define float @bfloat_to_float() strictfp {
 define double @bfloat_to_double() strictfp {
 ; X86-LABEL: bfloat_to_double:
 ; X86:       # %bb.0:
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    .cfi_def_cfa_offset 16
+; X86-NEXT:    pushl %eax
+; X86-NEXT:    .cfi_def_cfa_offset 8
 ; X86-NEXT:    movzwl a, %eax
+; X86-NEXT:    shll $16, %eax
 ; X86-NEXT:    movl %eax, (%esp)
-; X86-NEXT:    calll __extendbfsf2
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    flds (%esp)
+; X86-NEXT:    wait
+; X86-NEXT:    popl %eax
 ; X86-NEXT:    .cfi_def_cfa_offset 4
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: bfloat_to_double:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    .cfi_def_cfa_offset 16
 ; X64-NEXT:    movq a@GOTPCREL(%rip), %rax
-; X64-NEXT:    movzwl (%rax), %edi
-; X64-NEXT:    callq __extendbfsf2@PLT
+; X64-NEXT:    movzwl (%rax), %eax
+; X64-NEXT:    shll $16, %eax
+; X64-NEXT:    vmovd %eax, %xmm0
 ; X64-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
-; X64-NEXT:    popq %rax
-; X64-NEXT:    .cfi_def_cfa_offset 8
 ; X64-NEXT:    retq
   %1 = load bfloat, ptr @a, align 2
   %2 = tail call double @llvm.experimental.constrained.fpext.f64.bfloat(bfloat %1, metadata !"fpexcept.strict") #0
@@ -126,15 +124,14 @@ define void @add() strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    .cfi_def_cfa_offset 16
 ; X86-NEXT:    movzwl a, %eax
-; X86-NEXT:    movl %eax, (%esp)
-; X86-NEXT:    calll __extendbfsf2
-; X86-NEXT:    fstps {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Spill
+; X86-NEXT:    shll $16, %eax
+; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
+; X86-NEXT:    flds {{[0-9]+}}(%esp)
 ; X86-NEXT:    wait
 ; X86-NEXT:    movzwl b, %eax
-; X86-NEXT:    movl %eax, (%esp)
-; X86-NEXT:    calll __extendbfsf2
-; X86-NEXT:    flds {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
-; X86-NEXT:    faddp %st, %st(1)
+; X86-NEXT:    shll $16, %eax
+; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
+; X86-NEXT:    fadds {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstps (%esp)
 ; X86-NEXT:    wait
 ; X86-NEXT:    calll __truncsfbf2
@@ -148,13 +145,14 @@ define void @add() strictfp {
 ; X64-NEXT:    pushq %rax
 ; X64-NEXT:    .cfi_def_cfa_offset 16
 ; X64-NEXT:    movq a@GOTPCREL(%rip), %rax
-; X64-NEXT:    movzwl (%rax), %edi
-; X64-NEXT:    callq __extendbfsf2@PLT
-; X64-NEXT:    vmovss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; X64-NEXT:    movzwl (%rax), %eax
+; X64-NEXT:    shll $16, %eax
+; X64-NEXT:    vmovd %eax, %xmm0
 ; X64-NEXT:    movq b@GOTPCREL(%rip), %rax
-; X64-NEXT:    movzwl (%rax), %edi
-; X64-NEXT:    callq __extendbfsf2@PLT
-; X64-NEXT:    vaddss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 4-byte Folded Reload
+; X64-NEXT:    movzwl (%rax), %eax
+; X64-NEXT:    shll $16, %eax
+; X64-NEXT:    vmovd %eax, %xmm1
+; X64-NEXT:    vaddss %xmm1, %xmm0, %xmm0
 ; X64-NEXT:    callq __truncsfbf2@PLT
 ; X64-NEXT:    movq c@GOTPCREL(%rip), %rcx
 ; X64-NEXT:    movw %ax, (%rcx)
diff --git a/llvm/test/CodeGen/X86/float-strict-powi-convert.ll b/llvm/test/CodeGen/X86/float-strict-powi-convert.ll
index b39f5ec667cec..4d2bcb9119053 100644
--- a/llvm/test/CodeGen/X86/float-strict-powi-convert.ll
+++ b/llvm/test/CodeGen/X86/float-strict-powi-convert.ll
@@ -7,18 +7,12 @@ declare float @llvm.experimental.constrained.powi.f32(float, i32, metadata, meta
 define float @powi_f64(float %a, i32 %b) nounwind strictfp {
 ; WIN-LABEL: powi_f64:
 ; WIN:       # %bb.0:
-; WIN-NEXT:    subq $40, %rsp
 ; WIN-NEXT:    cvtsi2ss %edx, %xmm1
-; WIN-NEXT:    callq powf
-; WIN-NEXT:    addq $40, %rsp
-; WIN-NEXT:    retq
+; WIN-NEXT:    jmp powf # TAILCALL
 ;
 ; UNIX-LABEL: powi_f64:
 ; UNIX:       # %bb.0:
-; UNIX-NEXT:    pushq %rax
-; UNIX-NEXT:    callq __powisf2@PLT
-; UNIX-NEXT:    popq %rax
-; UNIX-NEXT:    retq
+; UNIX-NEXT:    jmp __powisf2@PLT # TAILCALL
   %1 = call float @llvm.experimental.constrained.powi.f32(float %a, i32 %b, metadata !"round.tonearest", metadata !"fpexcept.ignore") strictfp
   ret float %1
 }
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics-flags-x86_64.ll b/llvm/test/CodeGen/X86/fp-intrinsics-flags-x86_64.ll
index c2228046d6077..dc2ce49908f13 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics-flags-x86_64.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics-flags-x86_64.ll
@@ -1,13 +1,16 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s -stop-after=finalize-isel | FileCheck %s
 
 define i32 @f20u(double %x) #0 {
-; CHECK-LABEL: name: f20u
-; CHECK: liveins: $xmm0
-; CHECK: [[COPY:%[0-9]+]]:fr64 = COPY $xmm0
-; CHECK: [[CVTTSD2SI64rr:%[0-9]+]]:gr64 = CVTTSD2SI64rr [[COPY]], implicit $mxcsr
-; CHECK: [[COPY1:%[0-9]+]]:gr32 = COPY [[CVTTSD2SI64rr]].sub_32bit
-; CHECK: $eax = COPY [[COPY1]]
-; CHECK: RET 0, $eax
+  ; CHECK-LABEL: name: f20u
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   liveins: $xmm0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:fr64 = COPY $xmm0
+  ; CHECK-NEXT:   [[CVTTSD2SI64rr:%[0-9]+]]:gr64 = nofpexcept CVTTSD2SI64rr [[COPY]], implicit $mxcsr
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:gr32 = COPY [[CVTTSD2SI64rr]].sub_32bit
+  ; CHECK-NEXT:   $eax = COPY [[COPY1]]
+  ; CHECK-NEXT:   RET 0, $eax
 entry:
   %result = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %result
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics-flags.ll b/llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
index fc5279d02ab8a..ae1518262325d 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics-flags.ll
@@ -1,101 +1,115 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc -O3 -mtriple=i686-pc-linux -mattr=sse2 -stop-after=finalize-isel < %s | FileCheck %s
 
 define double @sifdb(i8 %x) #0 {
+  ; CHECK-LABEL: name: sifdb
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOVSX32rm8_:%[0-9]+]]:gr32 = MOVSX32rm8 %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s8) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[CVTSI2SDrr:%[0-9]+]]:fr64 = CVTSI2SDrr killed [[MOVSX32rm8_]]
+  ; CHECK-NEXT:   MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[CVTSI2SDrr]] :: (store (s64) into %stack.0, align 4)
+  ; CHECK-NEXT:   [[LD_Fp64m80_:%[0-9]+]]:rfp80 = nofpexcept LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0, align 4)
+  ; CHECK-NEXT:   RET 0, killed [[LD_Fp64m80_]]
 entry:
-; CHECK-LABEL: name: sifdb
-; CHECK: [[MOVSX32rm8_:%[0-9]+]]:gr32 = MOVSX32rm8 %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s8) from %fixed-stack.0, align 16)
-; CHECK: [[CVTSI2SDrr:%[0-9]+]]:fr64 = CVTSI2SDrr killed [[MOVSX32rm8_]]
-; CHECK: MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[CVTSI2SDrr]] :: (store (s64) into %stack.0, align 4)
-; CHECK: [[LD_Fp64m80_:%[0-9]+]]:rfp80 = nofpexcept LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0, align 4)
-; CHECK: RET 0, killed [[LD_Fp64m80_]]
   %result = call double @llvm.experimental.constrained.sitofp.f64.i8(i8 %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 define double @sifdw(i16 %x) #0 {
+  ; CHECK-LABEL: name: sifdw
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOVSX32rm16_:%[0-9]+]]:gr32 = MOVSX32rm16 %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s16) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[CVTSI2SDrr:%[0-9]+]]:fr64 = CVTSI2SDrr killed [[MOVSX32rm16_]]
+  ; CHECK-NEXT:   MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[CVTSI2SDrr]] :: (store (s64) into %stack.0, align 4)
+  ; CHECK-NEXT:   [[LD_Fp64m80_:%[0-9]+]]:rfp80 = nofpexcept LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0, align 4)
+  ; CHECK-NEXT:   RET 0, killed [[LD_Fp64m80_]]
 entry:
-; CHECK-LABEL: name: sifdw
-; CHECK: [[MOVSX32rm16_:%[0-9]+]]:gr32 = MOVSX32rm16 %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s16) from %fixed-stack.0, align 16)
-; CHECK: [[CVTSI2SDrr:%[0-9]+]]:fr64 = CVTSI2SDrr killed [[MOVSX32rm16_]]
-; CHECK: MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[CVTSI2SDrr]] :: (store (s64) into %stack.0, align 4)
-; CHECK: [[LD_Fp64m80_:%[0-9]+]]:rfp80 = nofpexcept LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0, align 4)
-; CHECK: RET 0, killed [[LD_Fp64m80_]]
   %result = call double @llvm.experimental.constrained.sitofp.f64.i16(i16 %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 define i64 @f20u64(double %x) #0 {
+  ; CHECK-LABEL: name: f20u64
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[MOVSDrm_alt1:%[0-9]+]]:fr64 = MOVSDrm_alt $noreg, 1, $noreg, %const.0, $noreg :: (load (s64) from constant-pool)
+  ; CHECK-NEXT:   nofpexcept UCOMISDrr [[MOVSDrm_alt1]], [[MOVSDrm_alt]], implicit-def $eflags, implicit $mxcsr
+  ; CHECK-NEXT:   [[FsFLD0SD:%[0-9]+]]:fr64 = FsFLD0SD
+  ; CHECK-NEXT:   JCC_1 %bb.2, 6, implicit $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1.entry:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT:   liveins: $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2.entry:
+  ; CHECK-NEXT:   liveins: $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:fr64 = PHI [[FsFLD0SD]], %bb.1, [[MOVSDrm_alt1]], %bb.0
+  ; CHECK-NEXT:   [[SUBSDrr:%[0-9]+]]:fr64 = nofpexcept SUBSDrr [[MOVSDrm_alt]], killed [[PHI]], implicit $mxcsr
+  ; CHECK-NEXT:   MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[SUBSDrr]] :: (store (s64) into %stack.0)
+  ; CHECK-NEXT:   [[SETCCr:%[0-9]+]]:gr8 = SETCCr 6, implicit $eflags
+  ; CHECK-NEXT:   [[LD_Fp64m80_:%[0-9]+]]:rfp80 = LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0)
+  ; CHECK-NEXT:   FNSTCW16m %stack.1, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit $fpcw :: (store (s16) into %stack.1)
+  ; CHECK-NEXT:   [[MOVZX32rm16_:%[0-9]+]]:gr32 = MOVZX32rm16 %stack.1, 1, $noreg, 0, $noreg :: (load (s16) from %stack.1)
+  ; CHECK-NEXT:   [[OR32ri:%[0-9]+]]:gr32 = OR32ri killed [[MOVZX32rm16_]], 3072, implicit-def $eflags
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gr16 = COPY killed [[OR32ri]].sub_16bit
+  ; CHECK-NEXT:   MOV16mr %stack.2, 1, $noreg, 0, $noreg, killed [[COPY]] :: (store (s16) into %stack.2)
+  ; CHECK-NEXT:   FLDCW16m %stack.2, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit-def $fpcw :: (load (s16) from %stack.2)
+  ; CHECK-NEXT:   IST_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, [[LD_Fp64m80_]], implicit-def $fpsw, implicit $fpcw
+  ; CHECK-NEXT:   FLDCW16m %stack.1, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit-def $fpcw :: (load (s16) from %stack.1)
+  ; CHECK-NEXT:   [[MOVZX32rr8_:%[0-9]+]]:gr32 = MOVZX32rr8 killed [[SETCCr]]
+  ; CHECK-NEXT:   [[SHL32ri:%[0-9]+]]:gr32 = SHL32ri [[MOVZX32rr8_]], 31, implicit-def dead $eflags
+  ; CHECK-NEXT:   [[XOR32rm:%[0-9]+]]:gr32 = XOR32rm [[SHL32ri]], %stack.0, 1, $noreg, 4, $noreg, implicit-def dead $eflags :: (load (s32) from %stack.0 + 4)
+  ; CHECK-NEXT:   [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %stack.0, align 8)
+  ; CHECK-NEXT:   $eax = COPY [[MOV32rm]]
+  ; CHECK-NEXT:   $edx = COPY [[XOR32rm]]
+  ; CHECK-NEXT:   RET 0, $eax, $edx
 entry:
-; CHECK-LABEL: name: f20u64
-; CHECK: [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
-; CHECK: [[MOVSDrm_alt1:%[0-9]+]]:fr64 = MOVSDrm_alt $noreg, 1, $noreg, %const.0, $noreg :: (load (s64) from constant-pool)
-; CHECK: COMISDrr [[MOVSDrm_alt1]], [[MOVSDrm_alt]], implicit-def $eflags, implicit $mxcsr
-; CHECK: [[FsFLD0SD:%[0-9]+]]:fr64 = FsFLD0SD
-; CHECK: JCC_1
-; CHECK: [[PHI:%[0-9]+]]:fr64 = PHI [[FsFLD0SD]], {{.*}}, [[MOVSDrm_alt1]], {{.*}}
-; CHECK: [[SUBSDrr:%[0-9]+]]:fr64 = SUBSDrr [[MOVSDrm_alt]], killed [[PHI]], implicit $mxcsr
-; CHECK: MOVSDmr %stack.0, 1, $noreg, 0, $noreg, killed [[SUBSDrr]] :: (store (s64) into %stack.0)
-; CHECK: [[SETCCr:%[0-9]+]]:gr8 = SETCCr 6, implicit $eflags
-; CHECK: [[LD_Fp64m80:%[0-9]+]]:rfp80 = LD_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, implicit-def dead $fpsw, implicit $fpcw :: (load (s64) from %stack.0)
-; CHECK: FNSTCW16m %stack.1, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit $fpcw :: (store (s16) into %stack.1)
-; CHECK: [[MOVZX32rm16_:%[0-9]+]]:gr32 = MOVZX32rm16 %stack.1, 1, $noreg, 0, $noreg :: (load (s16) from %stack.1)
-; CHECK: [[OR32ri:%[0-9]+]]:gr32 = OR32ri killed [[MOVZX32rm16_]], 3072, implicit-def $eflags
-; CHECK: [[COPY3:%[0-9]+]]:gr16 = COPY killed [[OR32ri]].sub_16bit
-; CHECK: MOV16mr %stack.2, 1, $noreg, 0, $noreg, killed [[COPY3]] :: (store (s16) into %stack.2)
-; CHECK: FLDCW16m %stack.2, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit-def $fpcw :: (load (s16) from %stack.2)
-; CHECK: IST_Fp64m80 %stack.0, 1, $noreg, 0, $noreg, [[LD_Fp64m80]], implicit-def $fpsw, implicit $fpcw
-; CHECK: FLDCW16m %stack.1, 1, $noreg, 0, $noreg, implicit-def $fpsw, implicit-def $fpcw :: (load (s16) from %stack.1)
-; CHECK: [[MOVZX32rr8_:%[0-9]+]]:gr32 = MOVZX32rr8 killed [[SETCCr]]
-; CHECK: [[SHL32ri:%[0-9]+]]:gr32 = SHL32ri [[MOVZX32rr8_]], 31, implicit-def dead $eflags
-; CHECK: [[XOR32rm:%[0-9]+]]:gr32 = XOR32rm [[SHL32ri]], %stack.0, 1, $noreg, 4, $noreg, implicit-def dead $eflags :: (load (s32) from %stack.0 + 4)
-; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %stack.0, align 8)
-; CHECK: $eax = COPY [[MOV32rm]]
-; CHECK: $edx = COPY [[XOR32rm]]
-; CHECK: RET 0, $eax, $edx
   %result = call i64 @llvm.experimental.constrained.fptoui.i64.f64(double %x, metadata !"fpexcept.strict") #0
   ret i64 %result
 }
 
 define i8 @f20s8(double %x) #0 {
+  ; CHECK-LABEL: name: f20s8
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[CVTTSD2SIrm:%[0-9]+]]:gr32 = nofpexcept CVTTSD2SIrm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gr32_abcd = COPY killed [[CVTTSD2SIrm]]
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:gr8 = COPY [[COPY]].sub_8bit
+  ; CHECK-NEXT:   $al = COPY [[COPY1]]
+  ; CHECK-NEXT:   RET 0, $al
 entry:
-; CHECK-LABEL: name: f20s8
-; CHECK: [[CVTTSD2SIrm:%[0-9]+]]:gr32 = CVTTSD2SIrm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.0, align 16)
-; CHECK: [[COPY:%[0-9]+]]:gr32_abcd = COPY killed [[CVTTSD2SIrm]]
-; CHECK: [[COPY1:%[0-9]+]]:gr8 = COPY [[COPY]].sub_8bit
-; CHECK: $al = COPY [[COPY1]]
-; CHECK: RET 0, $al
   %result = call i8 @llvm.experimental.constrained.fptosi.i8.f64(double %x, metadata !"fpexcept.strict") #0
   ret i8 %result
 }
 
 define i16 @f20s16(double %x) #0 {
+  ; CHECK-LABEL: name: f20s16
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[CVTTSD2SIrm:%[0-9]+]]:gr32 = nofpexcept CVTTSD2SIrm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gr16 = COPY [[CVTTSD2SIrm]].sub_16bit
+  ; CHECK-NEXT:   $ax = COPY [[COPY]]
+  ; CHECK-NEXT:   RET 0, $ax
 entry:
-; CHECK-LABEL: name: f20s16
-; CHECK: [[CVTTSD2SIrm:%[0-9]+]]:gr32 = CVTTSD2SIrm %fixed-stack.0, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.0, align 16)
-; CHECK: [[COPY:%[0-9]+]]:gr16 = COPY [[CVTTSD2SIrm]].sub_16bit
-; CHECK: $ax = COPY [[COPY]]
-; CHECK: RET 0, $ax
   %result = call i16 @llvm.experimental.constrained.fptosi.i16.f64(double %x, metadata !"fpexcept.strict") #0
   ret i16 %result
 }
 
 define i32 @f20u(double %x) #0 {
+  ; CHECK-LABEL: name: f20u
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:vr128 = COPY [[MOVSDrm_alt]]
+  ; CHECK-NEXT:   [[CVTTSD2SIrr_Int:%[0-9]+]]:gr32 = nofpexcept CVTTSD2SIrr_Int killed [[COPY]], implicit $mxcsr
+  ; CHECK-NEXT:   [[SAR32ri:%[0-9]+]]:gr32 = SAR32ri [[CVTTSD2SIrr_Int]], 31, implicit-def dead $eflags
+  ; CHECK-NEXT:   [[SUBSDrm:%[0-9]+]]:fr64 = nofpexcept SUBSDrm [[MOVSDrm_alt]], $noreg, 1, $noreg, %const.0, $noreg, implicit $mxcsr :: (load (s64) from constant-pool)
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:vr128 = COPY killed [[SUBSDrm]]
+  ; CHECK-NEXT:   [[CVTTSD2SIrr_Int1:%[0-9]+]]:gr32 = nofpexcept CVTTSD2SIrr_Int killed [[COPY1]], implicit $mxcsr
+  ; CHECK-NEXT:   [[AND32rr:%[0-9]+]]:gr32 = AND32rr [[CVTTSD2SIrr_Int1]], killed [[SAR32ri]], implicit-def dead $eflags
+  ; CHECK-NEXT:   [[OR32rr:%[0-9]+]]:gr32 = OR32rr [[CVTTSD2SIrr_Int]], killed [[AND32rr]], implicit-def dead $eflags
+  ; CHECK-NEXT:   $eax = COPY [[OR32rr]]
+  ; CHECK-NEXT:   RET 0, $eax
 entry:
-; CHECK-LABEL: name: f20u
-; CHECK: [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.0, align 16)
-; CHECK: [[MOVSDrm_alt1:%[0-9]+]]:fr64 = MOVSDrm_alt $noreg, 1, $noreg, %const.0, $noreg :: (load (s64) from constant-pool)
-; CHECK: COMISDrr [[MOVSDrm_alt1]], [[MOVSDrm_alt]], implicit-def $eflags, implicit $mxcsr
-; CHECK: [[FsFLD0SD:%[0-9]+]]:fr64 = FsFLD0SD
-; CHECK: JCC_1
-; CHECK: [[PHI:%[0-9]+]]:fr64 = PHI [[MOVSDrm_alt1]], {{.*}}, [[FsFLD0SD]], {{.*}}
-; CHECK: [[SETCCr:%[0-9]+]]:gr8 = SETCCr 6, implicit $eflags
-; CHECK: [[MOVZX32rr8_:%[0-9]+]]:gr32 = MOVZX32rr8 killed [[SETCCr]]
-; CHECK: [[SHL32ri:%[0-9]+]]:gr32 = SHL32ri [[MOVZX32rr8_]], 31, implicit-def dead $eflags
-; CHECK: [[SUBSDrr:%[0-9]+]]:fr64 = SUBSDrr [[MOVSDrm_alt]], killed [[PHI]], implicit $mxcsr
-; CHECK: [[CVTTSD2SIrr:%[0-9]+]]:gr32 = CVTTSD2SIrr killed [[SUBSDrr]], implicit $mxcsr
-; CHECK: [[XOR32rr:%[0-9]+]]:gr32 = XOR32rr [[CVTTSD2SIrr]], killed [[SHL32ri]], implicit-def dead $eflags
-; CHECK: $eax = COPY [[XOR32rr]]
-; CHECK: RET 0, $eax
   %result = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %x, metadata !"fpexcept.strict") #0
   ret i32 %result
 }
@@ -105,17 +119,16 @@ entry:
 ; may be CSE'd. Instructions with different exception behavior belong to
 ; different groups, they have different chain argument and cannot be CSE'd.
 define void @binop_cse(double %a, double %b, ptr %x, ptr %y) #0 {
+  ; CHECK-LABEL: name: binop_cse
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
+  ; CHECK-NEXT:   [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1, align 16)
+  ; CHECK-NEXT:   [[MOVSDrm_alt:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.3, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.3, align 16)
+  ; CHECK-NEXT:   [[DIVSDrm:%[0-9]+]]:fr64 = nofpexcept DIVSDrm [[MOVSDrm_alt]], %fixed-stack.2, 1, $noreg, 0, $noreg, implicit $mxcsr :: (load (s64) from %fixed-stack.2)
+  ; CHECK-NEXT:   MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, [[DIVSDrm]] :: (store (s64) into %ir.x, align 4)
+  ; CHECK-NEXT:   MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, [[DIVSDrm]] :: (store (s64) into %ir.y, align 4)
+  ; CHECK-NEXT:   RET 0
 entry:
-; CHECK-LABEL: name: binop_cse
-; CHECK: [[Y:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
-; CHECK: [[X:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1, align 16)
-; CHECK: [[B:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.2)
-; CHECK: [[A:%[0-9]+]]:fr64 = MOVSDrm_alt %fixed-stack.3, 1, $noreg, 0, $noreg :: (load (s64) from %fixed-stack.3, align 16)
-; CHECK: [[DIV0:%[0-9]+]]:fr64 = DIVSDrr [[A]], [[B]], implicit $mxcsr
-; CHECK: [[DIV1:%[0-9]+]]:fr64 = nofpexcept DIVSDrr [[A]], [[B]], implicit $mxcsr
-; CHECK: MOVSDmr killed [[X]], 1, $noreg, 0, $noreg, [[DIV1]] :: (store (s64) into %ir.x, align 4)
-; CHECK: MOVSDmr killed [[Y]], 1, $noreg, 0, $noreg, [[DIV1]] :: (store (s64) into %ir.y, align 4)
-; CHECK: RET 0
   %div = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   %div1 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   %div2 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -130,16 +143,15 @@ entry:
 ; may be CSE'd. Instructions with different exception behavior belong to
 ; different groups, they have different chain argument and cannot be CSE'd.
 define void @sitofp_cse(i32 %a, ptr %x, ptr %y) #0 {
+  ; CHECK-LABEL: name: sitofp_cse
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 8)
+  ; CHECK-NEXT:   [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1)
+  ; CHECK-NEXT:   [[CVTSI2SDrm:%[0-9]+]]:fr64 = nofpexcept CVTSI2SDrm %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.2, align 16)
+  ; CHECK-NEXT:   MOVSDmr killed [[MOV32rm1]], 1, $noreg, 0, $noreg, [[CVTSI2SDrm]] :: (store (s64) into %ir.x, align 4)
+  ; CHECK-NEXT:   MOVSDmr killed [[MOV32rm]], 1, $noreg, 0, $noreg, [[CVTSI2SDrm]] :: (store (s64) into %ir.y, align 4)
+  ; CHECK-NEXT:   RET 0
 entry:
-; CHECK-LABEL: name: sitofp_cse
-; CHECK: [[Y:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 8)
-; CHECK: [[X:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.1)
-; CHECK: [[A:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.2, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.2, align 16)
-; CHECK: [[CVT0:%[0-9]+]]:fr64 = CVTSI2SDrr [[A]]
-; CHECK: [[CVT1:%[0-9]+]]:fr64 = nofpexcept CVTSI2SDrr [[A]]
-; CHECK: MOVSDmr killed [[X]], 1, $noreg, 0, $noreg, [[CVT1]] :: (store (s64) into %ir.x, align 4)
-; CHECK: MOVSDmr killed [[Y]], 1, $noreg, 0, $noreg, [[CVT1]] :: (store (s64) into %ir.y, align 4)
-; CHECK: RET 0
   %result = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   %result1 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   %result2 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics-fma.ll b/llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
index 71d49481ebb8e..8f5183a45baba 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
@@ -7,13 +7,8 @@
 define float @f1(float %0, float %1, float %2) #0 {
 ; NOFMA-LABEL: f1:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; NOFMA-NEXT:    callq fmaf@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fmaf@PLT # TAILCALL
 ;
 ; FMA-LABEL: f1:
 ; FMA:       # %bb.0: # %entry
@@ -35,13 +30,8 @@ entry:
 define double @f2(double %0, double %1, double %2) #0 {
 ; NOFMA-LABEL: f2:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; NOFMA-NEXT:    callq fma@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fma@PLT # TAILCALL
 ;
 ; FMA-LABEL: f2:
 ; FMA:       # %bb.0: # %entry
@@ -63,13 +53,8 @@ entry:
 define float @f3(float %0, float %1, float %2) #0 {
 ; NOFMA-LABEL: f3:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2
-; NOFMA-NEXT:    callq fmaf@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fmaf@PLT # TAILCALL
 ;
 ; FMA-LABEL: f3:
 ; FMA:       # %bb.0: # %entry
@@ -91,13 +76,8 @@ entry:
 define double @f4(double %0, double %1, double %2) #0 {
 ; NOFMA-LABEL: f4:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2
-; NOFMA-NEXT:    callq fma@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fma@PLT # TAILCALL
 ;
 ; FMA-LABEL: f4:
 ; FMA:       # %bb.0: # %entry
@@ -119,15 +99,10 @@ entry:
 define float @f5(float %0, float %1, float %2) #0 {
 ; NOFMA-LABEL: f5:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    movaps {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
 ; NOFMA-NEXT:    xorps %xmm3, %xmm0
 ; NOFMA-NEXT:    xorps %xmm3, %xmm2
-; NOFMA-NEXT:    callq fmaf@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fmaf@PLT # TAILCALL
 ;
 ; FMA-LABEL: f5:
 ; FMA:       # %bb.0: # %entry
@@ -150,15 +125,10 @@ entry:
 define double @f6(double %0, double %1, double %2) #0 {
 ; NOFMA-LABEL: f6:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
 ; NOFMA-NEXT:    movaps {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0]
 ; NOFMA-NEXT:    xorps %xmm3, %xmm0
 ; NOFMA-NEXT:    xorps %xmm3, %xmm2
-; NOFMA-NEXT:    callq fma@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
-; NOFMA-NEXT:    retq
+; NOFMA-NEXT:    jmp fma@PLT # TAILCALL
 ;
 ; FMA-LABEL: f6:
 ; FMA:       # %bb.0: # %entry
@@ -385,22 +355,17 @@ entry:
 define float @f15() #0 {
 ; NOFMA-LABEL: f15:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    movss {{.*#+}} xmm1 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; NOFMA-NEXT:    movaps %xmm1, %xmm0
-; NOFMA-NEXT:    mulss %xmm1, %xmm0
-; NOFMA-NEXT:    addss %xmm1, %xmm0
+; NOFMA-NEXT:    movss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; NOFMA-NEXT:    retq
 ;
 ; FMA-LABEL: f15:
 ; FMA:       # %bb.0: # %entry
-; FMA-NEXT:    vmovss {{.*#+}} xmm0 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; FMA-NEXT:    vfmadd213ss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA-NEXT:    vmovss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; FMA-NEXT:    retq
 ;
 ; FMA4-LABEL: f15:
 ; FMA4:       # %bb.0: # %entry
-; FMA4-NEXT:    vmovss {{.*#+}} xmm0 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; FMA4-NEXT:    vfmaddss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA4-NEXT:    vmovss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; FMA4-NEXT:    retq
 entry:
   %result = call float @llvm.experimental.constrained.fmuladd.f32(
@@ -417,22 +382,17 @@ entry:
 define double @f16() #0 {
 ; NOFMA-LABEL: f16:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    movsd {{.*#+}} xmm1 = [4.2100000000000001E+1,0.0E+0]
-; NOFMA-NEXT:    movapd %xmm1, %xmm0
-; NOFMA-NEXT:    mulsd %xmm1, %xmm0
-; NOFMA-NEXT:    addsd %xmm1, %xmm0
+; NOFMA-NEXT:    movsd {{.*#+}} xmm0 = [1.81451E+3,0.0E+0]
 ; NOFMA-NEXT:    retq
 ;
 ; FMA-LABEL: f16:
 ; FMA:       # %bb.0: # %entry
-; FMA-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; FMA-NEXT:    vfmadd213sd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA-NEXT:    vmovsd {{.*#+}} xmm0 = [1.8145100000000002E+3,0.0E+0]
 ; FMA-NEXT:    retq
 ;
 ; FMA4-LABEL: f16:
 ; FMA4:       # %bb.0: # %entry
-; FMA4-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; FMA4-NEXT:    vfmaddsd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA4-NEXT:    vmovsd {{.*#+}} xmm0 = [1.8145100000000002E+3,0.0E+0]
 ; FMA4-NEXT:    retq
 entry:
   %result = call double @llvm.experimental.constrained.fmuladd.f64(
@@ -449,26 +409,17 @@ entry:
 define float @f17() #0 {
 ; NOFMA-LABEL: f17:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
-; NOFMA-NEXT:    movss {{.*#+}} xmm0 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; NOFMA-NEXT:    movaps %xmm0, %xmm1
-; NOFMA-NEXT:    movaps %xmm0, %xmm2
-; NOFMA-NEXT:    callq fmaf@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
+; NOFMA-NEXT:    movss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; NOFMA-NEXT:    retq
 ;
 ; FMA-LABEL: f17:
 ; FMA:       # %bb.0: # %entry
-; FMA-NEXT:    vmovss {{.*#+}} xmm0 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; FMA-NEXT:    vfmadd213ss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA-NEXT:    vmovss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; FMA-NEXT:    retq
 ;
 ; FMA4-LABEL: f17:
 ; FMA4:       # %bb.0: # %entry
-; FMA4-NEXT:    vmovss {{.*#+}} xmm0 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; FMA4-NEXT:    vfmaddss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA4-NEXT:    vmovss {{.*#+}} xmm0 = [1.575E+1,0.0E+0,0.0E+0,0.0E+0]
 ; FMA4-NEXT:    retq
 entry:
   %result = call float @llvm.experimental.constrained.fma.f32(
@@ -485,26 +436,17 @@ entry:
 define double @f18() #0 {
 ; NOFMA-LABEL: f18:
 ; NOFMA:       # %bb.0: # %entry
-; NOFMA-NEXT:    pushq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 16
-; NOFMA-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; NOFMA-NEXT:    movaps %xmm0, %xmm1
-; NOFMA-NEXT:    movaps %xmm0, %xmm2
-; NOFMA-NEXT:    callq fma@PLT
-; NOFMA-NEXT:    popq %rax
-; NOFMA-NEXT:    .cfi_def_cfa_offset 8
+; NOFMA-NEXT:    movsd {{.*#+}} xmm0 = [1.8145100000000002E+3,0.0E+0]
 ; NOFMA-NEXT:    retq
 ;
 ; FMA-LABEL: f18:
 ; FMA:       # %bb.0: # %entry
-; FMA-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; FMA-NEXT:    vfmadd213sd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA-NEXT:    vmovsd {{.*#+}} xmm0 = [1.8145100000000002E+3,0.0E+0]
 ; FMA-NEXT:    retq
 ;
 ; FMA4-LABEL: f18:
 ; FMA4:       # %bb.0: # %entry
-; FMA4-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; FMA4-NEXT:    vfmaddsd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
+; FMA4-NEXT:    vmovsd {{.*#+}} xmm0 = [1.8145100000000002E+3,0.0E+0]
 ; FMA4-NEXT:    retq
 entry:
   %result = call double @llvm.experimental.constrained.fma.f64(
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics.ll b/llvm/test/CodeGen/X86/fp-intrinsics.ll
index 5d69a217fb402..1d1f396e5601e 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics.ll
@@ -17,34 +17,24 @@
 define double @f1() #0 {
 ; X87-LABEL: f1:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    fld1
-; X87-NEXT:    fdivs {{\.?LCPI[0-9]+_[0-9]+}}
+; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    wait
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f1:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; X86-SSE-NEXT:    divsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    fldl (%esp)
+; X86-SSE-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f1:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; SSE-NEXT:    divsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0000000000000001E-1,0.0E+0]
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f1:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; AVX-NEXT:    vdivsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0000000000000001E-1,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %div = call double @llvm.experimental.constrained.fdiv.f64(
@@ -66,35 +56,22 @@ entry:
 define double @f2(double %a) #0 {
 ; X87-LABEL: f2:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    fldz
-; X87-NEXT:    fsubrl {{[0-9]+}}(%esp)
+; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    wait
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f2:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    xorpd %xmm1, %xmm1
-; X86-SSE-NEXT:    subsd %xmm1, %xmm0
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    fldl (%esp)
+; X86-SSE-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f2:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    xorpd %xmm1, %xmm1
-; SSE-NEXT:    subsd %xmm1, %xmm0
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f2:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vsubsd %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call double @llvm.experimental.constrained.fsub.f64(
@@ -117,12 +94,8 @@ entry:
 define double @f3(double %a, double %b) #0 {
 ; X87-LABEL: f3:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    fldz
-; X87-NEXT:    fchs
-; X87-NEXT:    fld %st(0)
-; X87-NEXT:    fsubl {{[0-9]+}}(%esp)
+; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fmull {{[0-9]+}}(%esp)
-; X87-NEXT:    fsubrp %st, %st(1)
 ; X87-NEXT:    wait
 ; X87-NEXT:    retl
 ;
@@ -130,11 +103,8 @@ define double @f3(double %a, double %b) #0 {
 ; X86-SSE:       # %bb.0: # %entry
 ; X86-SSE-NEXT:    subl $12, %esp
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [-0.0E+0,0.0E+0]
-; X86-SSE-NEXT:    movapd %xmm0, %xmm1
-; X86-SSE-NEXT:    subsd {{[0-9]+}}(%esp), %xmm1
-; X86-SSE-NEXT:    mulsd {{[0-9]+}}(%esp), %xmm1
-; X86-SSE-NEXT:    subsd %xmm1, %xmm0
+; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; X86-SSE-NEXT:    mulsd {{[0-9]+}}(%esp), %xmm0
 ; X86-SSE-NEXT:    movsd %xmm0, (%esp)
 ; X86-SSE-NEXT:    fldl (%esp)
 ; X86-SSE-NEXT:    wait
@@ -144,20 +114,12 @@ define double @f3(double %a, double %b) #0 {
 ;
 ; SSE-LABEL: f3:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    movsd {{.*#+}} xmm2 = [-0.0E+0,0.0E+0]
-; SSE-NEXT:    movapd %xmm2, %xmm3
-; SSE-NEXT:    subsd %xmm0, %xmm3
-; SSE-NEXT:    mulsd %xmm1, %xmm3
-; SSE-NEXT:    subsd %xmm3, %xmm2
-; SSE-NEXT:    movapd %xmm2, %xmm0
+; SSE-NEXT:    mulsd %xmm1, %xmm0
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f3:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm2 = [-0.0E+0,0.0E+0]
-; AVX-NEXT:    vsubsd %xmm0, %xmm2, %xmm0
 ; AVX-NEXT:    vmulsd %xmm1, %xmm0, %xmm0
-; AVX-NEXT:    vsubsd %xmm0, %xmm2, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call double @llvm.experimental.constrained.fsub.f64(
@@ -197,7 +159,6 @@ define double @f4(i32 %n, double %a) #0 {
 ; X87-NEXT:  # %bb.1: # %if.then
 ; X87-NEXT:    fld1
 ; X87-NEXT:    faddp %st, %st(1)
-; X87-NEXT:    wait
 ; X87-NEXT:  .LBB3_2: # %if.end
 ; X87-NEXT:    retl
 ;
@@ -257,7 +218,6 @@ define double @f5() #0 {
 ; X87:       # %bb.0: # %entry
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fsqrt
-; X87-NEXT:    wait
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f5:
@@ -322,25 +282,15 @@ define double @f6() #0 {
 ;
 ; SSE-LABEL: f6:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; SSE-NEXT:    movsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
-; SSE-NEXT:    callq pow@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp pow@PLT # TAILCALL
 ;
 ; AVX-LABEL: f6:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
-; AVX-NEXT:    callq pow@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp pow@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.pow.f64(double 42.1,
                                                double 3.0,
@@ -353,49 +303,24 @@ entry:
 define double @f7() #0 {
 ; X87-LABEL: f7:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
-; X87-NEXT:    fstpl (%esp)
 ; X87-NEXT:    wait
-; X87-NEXT:    movl $3, {{[0-9]+}}(%esp)
-; X87-NEXT:    calll __powidf2
-; X87-NEXT:    addl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f7:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    movl $3, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    calll __powidf2
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
+; X86-SSE-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
+; X86-SSE-NEXT:    wait
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f7:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; SSE-NEXT:    movl $3, %edi
-; SSE-NEXT:    callq __powidf2@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
+; SSE-NEXT:    movsd {{.*#+}} xmm0 = [7.461846100000001E+4,0.0E+0]
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f7:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [7.461846100000001E+4,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %result = call double @llvm.experimental.constrained.powi.f64(double 42.1,
@@ -432,23 +357,13 @@ define double @f8() #0 {
 ;
 ; SSE-LABEL: f8:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq sin@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp sin@PLT # TAILCALL
 ;
 ; AVX-LABEL: f8:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq sin@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp sin@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.sin.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -483,23 +398,13 @@ define double @f9() #0 {
 ;
 ; SSE-LABEL: f9:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq cos@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp cos@PLT # TAILCALL
 ;
 ; AVX-LABEL: f9:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq cos@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp cos@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.cos.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -534,23 +439,13 @@ define double @f10() #0 {
 ;
 ; SSE-LABEL: f10:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq exp@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp exp@PLT # TAILCALL
 ;
 ; AVX-LABEL: f10:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq exp@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp exp@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.exp.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -585,23 +480,13 @@ define double @f11() #0 {
 ;
 ; SSE-LABEL: f11:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; SSE-NEXT:    callq exp2@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp exp2@PLT # TAILCALL
 ;
 ; AVX-LABEL: f11:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    callq exp2@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp exp2@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,
                                                metadata !"round.dynamic",
@@ -636,23 +521,13 @@ define double @f12() #0 {
 ;
 ; SSE-LABEL: f12:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq log@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp log@PLT # TAILCALL
 ;
 ; AVX-LABEL: f12:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq log@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp log@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.log.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -687,23 +562,13 @@ define double @f13() #0 {
 ;
 ; SSE-LABEL: f13:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq log10@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp log10@PLT # TAILCALL
 ;
 ; AVX-LABEL: f13:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq log10@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp log10@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.log10.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -738,23 +603,13 @@ define double @f14() #0 {
 ;
 ; SSE-LABEL: f14:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq log2@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp log2@PLT # TAILCALL
 ;
 ; AVX-LABEL: f14:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq log2@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp log2@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.log2.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -789,13 +644,8 @@ define double @f15() #0 {
 ;
 ; SSE-LABEL: f15:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; SSE-NEXT:    callq rint@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp rint@PLT # TAILCALL
 ;
 ; AVX-LABEL: f15:
 ; AVX:       # %bb.0: # %entry
@@ -837,13 +687,8 @@ define double @f16() #0 {
 ;
 ; SSE-LABEL: f16:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; SSE-NEXT:    callq nearbyint@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp nearbyint@PLT # TAILCALL
 ;
 ; AVX-LABEL: f16:
 ; AVX:       # %bb.0: # %entry
@@ -861,51 +706,22 @@ entry:
 define double @f19() #0 {
 ; X87-LABEL: f19:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $28, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 32
-; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
-; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fld1
-; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
-; X87-NEXT:    calll fmod
-; X87-NEXT:    addl $28, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f19:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $28, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 32
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+1,0.0E+0]
-; X86-SSE-NEXT:    movsd %xmm0, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    calll fmod
-; X86-SSE-NEXT:    addl $28, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
+; X86-SSE-NEXT:    fld1
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f19:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; SSE-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; SSE-NEXT:    callq fmod@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f19:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
   %rem = call double @llvm.experimental.constrained.frem.f64(
@@ -1133,10 +949,10 @@ define i128 @f20s128(double %x) nounwind strictfp {
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X87-NEXT:    movl %edi, 8(%esi)
-; X87-NEXT:    movl %edx, 12(%esi)
-; X87-NEXT:    movl %eax, (%esi)
+; X87-NEXT:    movl %edi, 12(%esi)
+; X87-NEXT:    movl %edx, 8(%esi)
 ; X87-NEXT:    movl %ecx, 4(%esi)
+; X87-NEXT:    movl %eax, (%esi)
 ; X87-NEXT:    movl %esi, %eax
 ; X87-NEXT:    addl $36, %esp
 ; X87-NEXT:    popl %esi
@@ -1301,19 +1117,13 @@ define i32 @f20u(double %x) #0 {
 ; X86-SSE-LABEL: f20u:
 ; X86-SSE:       # %bb.0: # %entry
 ; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm2 = [2.147483648E+9,0.0E+0]
-; X86-SSE-NEXT:    comisd %xmm0, %xmm2
-; X86-SSE-NEXT:    xorpd %xmm1, %xmm1
-; X86-SSE-NEXT:    ja .LBB24_2
-; X86-SSE-NEXT:  # %bb.1: # %entry
-; X86-SSE-NEXT:    movapd %xmm2, %xmm1
-; X86-SSE-NEXT:  .LBB24_2: # %entry
-; X86-SSE-NEXT:    setbe %al
-; X86-SSE-NEXT:    movzbl %al, %ecx
-; X86-SSE-NEXT:    shll $31, %ecx
-; X86-SSE-NEXT:    subsd %xmm1, %xmm0
+; X86-SSE-NEXT:    cvttsd2si %xmm0, %ecx
+; X86-SSE-NEXT:    movl %ecx, %edx
+; X86-SSE-NEXT:    sarl $31, %edx
+; X86-SSE-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
 ; X86-SSE-NEXT:    cvttsd2si %xmm0, %eax
-; X86-SSE-NEXT:    xorl %ecx, %eax
+; X86-SSE-NEXT:    andl %edx, %eax
+; X86-SSE-NEXT:    orl %ecx, %eax
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f20u:
@@ -1351,14 +1161,12 @@ define i64 @f20u64(double %x) #0 {
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    wait
 ; X87-NEXT:    xorl %edx, %edx
-; X87-NEXT:    fcomi %st(1), %st
-; X87-NEXT:    wait
+; X87-NEXT:    fucomi %st(1), %st
 ; X87-NEXT:    setbe %dl
 ; X87-NEXT:    fldz
 ; X87-NEXT:    fcmovbe %st(1), %st
 ; X87-NEXT:    fstp %st(1)
 ; X87-NEXT:    fsubrp %st, %st(1)
-; X87-NEXT:    wait
 ; X87-NEXT:    fnstcw {{[0-9]+}}(%esp)
 ; X87-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
 ; X87-NEXT:    orl $3072, %eax # imm = 0xC00
@@ -1379,7 +1187,7 @@ define i64 @f20u64(double %x) #0 {
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 24
 ; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
 ; X86-SSE-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; X86-SSE-NEXT:    comisd %xmm0, %xmm1
+; X86-SSE-NEXT:    ucomisd %xmm0, %xmm1
 ; X86-SSE-NEXT:    jbe .LBB25_2
 ; X86-SSE-NEXT:  # %bb.1: # %entry
 ; X86-SSE-NEXT:    xorpd %xmm1, %xmm1
@@ -1406,36 +1214,24 @@ define i64 @f20u64(double %x) #0 {
 ;
 ; SSE-LABEL: f20u64:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
-; SSE-NEXT:    comisd %xmm2, %xmm0
-; SSE-NEXT:    xorpd %xmm1, %xmm1
-; SSE-NEXT:    jb .LBB25_2
-; SSE-NEXT:  # %bb.1: # %entry
-; SSE-NEXT:    movapd %xmm2, %xmm1
-; SSE-NEXT:  .LBB25_2: # %entry
-; SSE-NEXT:    subsd %xmm1, %xmm0
 ; SSE-NEXT:    cvttsd2si %xmm0, %rcx
-; SSE-NEXT:    setae %al
-; SSE-NEXT:    movzbl %al, %eax
-; SSE-NEXT:    shlq $63, %rax
-; SSE-NEXT:    xorq %rcx, %rax
+; SSE-NEXT:    movq %rcx, %rdx
+; SSE-NEXT:    sarq $63, %rdx
+; SSE-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-NEXT:    cvttsd2si %xmm0, %rax
+; SSE-NEXT:    andq %rdx, %rax
+; SSE-NEXT:    orq %rcx, %rax
 ; SSE-NEXT:    retq
 ;
 ; AVX1-LABEL: f20u64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm1, %xmm0
-; AVX1-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX1-NEXT:    jb .LBB25_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovapd %xmm1, %xmm2
-; AVX1-NEXT:  .LBB25_2: # %entry
-; AVX1-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
 ; AVX1-NEXT:    vcvttsd2si %xmm0, %rcx
-; AVX1-NEXT:    setae %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
+; AVX1-NEXT:    movq %rcx, %rdx
+; AVX1-NEXT:    sarq $63, %rdx
+; AVX1-NEXT:    vsubsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vcvttsd2si %xmm0, %rax
+; AVX1-NEXT:    andq %rdx, %rax
+; AVX1-NEXT:    orq %rcx, %rax
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: f20u64:
@@ -1470,10 +1266,10 @@ define i128 @f20u128(double %x) nounwind strictfp {
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X87-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X87-NEXT:    movl %edi, 8(%esi)
-; X87-NEXT:    movl %edx, 12(%esi)
-; X87-NEXT:    movl %eax, (%esi)
+; X87-NEXT:    movl %edi, 12(%esi)
+; X87-NEXT:    movl %edx, 8(%esi)
 ; X87-NEXT:    movl %ecx, 4(%esi)
+; X87-NEXT:    movl %eax, (%esi)
 ; X87-NEXT:    movl %esi, %eax
 ; X87-NEXT:    addl $36, %esp
 ; X87-NEXT:    popl %esi
@@ -1523,39 +1319,24 @@ entry:
 define float @f21() #0 {
 ; X87-LABEL: f21:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    pushl %eax
-; X87-NEXT:    .cfi_def_cfa_offset 8
-; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
-; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    flds (%esp)
+; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    wait
-; X87-NEXT:    popl %eax
-; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f21:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    pushl %eax
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 8
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; X86-SSE-NEXT:    cvtsd2ss %xmm0, %xmm0
-; X86-SSE-NEXT:    movss %xmm0, (%esp)
-; X86-SSE-NEXT:    flds (%esp)
+; X86-SSE-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    popl %eax
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f21:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; SSE-NEXT:    cvtsd2ss %xmm0, %xmm0
+; SSE-NEXT:    movss {{.*#+}} xmm0 = [4.20999985E+1,0.0E+0,0.0E+0,0.0E+0]
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f21:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    vcvtsd2ss %xmm0, %xmm0, %xmm0
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.20999985E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %result = call float @llvm.experimental.constrained.fptrunc.f32.f64(
@@ -1603,43 +1384,29 @@ entry:
 define i32 @f23(double %x) #0 {
 ; X87-LABEL: f23:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 16
+; X87-NEXT:    pushl %eax
+; X87-NEXT:    .cfi_def_cfa_offset 8
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
-; X87-NEXT:    fstpl (%esp)
+; X87-NEXT:    fistpl (%esp)
 ; X87-NEXT:    wait
-; X87-NEXT:    calll lrint
-; X87-NEXT:    addl $12, %esp
+; X87-NEXT:    movl (%esp), %eax
+; X87-NEXT:    popl %ecx
 ; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f23:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    calll lrint
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
+; X86-SSE-NEXT:    cvtsd2si {{[0-9]+}}(%esp), %eax
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f23:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq lrint@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
+; SSE-NEXT:    cvtsd2si %xmm0, %eax
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f23:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq lrint@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vcvtsd2si %xmm0, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call i32 @llvm.experimental.constrained.lrint.i32.f64(double %x,
@@ -1651,43 +1418,29 @@ entry:
 define i32 @f24(float %x) #0 {
 ; X87-LABEL: f24:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 16
+; X87-NEXT:    pushl %eax
+; X87-NEXT:    .cfi_def_cfa_offset 8
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
-; X87-NEXT:    fstps (%esp)
+; X87-NEXT:    fistpl (%esp)
 ; X87-NEXT:    wait
-; X87-NEXT:    calll lrintf
-; X87-NEXT:    addl $12, %esp
+; X87-NEXT:    movl (%esp), %eax
+; X87-NEXT:    popl %ecx
 ; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
 ;
 ; X86-SSE-LABEL: f24:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE-NEXT:    movss %xmm0, (%esp)
-; X86-SSE-NEXT:    calll lrintf
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
+; X86-SSE-NEXT:    cvtss2si {{[0-9]+}}(%esp), %eax
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f24:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq lrintf@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
+; SSE-NEXT:    cvtss2si %xmm0, %eax
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f24:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq lrintf@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vcvtss2si %xmm0, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call i32 @llvm.experimental.constrained.lrint.i32.f32(float %x,
@@ -1702,9 +1455,10 @@ define i64 @f25(double %x) #0 {
 ; X87-NEXT:    subl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
-; X87-NEXT:    fstpl (%esp)
+; X87-NEXT:    fistpll (%esp)
 ; X87-NEXT:    wait
-; X87-NEXT:    calll llrint
+; X87-NEXT:    movl (%esp), %eax
+; X87-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
@@ -1715,27 +1469,23 @@ define i64 @f25(double %x) #0 {
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
 ; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
 ; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    calll llrint
+; X86-SSE-NEXT:    fldl (%esp)
+; X86-SSE-NEXT:    fistpll (%esp)
+; X86-SSE-NEXT:    wait
+; X86-SSE-NEXT:    movl (%esp), %eax
+; X86-SSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-SSE-NEXT:    addl $12, %esp
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f25:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq llrint@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
+; SSE-NEXT:    cvtsd2si %xmm0, %rax
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f25:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq llrint@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vcvtsd2si %xmm0, %rax
 ; AVX-NEXT:    retq
 entry:
   %result = call i64 @llvm.experimental.constrained.llrint.i64.f64(double %x,
@@ -1750,9 +1500,10 @@ define i64 @f26(float %x) #0 {
 ; X87-NEXT:    subl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
-; X87-NEXT:    fstps (%esp)
+; X87-NEXT:    fistpll (%esp)
 ; X87-NEXT:    wait
-; X87-NEXT:    calll llrintf
+; X87-NEXT:    movl (%esp), %eax
+; X87-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
 ; X87-NEXT:    retl
@@ -1763,27 +1514,23 @@ define i64 @f26(float %x) #0 {
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
 ; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; X86-SSE-NEXT:    movss %xmm0, (%esp)
-; X86-SSE-NEXT:    calll llrintf
+; X86-SSE-NEXT:    flds (%esp)
+; X86-SSE-NEXT:    fistpll (%esp)
+; X86-SSE-NEXT:    wait
+; X86-SSE-NEXT:    movl (%esp), %eax
+; X86-SSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-SSE-NEXT:    addl $12, %esp
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: f26:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq llrintf@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
+; SSE-NEXT:    cvtss2si %xmm0, %rax
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: f26:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq llrintf@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vcvtss2si %xmm0, %rax
 ; AVX-NEXT:    retq
 entry:
   %result = call i64 @llvm.experimental.constrained.llrint.i64.f32(float %x,
@@ -1795,44 +1542,19 @@ entry:
 define i32 @f27(double %x) #0 {
 ; X87-LABEL: f27:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 16
-; X87-NEXT:    fldl {{[0-9]+}}(%esp)
-; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
-; X87-NEXT:    calll lround
-; X87-NEXT:    addl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 4
-; X87-NEXT:    retl
+; X87-NEXT:    jmp lround # TAILCALL
 ;
 ; X86-SSE-LABEL: f27:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE-NEXT:    calll lround
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
-; X86-SSE-NEXT:    retl
+; X86-SSE-NEXT:    jmp lround # TAILCALL
 ;
 ; SSE-LABEL: f27:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq lround@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp lround@PLT # TAILCALL
 ;
 ; AVX-LABEL: f27:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq lround@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp lround@PLT # TAILCALL
 entry:
   %result = call i32 @llvm.experimental.constrained.lround.i32.f64(double %x,
                                                metadata !"fpexcept.strict") #0
@@ -1842,44 +1564,19 @@ entry:
 define i32 @f28(float %x) #0 {
 ; X87-LABEL: f28:
 ; X87:       # %bb.0: # %entry
-; X87-NEXT:    subl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 16
-; X87-NEXT:    flds {{[0-9]+}}(%esp)
-; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
-; X87-NEXT:    calll lroundf
-; X87-NEXT:    addl $12, %esp
-; X87-NEXT:    .cfi_def_cfa_offset 4
-; X87-NEXT:    retl
+; X87-NEXT:    jmp lroundf # TAILCALL
 ;
 ; X86-SSE-LABEL: f28:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
-; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE-NEXT:    movss %xmm0, (%esp)
-; X86-SSE-NEXT:    calll lroundf
-; X86-SSE-NEXT:    addl $12, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
-; X86-SSE-NEXT:    retl
+; X86-SSE-NEXT:    jmp lroundf # TAILCALL
 ;
 ; SSE-LABEL: f28:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq lroundf@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp lroundf@PLT # TAILCALL
 ;
 ; AVX-LABEL: f28:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq lroundf@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp lroundf@PLT # TAILCALL
 entry:
   %result = call i32 @llvm.experimental.constrained.lround.i32.f32(float %x,
                                                metadata !"fpexcept.strict") #0
@@ -1912,21 +1609,11 @@ define i64 @f29(double %x) #0 {
 ;
 ; SSE-LABEL: f29:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq llround@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp llround@PLT # TAILCALL
 ;
 ; AVX-LABEL: f29:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq llround@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp llround@PLT # TAILCALL
 entry:
   %result = call i64 @llvm.experimental.constrained.llround.i64.f64(double %x,
                                                metadata !"fpexcept.strict") #0
@@ -1959,21 +1646,11 @@ define i64 @f30(float %x) #0 {
 ;
 ; SSE-LABEL: f30:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
-; SSE-NEXT:    callq llroundf@PLT
-; SSE-NEXT:    popq %rcx
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp llroundf@PLT # TAILCALL
 ;
 ; AVX-LABEL: f30:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    callq llroundf@PLT
-; AVX-NEXT:    popq %rcx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp llroundf@PLT # TAILCALL
 entry:
   %result = call i64 @llvm.experimental.constrained.llround.i64.f32(float %x,
                                                metadata !"fpexcept.strict") #0
@@ -2418,16 +2095,15 @@ define double @uifdi(i32 %x) #0 {
 ;
 ; X86-SSE-LABEL: uifdi:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $20, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 24
-; X86-SSE-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-SSE-NEXT:    movl %eax, (%esp)
-; X86-SSE-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    fildll (%esp)
-; X86-SSE-NEXT:    fstpl {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    fldl {{[0-9]+}}(%esp)
+; X86-SSE-NEXT:    subl $12, %esp
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
+; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X86-SSE-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    movsd %xmm0, (%esp)
+; X86-SSE-NEXT:    fldl (%esp)
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    addl $20, %esp
+; X86-SSE-NEXT:    addl $12, %esp
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
@@ -2475,54 +2151,38 @@ define double @uifdl(i64 %x) #0 {
 ;
 ; X86-SSE-LABEL: uifdl:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $28, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 32
-; X86-SSE-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    movlps %xmm0, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    shrl $31, %eax
-; X86-SSE-NEXT:    fildll {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; X86-SSE-NEXT:    fstpl {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    wait
+; X86-SSE-NEXT:    subl $12, %esp
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 16
 ; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE-NEXT:    movsd %xmm0, (%esp)
+; X86-SSE-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; X86-SSE-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    movapd %xmm0, %xmm1
+; X86-SSE-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
+; X86-SSE-NEXT:    addsd %xmm0, %xmm1
+; X86-SSE-NEXT:    movsd %xmm1, (%esp)
 ; X86-SSE-NEXT:    fldl (%esp)
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    addl $28, %esp
+; X86-SSE-NEXT:    addl $12, %esp
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
 ; SSE-LABEL: uifdl:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    movq %rdi, %rax
-; SSE-NEXT:    shrq %rax
-; SSE-NEXT:    movl %edi, %ecx
-; SSE-NEXT:    andl $1, %ecx
-; SSE-NEXT:    orq %rax, %rcx
-; SSE-NEXT:    testq %rdi, %rdi
-; SSE-NEXT:    cmovnsq %rdi, %rcx
-; SSE-NEXT:    cvtsi2sd %rcx, %xmm0
-; SSE-NEXT:    jns .LBB48_2
-; SSE-NEXT:  # %bb.1:
-; SSE-NEXT:    addsd %xmm0, %xmm0
-; SSE-NEXT:  .LBB48_2: # %entry
+; SSE-NEXT:    movq %rdi, %xmm1
+; SSE-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
+; SSE-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE-NEXT:    movapd %xmm1, %xmm0
+; SSE-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
+; SSE-NEXT:    addsd %xmm1, %xmm0
 ; SSE-NEXT:    retq
 ;
 ; AVX1-LABEL: uifdl:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    movq %rdi, %rax
-; AVX1-NEXT:    shrq %rax
-; AVX1-NEXT:    movl %edi, %ecx
-; AVX1-NEXT:    andl $1, %ecx
-; AVX1-NEXT:    orq %rax, %rcx
-; AVX1-NEXT:    testq %rdi, %rdi
-; AVX1-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-NEXT:    vcvtsi2sd %rcx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB48_2
-; AVX1-NEXT:  # %bb.1:
-; AVX1-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB48_2: # %entry
+; AVX1-NEXT:    vmovq %rdi, %xmm0
+; AVX1-NEXT:    vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; AVX1-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
+; AVX1-NEXT:    vaddsd %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: uifdl:
@@ -2640,16 +2300,16 @@ define float @uiffi(i32 %x) #0 {
 ;
 ; X86-SSE-LABEL: uiffi:
 ; X86-SSE:       # %bb.0: # %entry
-; X86-SSE-NEXT:    subl $20, %esp
-; X86-SSE-NEXT:    .cfi_def_cfa_offset 24
-; X86-SSE-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-SSE-NEXT:    movl %eax, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    fildll {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-SSE-NEXT:    flds {{[0-9]+}}(%esp)
+; X86-SSE-NEXT:    pushl %eax
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 8
+; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X86-SSE-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    cvtsd2ss %xmm0, %xmm0
+; X86-SSE-NEXT:    movss %xmm0, (%esp)
+; X86-SSE-NEXT:    flds (%esp)
 ; X86-SSE-NEXT:    wait
-; X86-SSE-NEXT:    addl $20, %esp
+; X86-SSE-NEXT:    popl %eax
 ; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
 ; X86-SSE-NEXT:    retl
 ;
@@ -2717,34 +2377,34 @@ define float @uiffl(i64 %x) #0 {
 ;
 ; SSE-LABEL: uiffl:
 ; SSE:       # %bb.0: # %entry
+; SSE-NEXT:    testq %rdi, %rdi
+; SSE-NEXT:    js .LBB52_1
+; SSE-NEXT:  # %bb.2: # %entry
+; SSE-NEXT:    cvtsi2ss %rdi, %xmm0
+; SSE-NEXT:    retq
+; SSE-NEXT:  .LBB52_1:
 ; SSE-NEXT:    movq %rdi, %rax
 ; SSE-NEXT:    shrq %rax
-; SSE-NEXT:    movl %edi, %ecx
-; SSE-NEXT:    andl $1, %ecx
-; SSE-NEXT:    orq %rax, %rcx
-; SSE-NEXT:    testq %rdi, %rdi
-; SSE-NEXT:    cmovnsq %rdi, %rcx
-; SSE-NEXT:    cvtsi2ss %rcx, %xmm0
-; SSE-NEXT:    jns .LBB52_2
-; SSE-NEXT:  # %bb.1:
+; SSE-NEXT:    andl $1, %edi
+; SSE-NEXT:    orq %rax, %rdi
+; SSE-NEXT:    cvtsi2ss %rdi, %xmm0
 ; SSE-NEXT:    addss %xmm0, %xmm0
-; SSE-NEXT:  .LBB52_2: # %entry
 ; SSE-NEXT:    retq
 ;
 ; AVX1-LABEL: uiffl:
 ; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    testq %rdi, %rdi
+; AVX1-NEXT:    js .LBB52_1
+; AVX1-NEXT:  # %bb.2: # %entry
+; AVX1-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
+; AVX1-NEXT:    retq
+; AVX1-NEXT:  .LBB52_1:
 ; AVX1-NEXT:    movq %rdi, %rax
 ; AVX1-NEXT:    shrq %rax
-; AVX1-NEXT:    movl %edi, %ecx
-; AVX1-NEXT:    andl $1, %ecx
-; AVX1-NEXT:    orq %rax, %rcx
-; AVX1-NEXT:    testq %rdi, %rdi
-; AVX1-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-NEXT:    vcvtsi2ss %rcx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB52_2
-; AVX1-NEXT:  # %bb.1:
+; AVX1-NEXT:    andl $1, %edi
+; AVX1-NEXT:    orq %rax, %rdi
+; AVX1-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
 ; AVX1-NEXT:    vaddss %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB52_2: # %entry
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: uiffl:
@@ -2785,23 +2445,13 @@ define double @ftan() #0 {
 ;
 ; SSE-LABEL: ftan:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq tan@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp tan@PLT # TAILCALL
 ;
 ; AVX-LABEL: ftan:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq tan@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp tan@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.tan.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -2836,23 +2486,13 @@ define double @facos() #0 {
 ;
 ; SSE-LABEL: facos:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq acos@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp acos@PLT # TAILCALL
 ;
 ; AVX-LABEL: facos:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq acos@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp acos@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.acos.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -2887,23 +2527,13 @@ define double @fasin() #0 {
 ;
 ; SSE-LABEL: fasin:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq asin@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp asin@PLT # TAILCALL
 ;
 ; AVX-LABEL: fasin:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq asin@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp asin@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.asin.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -2938,23 +2568,13 @@ define double @fatan() #0 {
 ;
 ; SSE-LABEL: fatan:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq atan@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp atan@PLT # TAILCALL
 ;
 ; AVX-LABEL: fatan:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq atan@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp atan@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.atan.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -2993,25 +2613,15 @@ define double @fatan2() #0 {
 ;
 ; SSE-LABEL: fatan2:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; SSE-NEXT:    movsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
-; SSE-NEXT:    callq atan2@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp atan2@PLT # TAILCALL
 ;
 ; AVX-LABEL: fatan2:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
-; AVX-NEXT:    callq atan2@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp atan2@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.atan2.f64(double 42.1,
                                                double 3.0,
@@ -3047,23 +2657,13 @@ define double @fcosh() #0 {
 ;
 ; SSE-LABEL: fcosh:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq cosh@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp cosh@PLT # TAILCALL
 ;
 ; AVX-LABEL: fcosh:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq cosh@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp cosh@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.cosh.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -3098,23 +2698,13 @@ define double @fsinh() #0 {
 ;
 ; SSE-LABEL: fsinh:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq sinh@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp sinh@PLT # TAILCALL
 ;
 ; AVX-LABEL: fsinh:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq sinh@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp sinh@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.sinh.f64(double 42.0,
                                                metadata !"round.dynamic",
@@ -3149,23 +2739,13 @@ define double @ftanh() #0 {
 ;
 ; SSE-LABEL: ftanh:
 ; SSE:       # %bb.0: # %entry
-; SSE-NEXT:    pushq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 16
 ; SSE-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; SSE-NEXT:    callq tanh@PLT
-; SSE-NEXT:    popq %rax
-; SSE-NEXT:    .cfi_def_cfa_offset 8
-; SSE-NEXT:    retq
+; SSE-NEXT:    jmp tanh@PLT # TAILCALL
 ;
 ; AVX-LABEL: ftanh:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq tanh@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX-NEXT:    jmp tanh@PLT # TAILCALL
 entry:
   %result = call double @llvm.experimental.constrained.tanh.f64(double 42.0,
                                                metadata !"round.dynamic",
diff --git a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
index 74291fbb75e81..d3c4c1252dff5 100644
--- a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
@@ -74,9 +74,8 @@ define float @frem(float %x, float %y) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subl $20, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
+; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _fmod
@@ -128,9 +127,8 @@ define float @pow(float %x, float %y) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subl $20, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
+; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _pow
@@ -233,9 +231,8 @@ define float @atan2(float %x, float %y) #0 {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    subl $20, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
-; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
+; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _atan2
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-fp16.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-fp16.ll
index 61a0c4eda8c72..1baa9375634c0 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-fp16.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-fp16.ll
@@ -34,17 +34,9 @@ define half @fadd_f16(half %a, half %b) nounwind strictfp {
 ;
 ; AVX-LABEL: fadd_f16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    vpextrw $0, %xmm1, %ecx
-; AVX-NEXT:    movzwl %cx, %ecx
-; AVX-NEXT:    vmovd %ecx, %xmm0
-; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm1
 ; AVX-NEXT:    vcvtph2ps %xmm1, %xmm1
-; AVX-NEXT:    vaddss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; AVX-NEXT:    vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -82,17 +74,9 @@ define half @fsub_f16(half %a, half %b) nounwind strictfp {
 ;
 ; AVX-LABEL: fsub_f16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    vpextrw $0, %xmm1, %ecx
-; AVX-NEXT:    movzwl %cx, %ecx
-; AVX-NEXT:    vmovd %ecx, %xmm0
-; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm1
 ; AVX-NEXT:    vcvtph2ps %xmm1, %xmm1
-; AVX-NEXT:    vsubss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; AVX-NEXT:    vsubss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -130,17 +114,9 @@ define half @fmul_f16(half %a, half %b) nounwind strictfp {
 ;
 ; AVX-LABEL: fmul_f16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    vpextrw $0, %xmm1, %ecx
-; AVX-NEXT:    movzwl %cx, %ecx
-; AVX-NEXT:    vmovd %ecx, %xmm0
-; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm1
 ; AVX-NEXT:    vcvtph2ps %xmm1, %xmm1
-; AVX-NEXT:    vmulss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; AVX-NEXT:    vmulss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -178,17 +154,9 @@ define half @fdiv_f16(half %a, half %b) nounwind strictfp {
 ;
 ; AVX-LABEL: fdiv_f16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    vpextrw $0, %xmm1, %ecx
-; AVX-NEXT:    movzwl %cx, %ecx
-; AVX-NEXT:    vmovd %ecx, %xmm0
-; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm1
 ; AVX-NEXT:    vcvtph2ps %xmm1, %xmm1
-; AVX-NEXT:    vdivss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
+; AVX-NEXT:    vdivss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -221,8 +189,7 @@ define void @fpext_f16_to_f32(ptr %val, ptr %ret) nounwind strictfp {
 ;
 ; AVX-LABEL: fpext_f16_to_f32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    movzwl (%rdi), %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
+; AVX-NEXT:    vpinsrw $0, (%rdi), %xmm0, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vmovss %xmm0, (%rsi)
 ; AVX-NEXT:    retq
@@ -263,8 +230,7 @@ define void @fpext_f16_to_f64(ptr %val, ptr %ret) nounwind strictfp {
 ;
 ; AVX-LABEL: fpext_f16_to_f64:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    movzwl (%rdi), %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
+; AVX-NEXT:    vpinsrw $0, (%rdi), %xmm0, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    vmovsd %xmm0, (%rsi)
@@ -395,12 +361,9 @@ define void @fsqrt_f16(ptr %a) nounwind strictfp {
 ;
 ; AVX-LABEL: fsqrt_f16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    movzwl (%rdi), %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
+; AVX-NEXT:    vpinsrw $0, (%rdi), %xmm0, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vsqrtss %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    vpextrw $0, %xmm0, (%rdi)
 ; AVX-NEXT:    retq
@@ -431,8 +394,9 @@ define half @fma_f16(half %a, half %b, half %c) nounwind strictfp {
 ; SSE2-LABEL: fma_f16:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    subq $24, %rsp
-; SSE2-NEXT:    movss %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
 ; SSE2-NEXT:    movss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; SSE2-NEXT:    movss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
+; SSE2-NEXT:    movaps %xmm2, %xmm0
 ; SSE2-NEXT:    callq __extendhfsf2@PLT
 ; SSE2-NEXT:    movss %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
 ; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
@@ -442,15 +406,13 @@ define half @fma_f16(half %a, half %b, half %c) nounwind strictfp {
 ; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; SSE2-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; SSE2-NEXT:    callq __extendhfsf2@PLT
-; SSE2-NEXT:    xorps %xmm2, %xmm2
-; SSE2-NEXT:    cvtss2sd %xmm0, %xmm2
-; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
-; SSE2-NEXT:    # xmm0 = mem[0],zero,zero,zero
-; SSE2-NEXT:    xorps %xmm1, %xmm1
-; SSE2-NEXT:    cvtss2sd %xmm0, %xmm1
-; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
-; SSE2-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; SSE2-NEXT:    cvtss2sd %xmm0, %xmm0
+; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 4-byte Reload
+; SSE2-NEXT:    # xmm1 = mem[0],zero,zero,zero
+; SSE2-NEXT:    cvtss2sd %xmm1, %xmm1
+; SSE2-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm2 # 4-byte Reload
+; SSE2-NEXT:    # xmm2 = mem[0],zero,zero,zero
+; SSE2-NEXT:    cvtss2sd %xmm2, %xmm2
 ; SSE2-NEXT:    callq fma@PLT
 ; SSE2-NEXT:    callq __truncdfhf2@PLT
 ; SSE2-NEXT:    addq $24, %rsp
@@ -459,21 +421,12 @@ define half @fma_f16(half %a, half %b, half %c) nounwind strictfp {
 ; F16C-LABEL: fma_f16:
 ; F16C:       # %bb.0:
 ; F16C-NEXT:    pushq %rax
-; F16C-NEXT:    vpextrw $0, %xmm0, %eax
-; F16C-NEXT:    vpextrw $0, %xmm1, %ecx
-; F16C-NEXT:    vpextrw $0, %xmm2, %edx
-; F16C-NEXT:    movzwl %dx, %edx
-; F16C-NEXT:    vmovd %edx, %xmm0
-; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; F16C-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm2
-; F16C-NEXT:    movzwl %cx, %ecx
-; F16C-NEXT:    vmovd %ecx, %xmm0
-; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; F16C-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm1
-; F16C-NEXT:    movzwl %ax, %eax
-; F16C-NEXT:    vmovd %eax, %xmm0
 ; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
+; F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
+; F16C-NEXT:    vcvtss2sd %xmm1, %xmm1, %xmm1
+; F16C-NEXT:    vcvtph2ps %xmm2, %xmm2
+; F16C-NEXT:    vcvtss2sd %xmm2, %xmm2, %xmm2
 ; F16C-NEXT:    callq fma@PLT
 ; F16C-NEXT:    callq __truncdfhf2@PLT
 ; F16C-NEXT:    popq %rax
@@ -482,22 +435,13 @@ define half @fma_f16(half %a, half %b, half %c) nounwind strictfp {
 ; AVX512-LABEL: fma_f16:
 ; AVX512:       # %bb.0:
 ; AVX512-NEXT:    pushq %rax
-; AVX512-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX512-NEXT:    vpextrw $0, %xmm1, %ecx
-; AVX512-NEXT:    vpextrw $0, %xmm2, %edx
-; AVX512-NEXT:    movzwl %dx, %edx
-; AVX512-NEXT:    vmovd %edx, %xmm0
-; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX512-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm1
-; AVX512-NEXT:    movzwl %cx, %ecx
-; AVX512-NEXT:    vmovd %ecx, %xmm0
-; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX512-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm2
-; AVX512-NEXT:    movzwl %ax, %eax
-; AVX512-NEXT:    vmovd %eax, %xmm0
+; AVX512-NEXT:    vcvtph2ps %xmm2, %xmm2
+; AVX512-NEXT:    vcvtss2sd %xmm2, %xmm2, %xmm2
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
+; AVX512-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm3
+; AVX512-NEXT:    vcvtph2ps %xmm1, %xmm0
 ; AVX512-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
-; AVX512-NEXT:    vfmadd213sd {{.*#+}} xmm0 = (xmm2 * xmm0) + xmm1
+; AVX512-NEXT:    vfmadd213sd {{.*#+}} xmm0 = (xmm3 * xmm0) + xmm2
 ; AVX512-NEXT:    callq __truncdfhf2@PLT
 ; AVX512-NEXT:    popq %rax
 ; AVX512-NEXT:    retq
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint-fp16.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint-fp16.ll
index 0498f9b7f9a3d..3e624284d7f16 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint-fp16.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint-fp16.ll
@@ -28,9 +28,6 @@ define i1 @fptosi_f16toi1(half %x) #0 {
 ;
 ; AVX-LABEL: fptosi_f16toi1:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $al killed $al killed $eax
@@ -64,9 +61,6 @@ define i8 @fptosi_f16toi8(half %x) #0 {
 ;
 ; AVX-LABEL: fptosi_f16toi8:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $al killed $al killed $eax
@@ -100,9 +94,6 @@ define i16 @fptosi_f16toi16(half %x) #0 {
 ;
 ; AVX-LABEL: fptosi_f16toi16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $ax killed $ax killed $eax
@@ -135,9 +126,6 @@ define i32 @fptosi_f16toi32(half %x) #0 {
 ;
 ; AVX-LABEL: fptosi_f16toi32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    retq
@@ -167,9 +155,6 @@ define i64 @fptosi_f16toi64(half %x) #0 {
 ;
 ; AVX-LABEL: fptosi_f16toi64:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %rax
 ; AVX-NEXT:    retq
@@ -203,9 +188,6 @@ define i1 @fptoui_f16toi1(half %x) #0 {
 ;
 ; AVX-LABEL: fptoui_f16toi1:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $al killed $al killed $eax
@@ -239,9 +221,6 @@ define i8 @fptoui_f16toi8(half %x) #0 {
 ;
 ; AVX-LABEL: fptoui_f16toi8:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $al killed $al killed $eax
@@ -275,9 +254,6 @@ define i16 @fptoui_f16toi16(half %x) #0 {
 ;
 ; AVX-LABEL: fptoui_f16toi16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vcvttss2si %xmm0, %eax
 ; AVX-NEXT:    # kill: def $ax killed $ax killed $eax
@@ -311,9 +287,6 @@ define i32 @fptoui_f16toi32(half %x) #0 {
 ;
 ; F16C-LABEL: fptoui_f16toi32:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vpextrw $0, %xmm0, %eax
-; F16C-NEXT:    movzwl %ax, %eax
-; F16C-NEXT:    vmovd %eax, %xmm0
 ; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT:    vcvttss2si %xmm0, %rax
 ; F16C-NEXT:    # kill: def $eax killed $eax killed $rax
@@ -321,9 +294,6 @@ define i32 @fptoui_f16toi32(half %x) #0 {
 ;
 ; AVX512-LABEL: fptoui_f16toi32:
 ; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX512-NEXT:    movzwl %ax, %eax
-; AVX512-NEXT:    vmovd %eax, %xmm0
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX512-NEXT:    vcvttss2usi %xmm0, %eax
 ; AVX512-NEXT:    retq
@@ -347,48 +317,30 @@ define i64 @fptoui_f16toi64(half %x) #0 {
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    pushq %rax
 ; SSE2-NEXT:    callq __extendhfsf2@PLT
-; SSE2-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE2-NEXT:    comiss %xmm2, %xmm0
-; SSE2-NEXT:    xorps %xmm1, %xmm1
-; SSE2-NEXT:    jb .LBB9_2
-; SSE2-NEXT:  # %bb.1:
-; SSE2-NEXT:    movaps %xmm2, %xmm1
-; SSE2-NEXT:  .LBB9_2:
-; SSE2-NEXT:    subss %xmm1, %xmm0
 ; SSE2-NEXT:    cvttss2si %xmm0, %rcx
-; SSE2-NEXT:    setae %al
-; SSE2-NEXT:    movzbl %al, %eax
-; SSE2-NEXT:    shlq $63, %rax
-; SSE2-NEXT:    xorq %rcx, %rax
+; SSE2-NEXT:    movq %rcx, %rdx
+; SSE2-NEXT:    sarq $63, %rdx
+; SSE2-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE2-NEXT:    cvttss2si %xmm0, %rax
+; SSE2-NEXT:    andq %rdx, %rax
+; SSE2-NEXT:    orq %rcx, %rax
 ; SSE2-NEXT:    popq %rcx
 ; SSE2-NEXT:    retq
 ;
 ; F16C-LABEL: fptoui_f16toi64:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vpextrw $0, %xmm0, %eax
-; F16C-NEXT:    movzwl %ax, %eax
-; F16C-NEXT:    vmovd %eax, %xmm0
 ; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; F16C-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; F16C-NEXT:    vcomiss %xmm1, %xmm0
-; F16C-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; F16C-NEXT:    jb .LBB9_2
-; F16C-NEXT:  # %bb.1:
-; F16C-NEXT:    vmovaps %xmm1, %xmm2
-; F16C-NEXT:  .LBB9_2:
-; F16C-NEXT:    vsubss %xmm2, %xmm0, %xmm0
 ; F16C-NEXT:    vcvttss2si %xmm0, %rcx
-; F16C-NEXT:    setae %al
-; F16C-NEXT:    movzbl %al, %eax
-; F16C-NEXT:    shlq $63, %rax
-; F16C-NEXT:    xorq %rcx, %rax
+; F16C-NEXT:    movq %rcx, %rdx
+; F16C-NEXT:    sarq $63, %rdx
+; F16C-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; F16C-NEXT:    vcvttss2si %xmm0, %rax
+; F16C-NEXT:    andq %rdx, %rax
+; F16C-NEXT:    orq %rcx, %rax
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: fptoui_f16toi64:
 ; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX512-NEXT:    movzwl %ax, %eax
-; AVX512-NEXT:    vmovd %eax, %xmm0
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX512-NEXT:    vcvttss2usi %xmm0, %rax
 ; AVX512-NEXT:    retq
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint.ll
index ecdc507a882c3..d9b08c3277e43 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-fptoint.ll
@@ -447,19 +447,13 @@ define i32 @fptoui_f32toi32(float %x) #0 {
 ; SSE-X86-LABEL: fptoui_f32toi32:
 ; SSE-X86:       # %bb.0:
 ; SSE-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; SSE-X86-NEXT:    movss {{.*#+}} xmm2 = [2.14748365E+9,0.0E+0,0.0E+0,0.0E+0]
-; SSE-X86-NEXT:    comiss %xmm0, %xmm2
-; SSE-X86-NEXT:    xorps %xmm1, %xmm1
-; SSE-X86-NEXT:    ja .LBB8_2
-; SSE-X86-NEXT:  # %bb.1:
-; SSE-X86-NEXT:    movaps %xmm2, %xmm1
-; SSE-X86-NEXT:  .LBB8_2:
-; SSE-X86-NEXT:    setbe %al
-; SSE-X86-NEXT:    movzbl %al, %ecx
-; SSE-X86-NEXT:    shll $31, %ecx
-; SSE-X86-NEXT:    subss %xmm1, %xmm0
+; SSE-X86-NEXT:    cvttss2si %xmm0, %ecx
+; SSE-X86-NEXT:    movl %ecx, %edx
+; SSE-X86-NEXT:    sarl $31, %edx
+; SSE-X86-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
 ; SSE-X86-NEXT:    cvttss2si %xmm0, %eax
-; SSE-X86-NEXT:    xorl %ecx, %eax
+; SSE-X86-NEXT:    andl %edx, %eax
+; SSE-X86-NEXT:    orl %ecx, %eax
 ; SSE-X86-NEXT:    retl
 ;
 ; SSE-X64-LABEL: fptoui_f32toi32:
@@ -470,22 +464,14 @@ define i32 @fptoui_f32toi32(float %x) #0 {
 ;
 ; AVX1-X86-LABEL: fptoui_f32toi32:
 ; AVX1-X86:       # %bb.0:
-; AVX1-X86-NEXT:    pushl %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa_offset 8
-; AVX1-X86-NEXT:    .cfi_offset %ebp, -8
-; AVX1-X86-NEXT:    movl %esp, %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa_register %ebp
-; AVX1-X86-NEXT:    andl $-8, %esp
-; AVX1-X86-NEXT:    subl $8, %esp
 ; AVX1-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX1-X86-NEXT:    vmovss %xmm0, (%esp)
-; AVX1-X86-NEXT:    flds (%esp)
-; AVX1-X86-NEXT:    fisttpll (%esp)
-; AVX1-X86-NEXT:    wait
-; AVX1-X86-NEXT:    movl (%esp), %eax
-; AVX1-X86-NEXT:    movl %ebp, %esp
-; AVX1-X86-NEXT:    popl %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa %esp, 4
+; AVX1-X86-NEXT:    vcvttss2si %xmm0, %ecx
+; AVX1-X86-NEXT:    movl %ecx, %edx
+; AVX1-X86-NEXT:    sarl $31, %edx
+; AVX1-X86-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vcvttss2si %xmm0, %eax
+; AVX1-X86-NEXT:    andl %edx, %eax
+; AVX1-X86-NEXT:    orl %ecx, %eax
 ; AVX1-X86-NEXT:    retl
 ;
 ; AVX1-X64-LABEL: fptoui_f32toi32:
@@ -544,7 +530,7 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ; SSE-X86-NEXT:    subl $16, %esp
 ; SSE-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; SSE-X86-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-X86-NEXT:    comiss %xmm0, %xmm1
+; SSE-X86-NEXT:    ucomiss %xmm0, %xmm1
 ; SSE-X86-NEXT:    jbe .LBB9_2
 ; SSE-X86-NEXT:  # %bb.1:
 ; SSE-X86-NEXT:    xorps %xmm1, %xmm1
@@ -572,19 +558,13 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ;
 ; SSE-X64-LABEL: fptoui_f32toi64:
 ; SSE-X64:       # %bb.0:
-; SSE-X64-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-X64-NEXT:    comiss %xmm2, %xmm0
-; SSE-X64-NEXT:    xorps %xmm1, %xmm1
-; SSE-X64-NEXT:    jb .LBB9_2
-; SSE-X64-NEXT:  # %bb.1:
-; SSE-X64-NEXT:    movaps %xmm2, %xmm1
-; SSE-X64-NEXT:  .LBB9_2:
-; SSE-X64-NEXT:    subss %xmm1, %xmm0
 ; SSE-X64-NEXT:    cvttss2si %xmm0, %rcx
-; SSE-X64-NEXT:    setae %al
-; SSE-X64-NEXT:    movzbl %al, %eax
-; SSE-X64-NEXT:    shlq $63, %rax
-; SSE-X64-NEXT:    xorq %rcx, %rax
+; SSE-X64-NEXT:    movq %rcx, %rdx
+; SSE-X64-NEXT:    sarq $63, %rdx
+; SSE-X64-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-X64-NEXT:    cvttss2si %xmm0, %rax
+; SSE-X64-NEXT:    andq %rdx, %rax
+; SSE-X64-NEXT:    orq %rcx, %rax
 ; SSE-X64-NEXT:    retq
 ;
 ; AVX1-X86-LABEL: fptoui_f32toi64:
@@ -598,7 +578,7 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ; AVX1-X86-NEXT:    subl $8, %esp
 ; AVX1-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX1-X86-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-X86-NEXT:    vcomiss %xmm0, %xmm1
+; AVX1-X86-NEXT:    vucomiss %xmm0, %xmm1
 ; AVX1-X86-NEXT:    jbe .LBB9_2
 ; AVX1-X86-NEXT:  # %bb.1:
 ; AVX1-X86-NEXT:    vxorps %xmm1, %xmm1, %xmm1
@@ -620,19 +600,13 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ;
 ; AVX1-X64-LABEL: fptoui_f32toi64:
 ; AVX1-X64:       # %bb.0:
-; AVX1-X64-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-X64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX1-X64-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX1-X64-NEXT:    jb .LBB9_2
-; AVX1-X64-NEXT:  # %bb.1:
-; AVX1-X64-NEXT:    vmovaps %xmm1, %xmm2
-; AVX1-X64-NEXT:  .LBB9_2:
-; AVX1-X64-NEXT:    vsubss %xmm2, %xmm0, %xmm0
 ; AVX1-X64-NEXT:    vcvttss2si %xmm0, %rcx
-; AVX1-X64-NEXT:    setae %al
-; AVX1-X64-NEXT:    movzbl %al, %eax
-; AVX1-X64-NEXT:    shlq $63, %rax
-; AVX1-X64-NEXT:    xorq %rcx, %rax
+; AVX1-X64-NEXT:    movq %rcx, %rdx
+; AVX1-X64-NEXT:    sarq $63, %rdx
+; AVX1-X64-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-X64-NEXT:    vcvttss2si %xmm0, %rax
+; AVX1-X64-NEXT:    andq %rdx, %rax
+; AVX1-X64-NEXT:    orq %rcx, %rax
 ; AVX1-X64-NEXT:    retq
 ;
 ; AVX512-X86-LABEL: fptoui_f32toi64:
@@ -647,7 +621,7 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ; AVX512-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512-X86-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512-X86-NEXT:    xorl %edx, %edx
-; AVX512-X86-NEXT:    vcomiss %xmm0, %xmm1
+; AVX512-X86-NEXT:    vucomiss %xmm0, %xmm1
 ; AVX512-X86-NEXT:    setbe %dl
 ; AVX512-X86-NEXT:    kmovw %edx, %k1
 ; AVX512-X86-NEXT:    vmovss %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -680,7 +654,7 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ; X87-NEXT:    subl $16, %esp
 ; X87-NEXT:    flds 8(%ebp)
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
-; X87-NEXT:    fcom %st(1)
+; X87-NEXT:    fucom %st(1)
 ; X87-NEXT:    wait
 ; X87-NEXT:    fnstsw %ax
 ; X87-NEXT:    xorl %edx, %edx
@@ -695,7 +669,6 @@ define i64 @fptoui_f32toi64(float %x) #0 {
 ; X87-NEXT:  .LBB9_2:
 ; X87-NEXT:    fstp %st(0)
 ; X87-NEXT:    fsubrp %st, %st(1)
-; X87-NEXT:    wait
 ; X87-NEXT:    fnstcw {{[0-9]+}}(%esp)
 ; X87-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
 ; X87-NEXT:    orl $3072, %ecx # imm = 0xC00
@@ -1087,19 +1060,13 @@ define i32 @fptoui_f64toi32(double %x) #0 {
 ; SSE-X86-LABEL: fptoui_f64toi32:
 ; SSE-X86:       # %bb.0:
 ; SSE-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE-X86-NEXT:    movsd {{.*#+}} xmm2 = [2.147483648E+9,0.0E+0]
-; SSE-X86-NEXT:    comisd %xmm0, %xmm2
-; SSE-X86-NEXT:    xorpd %xmm1, %xmm1
-; SSE-X86-NEXT:    ja .LBB17_2
-; SSE-X86-NEXT:  # %bb.1:
-; SSE-X86-NEXT:    movapd %xmm2, %xmm1
-; SSE-X86-NEXT:  .LBB17_2:
-; SSE-X86-NEXT:    setbe %al
-; SSE-X86-NEXT:    movzbl %al, %ecx
-; SSE-X86-NEXT:    shll $31, %ecx
-; SSE-X86-NEXT:    subsd %xmm1, %xmm0
+; SSE-X86-NEXT:    cvttsd2si %xmm0, %ecx
+; SSE-X86-NEXT:    movl %ecx, %edx
+; SSE-X86-NEXT:    sarl $31, %edx
+; SSE-X86-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
 ; SSE-X86-NEXT:    cvttsd2si %xmm0, %eax
-; SSE-X86-NEXT:    xorl %ecx, %eax
+; SSE-X86-NEXT:    andl %edx, %eax
+; SSE-X86-NEXT:    orl %ecx, %eax
 ; SSE-X86-NEXT:    retl
 ;
 ; SSE-X64-LABEL: fptoui_f64toi32:
@@ -1110,22 +1077,14 @@ define i32 @fptoui_f64toi32(double %x) #0 {
 ;
 ; AVX1-X86-LABEL: fptoui_f64toi32:
 ; AVX1-X86:       # %bb.0:
-; AVX1-X86-NEXT:    pushl %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa_offset 8
-; AVX1-X86-NEXT:    .cfi_offset %ebp, -8
-; AVX1-X86-NEXT:    movl %esp, %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa_register %ebp
-; AVX1-X86-NEXT:    andl $-8, %esp
-; AVX1-X86-NEXT:    subl $8, %esp
 ; AVX1-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX1-X86-NEXT:    vmovsd %xmm0, (%esp)
-; AVX1-X86-NEXT:    fldl (%esp)
-; AVX1-X86-NEXT:    fisttpll (%esp)
-; AVX1-X86-NEXT:    wait
-; AVX1-X86-NEXT:    movl (%esp), %eax
-; AVX1-X86-NEXT:    movl %ebp, %esp
-; AVX1-X86-NEXT:    popl %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa %esp, 4
+; AVX1-X86-NEXT:    vcvttsd2si %xmm0, %ecx
+; AVX1-X86-NEXT:    movl %ecx, %edx
+; AVX1-X86-NEXT:    sarl $31, %edx
+; AVX1-X86-NEXT:    vsubsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vcvttsd2si %xmm0, %eax
+; AVX1-X86-NEXT:    andl %edx, %eax
+; AVX1-X86-NEXT:    orl %ecx, %eax
 ; AVX1-X86-NEXT:    retl
 ;
 ; AVX1-X64-LABEL: fptoui_f64toi32:
@@ -1184,7 +1143,7 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ; SSE-X86-NEXT:    subl $16, %esp
 ; SSE-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
 ; SSE-X86-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; SSE-X86-NEXT:    comisd %xmm0, %xmm1
+; SSE-X86-NEXT:    ucomisd %xmm0, %xmm1
 ; SSE-X86-NEXT:    jbe .LBB18_2
 ; SSE-X86-NEXT:  # %bb.1:
 ; SSE-X86-NEXT:    xorpd %xmm1, %xmm1
@@ -1212,19 +1171,13 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ;
 ; SSE-X64-LABEL: fptoui_f64toi64:
 ; SSE-X64:       # %bb.0:
-; SSE-X64-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
-; SSE-X64-NEXT:    comisd %xmm2, %xmm0
-; SSE-X64-NEXT:    xorpd %xmm1, %xmm1
-; SSE-X64-NEXT:    jb .LBB18_2
-; SSE-X64-NEXT:  # %bb.1:
-; SSE-X64-NEXT:    movapd %xmm2, %xmm1
-; SSE-X64-NEXT:  .LBB18_2:
-; SSE-X64-NEXT:    subsd %xmm1, %xmm0
 ; SSE-X64-NEXT:    cvttsd2si %xmm0, %rcx
-; SSE-X64-NEXT:    setae %al
-; SSE-X64-NEXT:    movzbl %al, %eax
-; SSE-X64-NEXT:    shlq $63, %rax
-; SSE-X64-NEXT:    xorq %rcx, %rax
+; SSE-X64-NEXT:    movq %rcx, %rdx
+; SSE-X64-NEXT:    sarq $63, %rdx
+; SSE-X64-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-X64-NEXT:    cvttsd2si %xmm0, %rax
+; SSE-X64-NEXT:    andq %rdx, %rax
+; SSE-X64-NEXT:    orq %rcx, %rax
 ; SSE-X64-NEXT:    retq
 ;
 ; AVX1-X86-LABEL: fptoui_f64toi64:
@@ -1238,7 +1191,7 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ; AVX1-X86-NEXT:    subl $8, %esp
 ; AVX1-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
 ; AVX1-X86-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-X86-NEXT:    vcomisd %xmm0, %xmm1
+; AVX1-X86-NEXT:    vucomisd %xmm0, %xmm1
 ; AVX1-X86-NEXT:    jbe .LBB18_2
 ; AVX1-X86-NEXT:  # %bb.1:
 ; AVX1-X86-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
@@ -1260,19 +1213,13 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ;
 ; AVX1-X64-LABEL: fptoui_f64toi64:
 ; AVX1-X64:       # %bb.0:
-; AVX1-X64-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-X64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX1-X64-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX1-X64-NEXT:    jb .LBB18_2
-; AVX1-X64-NEXT:  # %bb.1:
-; AVX1-X64-NEXT:    vmovapd %xmm1, %xmm2
-; AVX1-X64-NEXT:  .LBB18_2:
-; AVX1-X64-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
 ; AVX1-X64-NEXT:    vcvttsd2si %xmm0, %rcx
-; AVX1-X64-NEXT:    setae %al
-; AVX1-X64-NEXT:    movzbl %al, %eax
-; AVX1-X64-NEXT:    shlq $63, %rax
-; AVX1-X64-NEXT:    xorq %rcx, %rax
+; AVX1-X64-NEXT:    movq %rcx, %rdx
+; AVX1-X64-NEXT:    sarq $63, %rdx
+; AVX1-X64-NEXT:    vsubsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-X64-NEXT:    vcvttsd2si %xmm0, %rax
+; AVX1-X64-NEXT:    andq %rdx, %rax
+; AVX1-X64-NEXT:    orq %rcx, %rax
 ; AVX1-X64-NEXT:    retq
 ;
 ; AVX512-X86-LABEL: fptoui_f64toi64:
@@ -1287,7 +1234,7 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ; AVX512-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
 ; AVX512-X86-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512-X86-NEXT:    xorl %edx, %edx
-; AVX512-X86-NEXT:    vcomisd %xmm0, %xmm1
+; AVX512-X86-NEXT:    vucomisd %xmm0, %xmm1
 ; AVX512-X86-NEXT:    setbe %dl
 ; AVX512-X86-NEXT:    kmovw %edx, %k1
 ; AVX512-X86-NEXT:    vmovsd %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -1320,7 +1267,7 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ; X87-NEXT:    subl $16, %esp
 ; X87-NEXT:    fldl 8(%ebp)
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
-; X87-NEXT:    fcom %st(1)
+; X87-NEXT:    fucom %st(1)
 ; X87-NEXT:    wait
 ; X87-NEXT:    fnstsw %ax
 ; X87-NEXT:    xorl %edx, %edx
@@ -1335,7 +1282,6 @@ define i64 @fptoui_f64toi64(double %x) #0 {
 ; X87-NEXT:  .LBB18_2:
 ; X87-NEXT:    fstp %st(0)
 ; X87-NEXT:    fsubrp %st, %st(1)
-; X87-NEXT:    wait
 ; X87-NEXT:    fnstcw {{[0-9]+}}(%esp)
 ; X87-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
 ; X87-NEXT:    orl $3072, %ecx # imm = 0xC00
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
index 6312a26db9bf4..4551c1757f1f3 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
@@ -34,8 +34,6 @@ define half @sitofp_i1tof16(i1 %x) #0 {
 ; AVX-NEXT:    negb %dil
 ; AVX-NEXT:    movsbl %dil, %eax
 ; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -75,8 +73,6 @@ define half @sitofp_i8tof16(i8 %x) #0 {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    movsbl %dil, %eax
 ; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -111,8 +107,6 @@ define half @sitofp_i16tof16(i16 %x) #0 {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    movswl %di, %eax
 ; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -145,8 +139,6 @@ define half @sitofp_i32tof16(i32 %x) #0 {
 ; AVX-LABEL: sitofp_i32tof16:
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    vcvtsi2ss %edi, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -177,8 +169,6 @@ define half @sitofp_i64tof16(i64 %x) #0 {
 ; AVX-LABEL: sitofp_i64tof16:
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -212,8 +202,6 @@ define half @uitofp_i1tof16(i1 %x) #0 {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    andl $1, %edi
 ; AVX-NEXT:    vcvtsi2ss %edi, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -250,8 +238,6 @@ define half @uitofp_i8tof16(i8 %x) #0 {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    movzbl %dil, %eax
 ; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -286,8 +272,6 @@ define half @uitofp_i16tof16(i16 %x) #0 {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    movzwl %di, %eax
 ; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -322,16 +306,12 @@ define half @uitofp_i32tof16(i32 %x) #0 {
 ; F16C:       # %bb.0:
 ; F16C-NEXT:    movl %edi, %eax
 ; F16C-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; F16C-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; F16C-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: uitofp_i32tof16:
 ; AVX512:       # %bb.0:
 ; AVX512-NEXT:    vcvtusi2ss %edi, %xmm15, %xmm0
-; AVX512-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX512-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX512-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX512-NEXT:    retq
 ;
@@ -358,11 +338,10 @@ define half @uitofp_i64tof16(i64 %x) #0 {
 ; SSE2-NEXT:    movl %edi, %ecx
 ; SSE2-NEXT:    andl $1, %ecx
 ; SSE2-NEXT:    orq %rax, %rcx
+; SSE2-NEXT:    cvtsi2ss %rcx, %xmm0
+; SSE2-NEXT:    addss %xmm0, %xmm0
+; SSE2-NEXT:    cvtsi2ss %rdi, %xmm1
 ; SSE2-NEXT:    testq %rdi, %rdi
-; SSE2-NEXT:    cmovnsq %rdi, %rcx
-; SSE2-NEXT:    cvtsi2ss %rcx, %xmm1
-; SSE2-NEXT:    movaps %xmm1, %xmm0
-; SSE2-NEXT:    addss %xmm1, %xmm0
 ; SSE2-NEXT:    js .LBB9_2
 ; SSE2-NEXT:  # %bb.1:
 ; SSE2-NEXT:    movaps %xmm1, %xmm0
@@ -374,28 +353,25 @@ define half @uitofp_i64tof16(i64 %x) #0 {
 ;
 ; F16C-LABEL: uitofp_i64tof16:
 ; F16C:       # %bb.0:
+; F16C-NEXT:    testq %rdi, %rdi
+; F16C-NEXT:    js .LBB9_1
+; F16C-NEXT:  # %bb.2:
+; F16C-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
+; F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
+; F16C-NEXT:    retq
+; F16C-NEXT:  .LBB9_1:
 ; F16C-NEXT:    movq %rdi, %rax
 ; F16C-NEXT:    shrq %rax
-; F16C-NEXT:    movl %edi, %ecx
-; F16C-NEXT:    andl $1, %ecx
-; F16C-NEXT:    orq %rax, %rcx
-; F16C-NEXT:    testq %rdi, %rdi
-; F16C-NEXT:    cmovnsq %rdi, %rcx
-; F16C-NEXT:    vcvtsi2ss %rcx, %xmm15, %xmm0
-; F16C-NEXT:    jns .LBB9_2
-; F16C-NEXT:  # %bb.1:
+; F16C-NEXT:    andl $1, %edi
+; F16C-NEXT:    orq %rax, %rdi
+; F16C-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
 ; F16C-NEXT:    vaddss %xmm0, %xmm0, %xmm0
-; F16C-NEXT:  .LBB9_2:
-; F16C-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; F16C-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: uitofp_i64tof16:
 ; AVX512:       # %bb.0:
 ; AVX512-NEXT:    vcvtusi2ss %rdi, %xmm15, %xmm0
-; AVX512-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX512-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX512-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX512-NEXT:    retq
 ;
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll
index f0aa3827ce937..ad644fedda7b6 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll
@@ -485,23 +485,17 @@ define float @uitofp_i16tof32(i16 %x) #0 {
 define float @uitofp_i32tof32(i32 %x) #0 {
 ; SSE-X86-LABEL: uitofp_i32tof32:
 ; SSE-X86:       # %bb.0:
-; SSE-X86-NEXT:    pushl %ebp
+; SSE-X86-NEXT:    pushl %eax
 ; SSE-X86-NEXT:    .cfi_def_cfa_offset 8
-; SSE-X86-NEXT:    .cfi_offset %ebp, -8
-; SSE-X86-NEXT:    movl %esp, %ebp
-; SSE-X86-NEXT:    .cfi_def_cfa_register %ebp
-; SSE-X86-NEXT:    andl $-8, %esp
-; SSE-X86-NEXT:    subl $16, %esp
-; SSE-X86-NEXT:    movl 8(%ebp), %eax
-; SSE-X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    flds {{[0-9]+}}(%esp)
+; SSE-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; SSE-X86-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-X86-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-X86-NEXT:    cvtsd2ss %xmm0, %xmm0
+; SSE-X86-NEXT:    movss %xmm0, (%esp)
+; SSE-X86-NEXT:    flds (%esp)
 ; SSE-X86-NEXT:    wait
-; SSE-X86-NEXT:    movl %ebp, %esp
-; SSE-X86-NEXT:    popl %ebp
-; SSE-X86-NEXT:    .cfi_def_cfa %esp, 4
+; SSE-X86-NEXT:    popl %eax
+; SSE-X86-NEXT:    .cfi_def_cfa_offset 4
 ; SSE-X86-NEXT:    retl
 ;
 ; SSE-X64-LABEL: uitofp_i32tof32:
@@ -512,23 +506,17 @@ define float @uitofp_i32tof32(i32 %x) #0 {
 ;
 ; AVX1-X86-LABEL: uitofp_i32tof32:
 ; AVX1-X86:       # %bb.0:
-; AVX1-X86-NEXT:    pushl %ebp
+; AVX1-X86-NEXT:    pushl %eax
 ; AVX1-X86-NEXT:    .cfi_def_cfa_offset 8
-; AVX1-X86-NEXT:    .cfi_offset %ebp, -8
-; AVX1-X86-NEXT:    movl %esp, %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa_register %ebp
-; AVX1-X86-NEXT:    andl $-8, %esp
-; AVX1-X86-NEXT:    subl $16, %esp
-; AVX1-X86-NEXT:    movl 8(%ebp), %eax
-; AVX1-X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX1-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX1-X86-NEXT:    vorpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vsubsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vcvtsd2ss %xmm0, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vmovss %xmm0, (%esp)
+; AVX1-X86-NEXT:    flds (%esp)
 ; AVX1-X86-NEXT:    wait
-; AVX1-X86-NEXT:    movl %ebp, %esp
-; AVX1-X86-NEXT:    popl %ebp
-; AVX1-X86-NEXT:    .cfi_def_cfa %esp, 4
+; AVX1-X86-NEXT:    popl %eax
+; AVX1-X86-NEXT:    .cfi_def_cfa_offset 4
 ; AVX1-X86-NEXT:    retl
 ;
 ; AVX1-X64-LABEL: uitofp_i32tof32:
@@ -607,18 +595,18 @@ define float @uitofp_i64tof32(i64 %x) #0 {
 ;
 ; SSE-X64-LABEL: uitofp_i64tof32:
 ; SSE-X64:       # %bb.0:
+; SSE-X64-NEXT:    testq %rdi, %rdi
+; SSE-X64-NEXT:    js .LBB9_1
+; SSE-X64-NEXT:  # %bb.2:
+; SSE-X64-NEXT:    cvtsi2ss %rdi, %xmm0
+; SSE-X64-NEXT:    retq
+; SSE-X64-NEXT:  .LBB9_1:
 ; SSE-X64-NEXT:    movq %rdi, %rax
 ; SSE-X64-NEXT:    shrq %rax
-; SSE-X64-NEXT:    movl %edi, %ecx
-; SSE-X64-NEXT:    andl $1, %ecx
-; SSE-X64-NEXT:    orq %rax, %rcx
-; SSE-X64-NEXT:    testq %rdi, %rdi
-; SSE-X64-NEXT:    cmovnsq %rdi, %rcx
-; SSE-X64-NEXT:    cvtsi2ss %rcx, %xmm0
-; SSE-X64-NEXT:    jns .LBB9_2
-; SSE-X64-NEXT:  # %bb.1:
+; SSE-X64-NEXT:    andl $1, %edi
+; SSE-X64-NEXT:    orq %rax, %rdi
+; SSE-X64-NEXT:    cvtsi2ss %rdi, %xmm0
 ; SSE-X64-NEXT:    addss %xmm0, %xmm0
-; SSE-X64-NEXT:  .LBB9_2:
 ; SSE-X64-NEXT:    retq
 ;
 ; AVX-X86-LABEL: uitofp_i64tof32:
@@ -649,18 +637,18 @@ define float @uitofp_i64tof32(i64 %x) #0 {
 ;
 ; AVX1-X64-LABEL: uitofp_i64tof32:
 ; AVX1-X64:       # %bb.0:
+; AVX1-X64-NEXT:    testq %rdi, %rdi
+; AVX1-X64-NEXT:    js .LBB9_1
+; AVX1-X64-NEXT:  # %bb.2:
+; AVX1-X64-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
+; AVX1-X64-NEXT:    retq
+; AVX1-X64-NEXT:  .LBB9_1:
 ; AVX1-X64-NEXT:    movq %rdi, %rax
 ; AVX1-X64-NEXT:    shrq %rax
-; AVX1-X64-NEXT:    movl %edi, %ecx
-; AVX1-X64-NEXT:    andl $1, %ecx
-; AVX1-X64-NEXT:    orq %rax, %rcx
-; AVX1-X64-NEXT:    testq %rdi, %rdi
-; AVX1-X64-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-X64-NEXT:    vcvtsi2ss %rcx, %xmm15, %xmm0
-; AVX1-X64-NEXT:    jns .LBB9_2
-; AVX1-X64-NEXT:  # %bb.1:
+; AVX1-X64-NEXT:    andl $1, %edi
+; AVX1-X64-NEXT:    orq %rax, %rdi
+; AVX1-X64-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
 ; AVX1-X64-NEXT:    vaddss %xmm0, %xmm0, %xmm0
-; AVX1-X64-NEXT:  .LBB9_2:
 ; AVX1-X64-NEXT:    retq
 ;
 ; AVX512-X64-LABEL: uitofp_i64tof32:
@@ -1174,13 +1162,12 @@ define double @uitofp_i32tof64(i32 %x) #0 {
 ; SSE-X86-NEXT:    movl %esp, %ebp
 ; SSE-X86-NEXT:    .cfi_def_cfa_register %ebp
 ; SSE-X86-NEXT:    andl $-8, %esp
-; SSE-X86-NEXT:    subl $16, %esp
-; SSE-X86-NEXT:    movl 8(%ebp), %eax
-; SSE-X86-NEXT:    movl %eax, (%esp)
-; SSE-X86-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    fildll (%esp)
-; SSE-X86-NEXT:    fstpl {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    fldl {{[0-9]+}}(%esp)
+; SSE-X86-NEXT:    subl $8, %esp
+; SSE-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; SSE-X86-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-X86-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-X86-NEXT:    movsd %xmm0, (%esp)
+; SSE-X86-NEXT:    fldl (%esp)
 ; SSE-X86-NEXT:    wait
 ; SSE-X86-NEXT:    movl %ebp, %esp
 ; SSE-X86-NEXT:    popl %ebp
@@ -1201,13 +1188,12 @@ define double @uitofp_i32tof64(i32 %x) #0 {
 ; AVX1-X86-NEXT:    movl %esp, %ebp
 ; AVX1-X86-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX1-X86-NEXT:    andl $-8, %esp
-; AVX1-X86-NEXT:    subl $16, %esp
-; AVX1-X86-NEXT:    movl 8(%ebp), %eax
-; AVX1-X86-NEXT:    movl %eax, (%esp)
-; AVX1-X86-NEXT:    movl $0, {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    fildll (%esp)
-; AVX1-X86-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX1-X86-NEXT:    fldl {{[0-9]+}}(%esp)
+; AVX1-X86-NEXT:    subl $8, %esp
+; AVX1-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX1-X86-NEXT:    vorpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vsubsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-X86-NEXT:    vmovsd %xmm0, (%esp)
+; AVX1-X86-NEXT:    fldl (%esp)
 ; AVX1-X86-NEXT:    wait
 ; AVX1-X86-NEXT:    movl %ebp, %esp
 ; AVX1-X86-NEXT:    popl %ebp
@@ -1276,17 +1262,14 @@ define double @uitofp_i64tof64(i64 %x) #0 {
 ; SSE-X86-NEXT:    movl %esp, %ebp
 ; SSE-X86-NEXT:    .cfi_def_cfa_register %ebp
 ; SSE-X86-NEXT:    andl $-8, %esp
-; SSE-X86-NEXT:    subl $24, %esp
-; SSE-X86-NEXT:    movl 12(%ebp), %eax
-; SSE-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE-X86-NEXT:    movlps %xmm0, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    shrl $31, %eax
-; SSE-X86-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; SSE-X86-NEXT:    fstpl {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    wait
+; SSE-X86-NEXT:    subl $8, %esp
 ; SSE-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE-X86-NEXT:    movsd %xmm0, (%esp)
+; SSE-X86-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; SSE-X86-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-X86-NEXT:    movapd %xmm0, %xmm1
+; SSE-X86-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
+; SSE-X86-NEXT:    addsd %xmm0, %xmm1
+; SSE-X86-NEXT:    movsd %xmm1, (%esp)
 ; SSE-X86-NEXT:    fldl (%esp)
 ; SSE-X86-NEXT:    wait
 ; SSE-X86-NEXT:    movl %ebp, %esp
@@ -1296,18 +1279,12 @@ define double @uitofp_i64tof64(i64 %x) #0 {
 ;
 ; SSE-X64-LABEL: uitofp_i64tof64:
 ; SSE-X64:       # %bb.0:
-; SSE-X64-NEXT:    movq %rdi, %rax
-; SSE-X64-NEXT:    shrq %rax
-; SSE-X64-NEXT:    movl %edi, %ecx
-; SSE-X64-NEXT:    andl $1, %ecx
-; SSE-X64-NEXT:    orq %rax, %rcx
-; SSE-X64-NEXT:    testq %rdi, %rdi
-; SSE-X64-NEXT:    cmovnsq %rdi, %rcx
-; SSE-X64-NEXT:    cvtsi2sd %rcx, %xmm0
-; SSE-X64-NEXT:    jns .LBB18_2
-; SSE-X64-NEXT:  # %bb.1:
-; SSE-X64-NEXT:    addsd %xmm0, %xmm0
-; SSE-X64-NEXT:  .LBB18_2:
+; SSE-X64-NEXT:    movq %rdi, %xmm1
+; SSE-X64-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
+; SSE-X64-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE-X64-NEXT:    movapd %xmm1, %xmm0
+; SSE-X64-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
+; SSE-X64-NEXT:    addsd %xmm1, %xmm0
 ; SSE-X64-NEXT:    retq
 ;
 ; AVX-X86-LABEL: uitofp_i64tof64:
@@ -1318,16 +1295,12 @@ define double @uitofp_i64tof64(i64 %x) #0 {
 ; AVX-X86-NEXT:    movl %esp, %ebp
 ; AVX-X86-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-X86-NEXT:    andl $-8, %esp
-; AVX-X86-NEXT:    subl $24, %esp
-; AVX-X86-NEXT:    movl 12(%ebp), %eax
-; AVX-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-X86-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
-; AVX-X86-NEXT:    shrl $31, %eax
-; AVX-X86-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-X86-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-X86-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX-X86-NEXT:    wait
+; AVX-X86-NEXT:    subl $8, %esp
 ; AVX-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX-X86-NEXT:    vunpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; AVX-X86-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX-X86-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
+; AVX-X86-NEXT:    vaddsd %xmm0, %xmm1, %xmm0
 ; AVX-X86-NEXT:    vmovsd %xmm0, (%esp)
 ; AVX-X86-NEXT:    fldl (%esp)
 ; AVX-X86-NEXT:    wait
@@ -1338,18 +1311,11 @@ define double @uitofp_i64tof64(i64 %x) #0 {
 ;
 ; AVX1-X64-LABEL: uitofp_i64tof64:
 ; AVX1-X64:       # %bb.0:
-; AVX1-X64-NEXT:    movq %rdi, %rax
-; AVX1-X64-NEXT:    shrq %rax
-; AVX1-X64-NEXT:    movl %edi, %ecx
-; AVX1-X64-NEXT:    andl $1, %ecx
-; AVX1-X64-NEXT:    orq %rax, %rcx
-; AVX1-X64-NEXT:    testq %rdi, %rdi
-; AVX1-X64-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-X64-NEXT:    vcvtsi2sd %rcx, %xmm15, %xmm0
-; AVX1-X64-NEXT:    jns .LBB18_2
-; AVX1-X64-NEXT:  # %bb.1:
-; AVX1-X64-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-X64-NEXT:  .LBB18_2:
+; AVX1-X64-NEXT:    vmovq %rdi, %xmm0
+; AVX1-X64-NEXT:    vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; AVX1-X64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-X64-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
+; AVX1-X64-NEXT:    vaddsd %xmm0, %xmm1, %xmm0
 ; AVX1-X64-NEXT:    retq
 ;
 ; AVX512-X64-LABEL: uitofp_i64tof64:
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-round-fp16.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-round-fp16.ll
index 85a43394a1dc8..f07560b15d6cd 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-round-fp16.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-round-fp16.ll
@@ -25,13 +25,8 @@ define half @fceil32(half %f) #0 {
 ;
 ; AVX-LABEL: fceil32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -61,13 +56,8 @@ define half @ffloor32(half %f) #0 {
 ;
 ; AVX-LABEL: ffloor32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -97,13 +87,8 @@ define half @ftrunc32(half %f) #0 {
 ;
 ; AVX-LABEL: ftrunc32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $11, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -133,13 +118,8 @@ define half @frint32(half %f) #0 {
 ;
 ; AVX-LABEL: frint32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $4, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -170,13 +150,8 @@ define half @fnearbyint32(half %f) #0 {
 ;
 ; AVX-LABEL: fnearbyint32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $12, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -207,13 +182,8 @@ define half @froundeven16(half %f) #0 {
 ;
 ; AVX-LABEL: froundeven16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
 ; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX-NEXT:    vroundss $8, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 ;
@@ -242,41 +212,25 @@ define half @fround16(half %f) #0 {
 ; SSE2-NEXT:    popq %rax
 ; SSE2-NEXT:    retq
 ;
-; AVX-LABEL: fround16:
-; AVX:       # %bb.0:
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    vpextrw $0, %xmm0, %eax
-; AVX-NEXT:    movzwl %ax, %eax
-; AVX-NEXT:    vmovd %eax, %xmm0
-; AVX-NEXT:    vcvtph2ps %xmm0, %xmm0
-; AVX-NEXT:    callq roundf@PLT
-; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
-; AVX-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    retq
-;
 ; X86-LABEL: fround16:
 ; X86:       # %bb.0:
-; X86-NEXT:    subl $8, %esp
 ; X86-NEXT:    vmovsh {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero
-; X86-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
-; X86-NEXT:    vmovss %xmm0, (%esp)
-; X86-NEXT:    calll roundf
-; X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-NEXT:    wait
-; X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
-; X86-NEXT:    addl $8, %esp
+; X86-NEXT:    vpbroadcastw {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; X86-NEXT:    vandps %xmm1, %xmm0, %xmm1
+; X86-NEXT:    vpbroadcastw {{.*#+}} xmm2 = [4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1]
+; X86-NEXT:    vorps %xmm2, %xmm1, %xmm1
+; X86-NEXT:    vaddsh %xmm1, %xmm0, %xmm0
+; X86-NEXT:    vrndscalesh $11, %xmm0, %xmm0, %xmm0
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fround16:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    vcvtsh2ss %xmm0, %xmm0, %xmm0
-; X64-NEXT:    callq roundf@PLT
-; X64-NEXT:    vcvtss2sh %xmm0, %xmm0, %xmm0
-; X64-NEXT:    popq %rax
+; X64-NEXT:    vpbroadcastw {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; X64-NEXT:    vpand %xmm1, %xmm0, %xmm1
+; X64-NEXT:    vpbroadcastw {{.*#+}} xmm2 = [4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1,4.9976E-1]
+; X64-NEXT:    vpor %xmm2, %xmm1, %xmm1
+; X64-NEXT:    vaddsh %xmm1, %xmm0, %xmm0
+; X64-NEXT:    vrndscalesh $11, %xmm0, %xmm0, %xmm0
 ; X64-NEXT:    retq
 
   %res = call half @llvm.experimental.constrained.round.f16(
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-round.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-round.ll
index 13f890ae6e191..eca6525c95bac 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-round.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-round.ll
@@ -501,40 +501,28 @@ define float @fround32(float %f) #0 {
 ; SSE41-X86-NEXT:    pushl %eax
 ; SSE41-X86-NEXT:    .cfi_def_cfa_offset 8
 ; SSE41-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; SSE41-X86-NEXT:    movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; SSE41-X86-NEXT:    andps %xmm0, %xmm1
+; SSE41-X86-NEXT:    orps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
+; SSE41-X86-NEXT:    addss %xmm0, %xmm1
+; SSE41-X86-NEXT:    xorps %xmm0, %xmm0
+; SSE41-X86-NEXT:    roundss $11, %xmm1, %xmm0
 ; SSE41-X86-NEXT:    movss %xmm0, (%esp)
-; SSE41-X86-NEXT:    calll roundf
+; SSE41-X86-NEXT:    flds (%esp)
+; SSE41-X86-NEXT:    wait
 ; SSE41-X86-NEXT:    popl %eax
 ; SSE41-X86-NEXT:    .cfi_def_cfa_offset 4
 ; SSE41-X86-NEXT:    retl
 ;
 ; SSE41-X64-LABEL: fround32:
 ; SSE41-X64:       # %bb.0:
-; SSE41-X64-NEXT:    pushq %rax
-; SSE41-X64-NEXT:    .cfi_def_cfa_offset 16
-; SSE41-X64-NEXT:    callq roundf@PLT
-; SSE41-X64-NEXT:    popq %rax
-; SSE41-X64-NEXT:    .cfi_def_cfa_offset 8
+; SSE41-X64-NEXT:    movaps {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; SSE41-X64-NEXT:    andps %xmm0, %xmm1
+; SSE41-X64-NEXT:    orps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE41-X64-NEXT:    addss %xmm0, %xmm1
+; SSE41-X64-NEXT:    xorps %xmm0, %xmm0
+; SSE41-X64-NEXT:    roundss $11, %xmm1, %xmm0
 ; SSE41-X64-NEXT:    retq
-;
-; AVX-X86-LABEL: fround32:
-; AVX-X86:       # %bb.0:
-; AVX-X86-NEXT:    pushl %eax
-; AVX-X86-NEXT:    .cfi_def_cfa_offset 8
-; AVX-X86-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-X86-NEXT:    vmovss %xmm0, (%esp)
-; AVX-X86-NEXT:    calll roundf
-; AVX-X86-NEXT:    popl %eax
-; AVX-X86-NEXT:    .cfi_def_cfa_offset 4
-; AVX-X86-NEXT:    retl
-;
-; AVX-X64-LABEL: fround32:
-; AVX-X64:       # %bb.0:
-; AVX-X64-NEXT:    pushq %rax
-; AVX-X64-NEXT:    .cfi_def_cfa_offset 16
-; AVX-X64-NEXT:    callq roundf@PLT
-; AVX-X64-NEXT:    popq %rax
-; AVX-X64-NEXT:    .cfi_def_cfa_offset 8
-; AVX-X64-NEXT:    retq
   %res = call float @llvm.experimental.constrained.round.f32(
                         float %f, metadata !"fpexcept.strict") #0
   ret float %res
@@ -543,42 +531,70 @@ define float @fround32(float %f) #0 {
 define double @froundf64(double %f) #0 {
 ; SSE41-X86-LABEL: froundf64:
 ; SSE41-X86:       # %bb.0:
+; SSE41-X86-NEXT:    pushl %ebp
+; SSE41-X86-NEXT:    .cfi_def_cfa_offset 8
+; SSE41-X86-NEXT:    .cfi_offset %ebp, -8
+; SSE41-X86-NEXT:    movl %esp, %ebp
+; SSE41-X86-NEXT:    .cfi_def_cfa_register %ebp
+; SSE41-X86-NEXT:    andl $-8, %esp
 ; SSE41-X86-NEXT:    subl $8, %esp
-; SSE41-X86-NEXT:    .cfi_def_cfa_offset 12
 ; SSE41-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; SSE41-X86-NEXT:    movapd {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0]
+; SSE41-X86-NEXT:    andpd %xmm0, %xmm1
+; SSE41-X86-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
+; SSE41-X86-NEXT:    addsd %xmm0, %xmm1
+; SSE41-X86-NEXT:    xorps %xmm0, %xmm0
+; SSE41-X86-NEXT:    roundsd $11, %xmm1, %xmm0
 ; SSE41-X86-NEXT:    movsd %xmm0, (%esp)
-; SSE41-X86-NEXT:    calll round
-; SSE41-X86-NEXT:    addl $8, %esp
-; SSE41-X86-NEXT:    .cfi_def_cfa_offset 4
+; SSE41-X86-NEXT:    fldl (%esp)
+; SSE41-X86-NEXT:    wait
+; SSE41-X86-NEXT:    movl %ebp, %esp
+; SSE41-X86-NEXT:    popl %ebp
+; SSE41-X86-NEXT:    .cfi_def_cfa %esp, 4
 ; SSE41-X86-NEXT:    retl
 ;
 ; SSE41-X64-LABEL: froundf64:
 ; SSE41-X64:       # %bb.0:
-; SSE41-X64-NEXT:    pushq %rax
-; SSE41-X64-NEXT:    .cfi_def_cfa_offset 16
-; SSE41-X64-NEXT:    callq round@PLT
-; SSE41-X64-NEXT:    popq %rax
-; SSE41-X64-NEXT:    .cfi_def_cfa_offset 8
+; SSE41-X64-NEXT:    movapd {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0]
+; SSE41-X64-NEXT:    andpd %xmm0, %xmm1
+; SSE41-X64-NEXT:    orpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE41-X64-NEXT:    addsd %xmm0, %xmm1
+; SSE41-X64-NEXT:    xorps %xmm0, %xmm0
+; SSE41-X64-NEXT:    roundsd $11, %xmm1, %xmm0
 ; SSE41-X64-NEXT:    retq
 ;
 ; AVX-X86-LABEL: froundf64:
 ; AVX-X86:       # %bb.0:
+; AVX-X86-NEXT:    pushl %ebp
+; AVX-X86-NEXT:    .cfi_def_cfa_offset 8
+; AVX-X86-NEXT:    .cfi_offset %ebp, -8
+; AVX-X86-NEXT:    movl %esp, %ebp
+; AVX-X86-NEXT:    .cfi_def_cfa_register %ebp
+; AVX-X86-NEXT:    andl $-8, %esp
 ; AVX-X86-NEXT:    subl $8, %esp
-; AVX-X86-NEXT:    .cfi_def_cfa_offset 12
 ; AVX-X86-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX-X86-NEXT:    vandpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm1
+; AVX-X86-NEXT:    vmovddup {{.*#+}} xmm2 = [4.9999999999999994E-1,4.9999999999999994E-1]
+; AVX-X86-NEXT:    # xmm2 = mem[0,0]
+; AVX-X86-NEXT:    vorpd %xmm2, %xmm1, %xmm1
+; AVX-X86-NEXT:    vaddsd %xmm1, %xmm0, %xmm0
+; AVX-X86-NEXT:    vroundsd $11, %xmm0, %xmm0, %xmm0
 ; AVX-X86-NEXT:    vmovsd %xmm0, (%esp)
-; AVX-X86-NEXT:    calll round
-; AVX-X86-NEXT:    addl $8, %esp
-; AVX-X86-NEXT:    .cfi_def_cfa_offset 4
+; AVX-X86-NEXT:    fldl (%esp)
+; AVX-X86-NEXT:    wait
+; AVX-X86-NEXT:    movl %ebp, %esp
+; AVX-X86-NEXT:    popl %ebp
+; AVX-X86-NEXT:    .cfi_def_cfa %esp, 4
 ; AVX-X86-NEXT:    retl
 ;
 ; AVX-X64-LABEL: froundf64:
 ; AVX-X64:       # %bb.0:
-; AVX-X64-NEXT:    pushq %rax
-; AVX-X64-NEXT:    .cfi_def_cfa_offset 16
-; AVX-X64-NEXT:    callq round@PLT
-; AVX-X64-NEXT:    popq %rax
-; AVX-X64-NEXT:    .cfi_def_cfa_offset 8
+; AVX-X64-NEXT:    vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; AVX-X64-NEXT:    vmovddup {{.*#+}} xmm2 = [4.9999999999999994E-1,4.9999999999999994E-1]
+; AVX-X64-NEXT:    # xmm2 = mem[0,0]
+; AVX-X64-NEXT:    vorpd %xmm2, %xmm1, %xmm1
+; AVX-X64-NEXT:    vaddsd %xmm1, %xmm0, %xmm0
+; AVX-X64-NEXT:    vroundsd $11, %xmm0, %xmm0, %xmm0
 ; AVX-X64-NEXT:    retq
   %res = call double @llvm.experimental.constrained.round.f64(
                         double %f, metadata !"fpexcept.strict") #0
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar.ll b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
index f1be74f5c3ac4..321f76b26c42b 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
@@ -612,23 +612,11 @@ define void @fsqrt_f32(ptr %a) nounwind strictfp {
 define double @fma_f64(double %a, double %b, double %c) nounwind strictfp {
 ; SSE-X86-LABEL: fma_f64:
 ; SSE-X86:       # %bb.0:
-; SSE-X86-NEXT:    subl $24, %esp
-; SSE-X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE-X86-NEXT:    movsd {{.*#+}} xmm1 = mem[0],zero
-; SSE-X86-NEXT:    movsd {{.*#+}} xmm2 = mem[0],zero
-; SSE-X86-NEXT:    movsd %xmm2, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    movsd %xmm1, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    movsd %xmm0, (%esp)
-; SSE-X86-NEXT:    calll fma
-; SSE-X86-NEXT:    addl $24, %esp
-; SSE-X86-NEXT:    retl
+; SSE-X86-NEXT:    jmp fma # TAILCALL
 ;
 ; SSE-X64-LABEL: fma_f64:
 ; SSE-X64:       # %bb.0:
-; SSE-X64-NEXT:    pushq %rax
-; SSE-X64-NEXT:    callq fma@PLT
-; SSE-X64-NEXT:    popq %rax
-; SSE-X64-NEXT:    retq
+; SSE-X64-NEXT:    jmp fma@PLT # TAILCALL
 ;
 ; AVX-X86-LABEL: fma_f64:
 ; AVX-X86:       # %bb.0:
@@ -673,23 +661,11 @@ define double @fma_f64(double %a, double %b, double %c) nounwind strictfp {
 define float @fma_f32(float %a, float %b, float %c) nounwind strictfp {
 ; SSE-X86-LABEL: fma_f32:
 ; SSE-X86:       # %bb.0:
-; SSE-X86-NEXT:    subl $12, %esp
-; SSE-X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; SSE-X86-NEXT:    movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; SSE-X86-NEXT:    movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; SSE-X86-NEXT:    movss %xmm2, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    movss %xmm1, {{[0-9]+}}(%esp)
-; SSE-X86-NEXT:    movss %xmm0, (%esp)
-; SSE-X86-NEXT:    calll fmaf
-; SSE-X86-NEXT:    addl $12, %esp
-; SSE-X86-NEXT:    retl
+; SSE-X86-NEXT:    jmp fmaf # TAILCALL
 ;
 ; SSE-X64-LABEL: fma_f32:
 ; SSE-X64:       # %bb.0:
-; SSE-X64-NEXT:    pushq %rax
-; SSE-X64-NEXT:    callq fmaf at PLT
-; SSE-X64-NEXT:    popq %rax
-; SSE-X64-NEXT:    retq
+; SSE-X64-NEXT:    jmp fmaf at PLT # TAILCALL
 ;
 ; AVX-X86-LABEL: fma_f32:
 ; AVX-X86:       # %bb.0:
diff --git a/llvm/test/CodeGen/X86/fp128-cast-strict.ll b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
index bb5640aeb66fa..98de97b984f5e 100644
--- a/llvm/test/CodeGen/X86/fp128-cast-strict.ll
+++ b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
@@ -41,10 +41,10 @@ define dso_local void @TestFPExtF16_F128() nounwind strictfp {
 ; X86-NEXT:    movzwl vf16, %eax
 ; X86-NEXT:    movl %eax, (%esp)
 ; X86-NEXT:    calll __extendhfsf2
-; X86-NEXT:    fstps {{[0-9]+}}(%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    leal {{[0-9]+}}(%esp), %eax
 ; X86-NEXT:    movl %eax, (%esp)
+; X86-NEXT:    fstps {{[0-9]+}}(%esp)
+; X86-NEXT:    wait
 ; X86-NEXT:    calll __extendsftf2
 ; X86-NEXT:    subl $4, %esp
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
@@ -499,10 +499,10 @@ define i128 @fptosi_i128(fp128 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $52, %esp
 ; X86-NEXT:    popl %esi
@@ -641,10 +641,10 @@ define i128 @fptoui_i128(fp128 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $52, %esp
 ; X86-NEXT:    popl %esi
@@ -681,10 +681,10 @@ define fp128 @sitofp_i8(i8 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -721,10 +721,10 @@ define fp128 @sitofp_i16(i16 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -759,10 +759,10 @@ define fp128 @sitofp_i32(i32 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -798,10 +798,10 @@ define fp128 @sitofp_i64(i64 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -842,10 +842,10 @@ define fp128 @sitofp_i128(i128 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $52, %esp
 ; X86-NEXT:    popl %esi
@@ -882,10 +882,10 @@ define fp128 @uitofp_i8(i8 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -922,10 +922,10 @@ define fp128 @uitofp_i16(i16 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -960,10 +960,10 @@ define fp128 @uitofp_i32(i32 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -999,10 +999,10 @@ define fp128 @uitofp_i64(i64 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $20, %esp
 ; X86-NEXT:    popl %esi
@@ -1043,10 +1043,10 @@ define fp128 @uitofp_i128(i128 %x) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, 8(%esi)
-; X86-NEXT:    movl %edx, 12(%esi)
-; X86-NEXT:    movl %eax, (%esi)
+; X86-NEXT:    movl %edi, 12(%esi)
+; X86-NEXT:    movl %edx, 8(%esi)
 ; X86-NEXT:    movl %ecx, 4(%esi)
+; X86-NEXT:    movl %eax, (%esi)
 ; X86-NEXT:    movl %esi, %eax
 ; X86-NEXT:    addl $52, %esp
 ; X86-NEXT:    popl %esi
diff --git a/llvm/test/CodeGen/X86/fp128-libcalls-strict.ll b/llvm/test/CodeGen/X86/fp128-libcalls-strict.ll
index ad2d690fd7ed0..4dfc7d025a3ac 100644
--- a/llvm/test/CodeGen/X86/fp128-libcalls-strict.ll
+++ b/llvm/test/CodeGen/X86/fp128-libcalls-strict.ll
@@ -27,17 +27,11 @@ define fp128 @add(fp128 %x, fp128 %y) nounwind strictfp {
 ;
 ; ANDROID-LABEL: add:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq __addtf3@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp __addtf3@PLT # TAILCALL
 ;
 ; GNU-LABEL: add:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq __addtf3@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp __addtf3@PLT # TAILCALL
 ;
 ; X86-LABEL: add:
 ; X86:       # %bb.0: # %entry
@@ -123,10 +117,10 @@ define fp128 @add(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -149,17 +143,11 @@ define fp128 @sub(fp128 %x, fp128 %y) nounwind strictfp {
 ;
 ; ANDROID-LABEL: sub:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq __subtf3@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp __subtf3@PLT # TAILCALL
 ;
 ; GNU-LABEL: sub:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq __subtf3@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp __subtf3@PLT # TAILCALL
 ;
 ; X86-LABEL: sub:
 ; X86:       # %bb.0: # %entry
@@ -245,10 +233,10 @@ define fp128 @sub(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -271,17 +259,11 @@ define fp128 @mul(fp128 %x, fp128 %y) nounwind strictfp {
 ;
 ; ANDROID-LABEL: mul:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq __multf3@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp __multf3@PLT # TAILCALL
 ;
 ; GNU-LABEL: mul:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq __multf3@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp __multf3@PLT # TAILCALL
 ;
 ; X86-LABEL: mul:
 ; X86:       # %bb.0: # %entry
@@ -367,10 +349,10 @@ define fp128 @mul(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -393,17 +375,11 @@ define fp128 @div(fp128 %x, fp128 %y) nounwind strictfp {
 ;
 ; ANDROID-LABEL: div:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq __divtf3@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp __divtf3@PLT # TAILCALL
 ;
 ; GNU-LABEL: div:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq __divtf3@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp __divtf3@PLT # TAILCALL
 ;
 ; X86-LABEL: div:
 ; X86:       # %bb.0: # %entry
@@ -489,10 +465,10 @@ define fp128 @div(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -508,17 +484,11 @@ entry:
 define fp128 @fma(fp128 %x, fp128 %y, fp128 %z) nounwind strictfp {
 ; ANDROID-LABEL: fma:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq fmal@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp fmal@PLT # TAILCALL
 ;
 ; GNU-LABEL: fma:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq fmaf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp fmaf128@PLT # TAILCALL
 ;
 ; X86-LABEL: fma:
 ; X86:       # %bb.0: # %entry
@@ -623,10 +593,10 @@ define fp128 @fma(fp128 %x, fp128 %y, fp128 %z) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -642,17 +612,11 @@ entry:
 define fp128 @frem(fp128 %x, fp128 %y) nounwind strictfp {
 ; ANDROID-LABEL: frem:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq fmodl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp fmodl@PLT # TAILCALL
 ;
 ; GNU-LABEL: frem:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq fmodf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp fmodf128@PLT # TAILCALL
 ;
 ; X86-LABEL: frem:
 ; X86:       # %bb.0: # %entry
@@ -738,10 +702,10 @@ define fp128 @frem(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -757,17 +721,11 @@ entry:
 define fp128 @ceil(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: ceil:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq ceill@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp ceill@PLT # TAILCALL
 ;
 ; GNU-LABEL: ceil:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq ceilf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp ceilf128@PLT # TAILCALL
 ;
 ; X86-LABEL: ceil:
 ; X86:       # %bb.0: # %entry
@@ -829,10 +787,10 @@ define fp128 @ceil(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -847,17 +805,11 @@ entry:
 define fp128 @acos(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: acos:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq acosl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp acosl@PLT # TAILCALL
 ;
 ; GNU-LABEL: acos:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq acosf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp acosf128@PLT # TAILCALL
 ;
 ; X86-LABEL: acos:
 ; X86:       # %bb.0: # %entry
@@ -919,10 +871,10 @@ define fp128 @acos(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -937,17 +889,11 @@ entry:
 define fp128 @cos(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: cos:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq cosl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp cosl@PLT # TAILCALL
 ;
 ; GNU-LABEL: cos:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq cosf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp cosf128@PLT # TAILCALL
 ;
 ; X86-LABEL: cos:
 ; X86:       # %bb.0: # %entry
@@ -1009,10 +955,10 @@ define fp128 @cos(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1027,17 +973,11 @@ entry:
 define fp128 @cosh(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: cosh:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq coshl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp coshl@PLT # TAILCALL
 ;
 ; GNU-LABEL: cosh:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq coshf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp coshf128@PLT # TAILCALL
 ;
 ; X86-LABEL: cosh:
 ; X86:       # %bb.0: # %entry
@@ -1099,10 +1039,10 @@ define fp128 @cosh(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1117,17 +1057,11 @@ entry:
 define fp128 @exp(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: exp:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq expl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp expl@PLT # TAILCALL
 ;
 ; GNU-LABEL: exp:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq expf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp expf128@PLT # TAILCALL
 ;
 ; X86-LABEL: exp:
 ; X86:       # %bb.0: # %entry
@@ -1189,10 +1123,10 @@ define fp128 @exp(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1207,17 +1141,11 @@ entry:
 define fp128 @exp2(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: exp2:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq exp2l@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp exp2l@PLT # TAILCALL
 ;
 ; GNU-LABEL: exp2:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq exp2f128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp exp2f128@PLT # TAILCALL
 ;
 ; X86-LABEL: exp2:
 ; X86:       # %bb.0: # %entry
@@ -1279,10 +1207,10 @@ define fp128 @exp2(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1297,17 +1225,11 @@ entry:
 define fp128 @floor(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: floor:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq floorl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp floorl@PLT # TAILCALL
 ;
 ; GNU-LABEL: floor:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq floorf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp floorf128@PLT # TAILCALL
 ;
 ; X86-LABEL: floor:
 ; X86:       # %bb.0: # %entry
@@ -1369,10 +1291,10 @@ define fp128 @floor(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1387,17 +1309,11 @@ entry:
 define fp128 @log(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: log:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq logl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp logl@PLT # TAILCALL
 ;
 ; GNU-LABEL: log:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq logf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp logf128@PLT # TAILCALL
 ;
 ; X86-LABEL: log:
 ; X86:       # %bb.0: # %entry
@@ -1459,10 +1375,10 @@ define fp128 @log(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1477,17 +1393,11 @@ entry:
 define fp128 @log10(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: log10:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq log10l@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp log10l@PLT # TAILCALL
 ;
 ; GNU-LABEL: log10:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq log10f128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp log10f128@PLT # TAILCALL
 ;
 ; X86-LABEL: log10:
 ; X86:       # %bb.0: # %entry
@@ -1549,10 +1459,10 @@ define fp128 @log10(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1567,17 +1477,11 @@ entry:
 define fp128 @log2(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: log2:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq log2l@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp log2l@PLT # TAILCALL
 ;
 ; GNU-LABEL: log2:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq log2f128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp log2f128@PLT # TAILCALL
 ;
 ; X86-LABEL: log2:
 ; X86:       # %bb.0: # %entry
@@ -1639,10 +1543,10 @@ define fp128 @log2(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1657,17 +1561,11 @@ entry:
 define fp128 @maxnum(fp128 %x, fp128 %y) nounwind strictfp {
 ; ANDROID-LABEL: maxnum:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq fmaxl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp fmaxl@PLT # TAILCALL
 ;
 ; GNU-LABEL: maxnum:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq fmaxf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp fmaxf128@PLT # TAILCALL
 ;
 ; X86-LABEL: maxnum:
 ; X86:       # %bb.0: # %entry
@@ -1753,10 +1651,10 @@ define fp128 @maxnum(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1772,17 +1670,11 @@ entry:
 define fp128 @minnum(fp128 %x, fp128 %y) nounwind strictfp {
 ; ANDROID-LABEL: minnum:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq fminl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp fminl@PLT # TAILCALL
 ;
 ; GNU-LABEL: minnum:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq fminf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp fminf128@PLT # TAILCALL
 ;
 ; X86-LABEL: minnum:
 ; X86:       # %bb.0: # %entry
@@ -1868,10 +1760,10 @@ define fp128 @minnum(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1887,17 +1779,11 @@ entry:
 define fp128 @nearbyint(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: nearbyint:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq nearbyintl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp nearbyintl@PLT # TAILCALL
 ;
 ; GNU-LABEL: nearbyint:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq nearbyintf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp nearbyintf128@PLT # TAILCALL
 ;
 ; X86-LABEL: nearbyint:
 ; X86:       # %bb.0: # %entry
@@ -1959,10 +1845,10 @@ define fp128 @nearbyint(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -1977,17 +1863,11 @@ entry:
 define fp128 @pow(fp128 %x, fp128 %y) nounwind strictfp {
 ; ANDROID-LABEL: pow:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq powl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp powl@PLT # TAILCALL
 ;
 ; GNU-LABEL: pow:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq powf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp powf128@PLT # TAILCALL
 ;
 ; X86-LABEL: pow:
 ; X86:       # %bb.0: # %entry
@@ -2073,10 +1953,10 @@ define fp128 @pow(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2099,17 +1979,11 @@ define fp128 @powi(fp128 %x, i32 %y) nounwind strictfp {
 ;
 ; ANDROID-LABEL: powi:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq __powitf2@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp __powitf2@PLT # TAILCALL
 ;
 ; GNU-LABEL: powi:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq __powitf2@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp __powitf2@PLT # TAILCALL
 ;
 ; X86-LABEL: powi:
 ; X86:       # %bb.0: # %entry
@@ -2178,10 +2052,10 @@ define fp128 @powi(fp128 %x, i32 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2197,17 +2071,11 @@ entry:
 define fp128 @rint(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: rint:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq rintl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp rintl@PLT # TAILCALL
 ;
 ; GNU-LABEL: rint:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq rintf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp rintf128@PLT # TAILCALL
 ;
 ; X86-LABEL: rint:
 ; X86:       # %bb.0: # %entry
@@ -2269,10 +2137,10 @@ define fp128 @rint(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2287,17 +2155,11 @@ entry:
 define fp128 @round(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: round:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq roundl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp roundl@PLT # TAILCALL
 ;
 ; GNU-LABEL: round:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq roundf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp roundf128@PLT # TAILCALL
 ;
 ; X86-LABEL: round:
 ; X86:       # %bb.0: # %entry
@@ -2359,10 +2221,10 @@ define fp128 @round(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2377,17 +2239,11 @@ entry:
 define fp128 @roundeven(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: roundeven:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq roundevenl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp roundevenl@PLT # TAILCALL
 ;
 ; GNU-LABEL: roundeven:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq roundevenf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp roundevenf128@PLT # TAILCALL
 ;
 ; X86-LABEL: roundeven:
 ; X86:       # %bb.0: # %entry
@@ -2449,10 +2305,10 @@ define fp128 @roundeven(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2467,17 +2323,11 @@ entry:
 define fp128 @asin(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: asin:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq asinl@PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp asinl@PLT # TAILCALL
 ;
 ; GNU-LABEL: asin:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq asinf128@PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp asinf128@PLT # TAILCALL
 ;
 ; X86-LABEL: asin:
 ; X86:       # %bb.0: # %entry
@@ -2539,10 +2389,10 @@ define fp128 @asin(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2557,17 +2407,11 @@ entry:
 define fp128 @sin(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: sin:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq sinl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp sinl at PLT # TAILCALL
 ;
 ; GNU-LABEL: sin:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq sinf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp sinf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: sin:
 ; X86:       # %bb.0: # %entry
@@ -2629,10 +2473,10 @@ define fp128 @sin(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2647,17 +2491,11 @@ entry:
 define fp128 @sinh(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: sinh:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq sinhl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp sinhl at PLT # TAILCALL
 ;
 ; GNU-LABEL: sinh:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq sinhf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp sinhf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: sinh:
 ; X86:       # %bb.0: # %entry
@@ -2719,10 +2557,10 @@ define fp128 @sinh(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2737,17 +2575,11 @@ entry:
 define fp128 @sqrt(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: sqrt:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq sqrtl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp sqrtl at PLT # TAILCALL
 ;
 ; GNU-LABEL: sqrt:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq sqrtf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp sqrtf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: sqrt:
 ; X86:       # %bb.0: # %entry
@@ -2809,10 +2641,10 @@ define fp128 @sqrt(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2827,17 +2659,11 @@ entry:
 define fp128 @atan(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: atan:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq atanl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp atanl at PLT # TAILCALL
 ;
 ; GNU-LABEL: atan:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq atanf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp atanf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: atan:
 ; X86:       # %bb.0: # %entry
@@ -2899,10 +2725,10 @@ define fp128 @atan(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -2917,17 +2743,11 @@ entry:
 define fp128 @atan2(fp128 %x, fp128 %y) nounwind strictfp {
 ; ANDROID-LABEL: atan2:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq atan2l at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp atan2l at PLT # TAILCALL
 ;
 ; GNU-LABEL: atan2:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq atan2f128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp atan2f128 at PLT # TAILCALL
 ;
 ; X86-LABEL: atan2:
 ; X86:       # %bb.0: # %entry
@@ -3013,10 +2833,10 @@ define fp128 @atan2(fp128 %x, fp128 %y) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -12(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -3032,17 +2852,11 @@ entry:
 define fp128 @tan(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: tan:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq tanl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp tanl at PLT # TAILCALL
 ;
 ; GNU-LABEL: tan:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq tanf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp tanf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: tan:
 ; X86:       # %bb.0: # %entry
@@ -3104,10 +2918,10 @@ define fp128 @tan(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -3122,17 +2936,11 @@ entry:
 define fp128 @tanh(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: tanh:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq tanhl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp tanhl at PLT # TAILCALL
 ;
 ; GNU-LABEL: tanh:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq tanhf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp tanhf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: tanh:
 ; X86:       # %bb.0: # %entry
@@ -3194,10 +3002,10 @@ define fp128 @tanh(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -3212,17 +3020,11 @@ entry:
 define fp128 @trunc(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: trunc:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq truncl at PLT
-; ANDROID-NEXT:    popq %rax
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp truncl at PLT # TAILCALL
 ;
 ; GNU-LABEL: trunc:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq truncf128 at PLT
-; GNU-NEXT:    popq %rax
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp truncf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: trunc:
 ; X86:       # %bb.0: # %entry
@@ -3284,10 +3086,10 @@ define fp128 @trunc(fp128 %x) nounwind strictfp {
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; WIN-X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; WIN-X86-NEXT:    movl %edi, 8(%esi)
-; WIN-X86-NEXT:    movl %edx, 12(%esi)
-; WIN-X86-NEXT:    movl %eax, (%esi)
+; WIN-X86-NEXT:    movl %edi, 12(%esi)
+; WIN-X86-NEXT:    movl %edx, 8(%esi)
 ; WIN-X86-NEXT:    movl %ecx, 4(%esi)
+; WIN-X86-NEXT:    movl %eax, (%esi)
 ; WIN-X86-NEXT:    movl %esi, %eax
 ; WIN-X86-NEXT:    leal -8(%ebp), %esp
 ; WIN-X86-NEXT:    popl %esi
@@ -3302,17 +3104,11 @@ entry:
 define i32 @lrint(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: lrint:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq lrintl at PLT
-; ANDROID-NEXT:    popq %rcx
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp lrintl at PLT # TAILCALL
 ;
 ; GNU-LABEL: lrint:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq lrintf128 at PLT
-; GNU-NEXT:    popq %rcx
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp lrintf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: lrint:
 ; X86:       # %bb.0: # %entry
@@ -3358,17 +3154,11 @@ entry:
 define i64 @llrint(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: llrint:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq llrintl at PLT
-; ANDROID-NEXT:    popq %rcx
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp llrintl at PLT # TAILCALL
 ;
 ; GNU-LABEL: llrint:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq llrintf128 at PLT
-; GNU-NEXT:    popq %rcx
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp llrintf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: llrint:
 ; X86:       # %bb.0: # %entry
@@ -3414,17 +3204,11 @@ entry:
 define i32 @lround(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: lround:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq lroundl at PLT
-; ANDROID-NEXT:    popq %rcx
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp lroundl at PLT # TAILCALL
 ;
 ; GNU-LABEL: lround:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq lroundf128 at PLT
-; GNU-NEXT:    popq %rcx
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp lroundf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: lround:
 ; X86:       # %bb.0: # %entry
@@ -3470,17 +3254,11 @@ entry:
 define i64 @llround(fp128 %x) nounwind strictfp {
 ; ANDROID-LABEL: llround:
 ; ANDROID:       # %bb.0: # %entry
-; ANDROID-NEXT:    pushq %rax
-; ANDROID-NEXT:    callq llroundl at PLT
-; ANDROID-NEXT:    popq %rcx
-; ANDROID-NEXT:    retq
+; ANDROID-NEXT:    jmp llroundl at PLT # TAILCALL
 ;
 ; GNU-LABEL: llround:
 ; GNU:       # %bb.0: # %entry
-; GNU-NEXT:    pushq %rax
-; GNU-NEXT:    callq llroundf128 at PLT
-; GNU-NEXT:    popq %rcx
-; GNU-NEXT:    retq
+; GNU-NEXT:    jmp llroundf128 at PLT # TAILCALL
 ;
 ; X86-LABEL: llround:
 ; X86:       # %bb.0: # %entry
diff --git a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
index 8bbc6247dbafd..b96edd3ee4b23 100644
--- a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
+++ b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
@@ -736,22 +736,20 @@ entry:
 define i32 @lrint(x86_fp80 %x) nounwind strictfp {
 ; X86-LABEL: lrint:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    subl $12, %esp
+; X86-NEXT:    pushl %eax
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
+; X86-NEXT:    fistpl (%esp)
 ; X86-NEXT:    wait
-; X86-NEXT:    calll lrintl
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    popl %ecx
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: lrint:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
+; X64-NEXT:    fistpl -{{[0-9]+}}(%rsp)
 ; X64-NEXT:    wait
-; X64-NEXT:    callq lrintl at PLT
-; X64-NEXT:    addq $24, %rsp
+; X64-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
 ; X64-NEXT:    retq
 entry:
   %rint = call i32 @llvm.experimental.constrained.lrint.i32.f80(x86_fp80 %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
@@ -761,22 +759,25 @@ entry:
 define i64 @llrint(x86_fp80 %x) nounwind strictfp {
 ; X86-LABEL: llrint:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
+; X86-NEXT:    andl $-8, %esp
+; X86-NEXT:    subl $8, %esp
+; X86-NEXT:    fldt 8(%ebp)
+; X86-NEXT:    fistpll (%esp)
 ; X86-NEXT:    wait
-; X86-NEXT:    calll llrintl
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    movl %ebp, %esp
+; X86-NEXT:    popl %ebp
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: llrint:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
+; X64-NEXT:    fistpll -{{[0-9]+}}(%rsp)
 ; X64-NEXT:    wait
-; X64-NEXT:    callq llrintl at PLT
-; X64-NEXT:    addq $24, %rsp
+; X64-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
 ; X64-NEXT:    retq
 entry:
   %rint = call i64 @llvm.experimental.constrained.llrint.i64.f80(x86_fp80 %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
@@ -786,23 +787,11 @@ entry:
 define i32 @lround(x86_fp80 %x) nounwind strictfp {
 ; X86-LABEL: lround:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
-; X86-NEXT:    calll lroundl
-; X86-NEXT:    addl $12, %esp
-; X86-NEXT:    retl
+; X86-NEXT:    jmp lroundl # TAILCALL
 ;
 ; X64-LABEL: lround:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    subq $24, %rsp
-; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
-; X64-NEXT:    callq lroundl at PLT
-; X64-NEXT:    addq $24, %rsp
-; X64-NEXT:    retq
+; X64-NEXT:    jmp lroundl at PLT # TAILCALL
 entry:
   %round = call i32 @llvm.experimental.constrained.lround.i32.f80(x86_fp80 %x, metadata !"fpexcept.strict") #0
   ret i32 %round
@@ -821,13 +810,7 @@ define i64 @llround(x86_fp80 %x) nounwind strictfp {
 ;
 ; X64-LABEL: llround:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    subq $24, %rsp
-; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
-; X64-NEXT:    callq llroundl at PLT
-; X64-NEXT:    addq $24, %rsp
-; X64-NEXT:    retq
+; X64-NEXT:    jmp llroundl at PLT # TAILCALL
 entry:
   %round = call i64 @llvm.experimental.constrained.llround.i64.f80(x86_fp80 %x, metadata !"fpexcept.strict") #0
   ret i64 %round
diff --git a/llvm/test/CodeGen/X86/fp80-strict-scalar.ll b/llvm/test/CodeGen/X86/fp80-strict-scalar.ll
index b9b1ae60d479e..443e01d3bbdac 100644
--- a/llvm/test/CodeGen/X86/fp80-strict-scalar.ll
+++ b/llvm/test/CodeGen/X86/fp80-strict-scalar.ll
@@ -38,7 +38,6 @@ define x86_fp80 @fadd_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    faddp %st, %st(1)
-; X86-NEXT:    wait
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fadd_fp80:
@@ -46,7 +45,6 @@ define x86_fp80 @fadd_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    faddp %st, %st(1)
-; X64-NEXT:    wait
 ; X64-NEXT:    retq
   %ret = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 %a, x86_fp80 %b,
                                                                     metadata !"round.dynamic",
@@ -60,7 +58,6 @@ define x86_fp80 @fsub_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fsubp %st, %st(1)
-; X86-NEXT:    wait
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fsub_fp80:
@@ -68,7 +65,6 @@ define x86_fp80 @fsub_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fsubp %st, %st(1)
-; X64-NEXT:    wait
 ; X64-NEXT:    retq
   %ret = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 %a, x86_fp80 %b,
                                                                     metadata !"round.dynamic",
@@ -82,7 +78,6 @@ define x86_fp80 @fmul_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fmulp %st, %st(1)
-; X86-NEXT:    wait
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fmul_fp80:
@@ -90,7 +85,6 @@ define x86_fp80 @fmul_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fmulp %st, %st(1)
-; X64-NEXT:    wait
 ; X64-NEXT:    retq
   %ret = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 %a, x86_fp80 %b,
                                                                     metadata !"round.dynamic",
@@ -104,7 +98,6 @@ define x86_fp80 @fdiv_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fdivp %st, %st(1)
-; X86-NEXT:    wait
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fdiv_fp80:
@@ -112,7 +105,6 @@ define x86_fp80 @fdiv_fp80(x86_fp80 %a, x86_fp80 %b) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fdivp %st, %st(1)
-; X64-NEXT:    wait
 ; X64-NEXT:    retq
   %ret = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 %a, x86_fp80 %b,
                                                                     metadata !"round.dynamic",
@@ -213,14 +205,12 @@ define x86_fp80 @fsqrt_fp80(x86_fp80 %a) nounwind strictfp {
 ; X86:       # %bb.0:
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fsqrt
-; X86-NEXT:    wait
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: fsqrt_fp80:
 ; X64:       # %bb.0:
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fsqrt
-; X64-NEXT:    wait
 ; X64-NEXT:    retq
   %ret = call x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80 %a,
                                                                     metadata !"round.dynamic",
@@ -589,7 +579,7 @@ define i64 @fp80_to_uint64(x86_fp80 %x) #0 {
 ; X86-NEXT:    subl $16, %esp
 ; X86-NEXT:    fldt 8(%ebp)
 ; X86-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
-; X86-NEXT:    fcom %st(1)
+; X86-NEXT:    fucom %st(1)
 ; X86-NEXT:    wait
 ; X86-NEXT:    fnstsw %ax
 ; X86-NEXT:    xorl %edx, %edx
@@ -604,7 +594,6 @@ define i64 @fp80_to_uint64(x86_fp80 %x) #0 {
 ; X86-NEXT:  .LBB18_2:
 ; X86-NEXT:    fstp %st(0)
 ; X86-NEXT:    fsubrp %st, %st(1)
-; X86-NEXT:    wait
 ; X86-NEXT:    fnstcw {{[0-9]+}}(%esp)
 ; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    orl $3072, %ecx # imm = 0xC00
@@ -627,14 +616,12 @@ define i64 @fp80_to_uint64(x86_fp80 %x) #0 {
 ; X64-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; X64-NEXT:    wait
 ; X64-NEXT:    xorl %eax, %eax
-; X64-NEXT:    fcomi %st(1), %st
-; X64-NEXT:    wait
+; X64-NEXT:    fucomi %st(1), %st
 ; X64-NEXT:    setbe %al
 ; X64-NEXT:    fldz
 ; X64-NEXT:    fcmovbe %st(1), %st
 ; X64-NEXT:    fstp %st(1)
 ; X64-NEXT:    fsubrp %st, %st(1)
-; X64-NEXT:    wait
 ; X64-NEXT:    fnstcw -{{[0-9]+}}(%rsp)
 ; X64-NEXT:    movzwl -{{[0-9]+}}(%rsp), %ecx
 ; X64-NEXT:    orl $3072, %ecx # imm = 0xC00
diff --git a/llvm/test/CodeGen/X86/half-constrained.ll b/llvm/test/CodeGen/X86/half-constrained.ll
index d5f2060ca20e3..934e649e3656e 100644
--- a/llvm/test/CodeGen/X86/half-constrained.ll
+++ b/llvm/test/CodeGen/X86/half-constrained.ll
@@ -24,8 +24,7 @@ define float @half_to_float() strictfp {
 ; X86-F16C:       # %bb.0:
 ; X86-F16C-NEXT:    pushl %eax
 ; X86-F16C-NEXT:    .cfi_def_cfa_offset 8
-; X86-F16C-NEXT:    movzwl a, %eax
-; X86-F16C-NEXT:    vmovd %eax, %xmm0
+; X86-F16C-NEXT:    vpinsrw $0, a, %xmm0, %xmm0
 ; X86-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; X86-F16C-NEXT:    vmovss %xmm0, (%esp)
 ; X86-F16C-NEXT:    flds (%esp)
@@ -36,20 +35,14 @@ define float @half_to_float() strictfp {
 ;
 ; X64-NOF16C-LABEL: half_to_float:
 ; X64-NOF16C:       # %bb.0:
-; X64-NOF16C-NEXT:    pushq %rax
-; X64-NOF16C-NEXT:    .cfi_def_cfa_offset 16
 ; X64-NOF16C-NEXT:    movq a at GOTPCREL(%rip), %rax
 ; X64-NOF16C-NEXT:    pinsrw $0, (%rax), %xmm0
-; X64-NOF16C-NEXT:    callq __extendhfsf2 at PLT
-; X64-NOF16C-NEXT:    popq %rax
-; X64-NOF16C-NEXT:    .cfi_def_cfa_offset 8
-; X64-NOF16C-NEXT:    retq
+; X64-NOF16C-NEXT:    jmp __extendhfsf2 at PLT # TAILCALL
 ;
 ; X64-F16C-LABEL: half_to_float:
 ; X64-F16C:       # %bb.0:
 ; X64-F16C-NEXT:    movq a at GOTPCREL(%rip), %rax
-; X64-F16C-NEXT:    movzwl (%rax), %eax
-; X64-F16C-NEXT:    vmovd %eax, %xmm0
+; X64-F16C-NEXT:    vpinsrw $0, (%rax), %xmm0, %xmm0
 ; X64-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; X64-F16C-NEXT:    retq
   %1 = load half, ptr @a, align 2
@@ -73,8 +66,7 @@ define double @half_to_double() strictfp {
 ; X86-F16C:       # %bb.0:
 ; X86-F16C-NEXT:    subl $12, %esp
 ; X86-F16C-NEXT:    .cfi_def_cfa_offset 16
-; X86-F16C-NEXT:    movzwl a, %eax
-; X86-F16C-NEXT:    vmovd %eax, %xmm0
+; X86-F16C-NEXT:    vpinsrw $0, a, %xmm0, %xmm0
 ; X86-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; X86-F16C-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
 ; X86-F16C-NEXT:    vmovsd %xmm0, (%esp)
@@ -99,8 +91,7 @@ define double @half_to_double() strictfp {
 ; X64-F16C-LABEL: half_to_double:
 ; X64-F16C:       # %bb.0:
 ; X64-F16C-NEXT:    movq a at GOTPCREL(%rip), %rax
-; X64-F16C-NEXT:    movzwl (%rax), %eax
-; X64-F16C-NEXT:    vmovd %eax, %xmm0
+; X64-F16C-NEXT:    vpinsrw $0, (%rax), %xmm0, %xmm0
 ; X64-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; X64-F16C-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
 ; X64-F16C-NEXT:    retq
@@ -342,11 +333,9 @@ define void @add() strictfp {
 ;
 ; X86-F16C-LABEL: add:
 ; X86-F16C:       # %bb.0:
-; X86-F16C-NEXT:    movzwl a, %eax
-; X86-F16C-NEXT:    vmovd %eax, %xmm0
+; X86-F16C-NEXT:    vpinsrw $0, a, %xmm0, %xmm0
 ; X86-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; X86-F16C-NEXT:    movzwl b, %eax
-; X86-F16C-NEXT:    vmovd %eax, %xmm1
+; X86-F16C-NEXT:    vpinsrw $0, b, %xmm0, %xmm1
 ; X86-F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
 ; X86-F16C-NEXT:    vaddss %xmm1, %xmm0, %xmm0
 ; X86-F16C-NEXT:    vxorps %xmm1, %xmm1, %xmm1
@@ -378,12 +367,10 @@ define void @add() strictfp {
 ; X64-F16C-LABEL: add:
 ; X64-F16C:       # %bb.0:
 ; X64-F16C-NEXT:    movq a at GOTPCREL(%rip), %rax
-; X64-F16C-NEXT:    movzwl (%rax), %eax
-; X64-F16C-NEXT:    vmovd %eax, %xmm0
+; X64-F16C-NEXT:    vpinsrw $0, (%rax), %xmm0, %xmm0
 ; X64-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; X64-F16C-NEXT:    movq b at GOTPCREL(%rip), %rax
-; X64-F16C-NEXT:    movzwl (%rax), %eax
-; X64-F16C-NEXT:    vmovd %eax, %xmm1
+; X64-F16C-NEXT:    vpinsrw $0, (%rax), %xmm0, %xmm1
 ; X64-F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
 ; X64-F16C-NEXT:    vaddss %xmm1, %xmm0, %xmm0
 ; X64-F16C-NEXT:    vxorps %xmm1, %xmm1, %xmm1
diff --git a/llvm/test/CodeGen/X86/half-darwin.ll b/llvm/test/CodeGen/X86/half-darwin.ll
index 8765f7dbe6d34..011b5b5406319 100644
--- a/llvm/test/CodeGen/X86/half-darwin.ll
+++ b/llvm/test/CodeGen/X86/half-darwin.ll
@@ -165,8 +165,7 @@ define float @strict_extendhfsf(ptr %ptr) nounwind strictfp {
 ;
 ; CHECK-F16C-LABEL: strict_extendhfsf:
 ; CHECK-F16C:       ## %bb.0:
-; CHECK-F16C-NEXT:    movzwl (%rdi), %eax
-; CHECK-F16C-NEXT:    vmovd %eax, %xmm0
+; CHECK-F16C-NEXT:    vpinsrw $0, (%rdi), %xmm0, %xmm0
 ; CHECK-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; CHECK-F16C-NEXT:    retq
 ;
diff --git a/llvm/test/CodeGen/X86/ldexp-strict.ll b/llvm/test/CodeGen/X86/ldexp-strict.ll
index f13c59da46c23..eb80e9e50d78b 100644
--- a/llvm/test/CodeGen/X86/ldexp-strict.ll
+++ b/llvm/test/CodeGen/X86/ldexp-strict.ll
@@ -12,11 +12,8 @@
 define float @test_strict_ldexp_f32_i32(ptr addrspace(1) %out, float %a, i32 %b) nounwind #2 {
 ; X64-LABEL: test_strict_ldexp_f32_i32:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    movl %esi, %edi
-; X64-NEXT:    callq ldexpf at PLT
-; X64-NEXT:    popq %rax
-; X64-NEXT:    retq
+; X64-NEXT:    jmp ldexpf at PLT # TAILCALL
   %result = call float @llvm.experimental.constrained.ldexp.f32.i32(float %a, i32 %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %result
 }
@@ -24,11 +21,8 @@ define float @test_strict_ldexp_f32_i32(ptr addrspace(1) %out, float %a, i32 %b)
 define double @test_strict_ldexp_f64_i32(ptr addrspace(1) %out, double %a, i32 %b) nounwind #2 {
 ; X64-LABEL: test_strict_ldexp_f64_i32:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
 ; X64-NEXT:    movl %esi, %edi
-; X64-NEXT:    callq ldexp at PLT
-; X64-NEXT:    popq %rax
-; X64-NEXT:    retq
+; X64-NEXT:    jmp ldexp at PLT # TAILCALL
   %result = call double @llvm.experimental.constrained.ldexp.f64.i32(double %a, i32 %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret double %result
 }
diff --git a/llvm/test/CodeGen/X86/llrint-conv.ll b/llvm/test/CodeGen/X86/llrint-conv.ll
index 01551030d938a..7da257936a128 100644
--- a/llvm/test/CodeGen/X86/llrint-conv.ll
+++ b/llvm/test/CodeGen/X86/llrint-conv.ll
@@ -343,8 +343,10 @@ define i64 @test_llrint_i64_f16_strict(half %x) nounwind strictfp {
 ; X64-SSE:       # %bb.0: # %entry
 ; X64-SSE-NEXT:    pushq %rax
 ; X64-SSE-NEXT:    callq __extendhfsf2 at PLT
-; X64-SSE-NEXT:    callq llrintf at PLT
-; X64-SSE-NEXT:    movq %xmm0, %rax
+; X64-SSE-NEXT:    callq rintf at PLT
+; X64-SSE-NEXT:    callq __truncsfhf2 at PLT
+; X64-SSE-NEXT:    callq __extendhfsf2 at PLT
+; X64-SSE-NEXT:    cvttss2si %xmm0, %rax
 ; X64-SSE-NEXT:    popq %rcx
 ; X64-SSE-NEXT:    retq
 entry:
@@ -355,12 +357,17 @@ entry:
 define i64 @test_llrint_i64_f32_strict(float %x) nounwind strictfp {
 ; X86-NOSSE-LABEL: test_llrint_i64_f32_strict:
 ; X86-NOSSE:       # %bb.0: # %entry
-; X86-NOSSE-NEXT:    pushl %eax
-; X86-NOSSE-NEXT:    flds {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstps (%esp)
+; X86-NOSSE-NEXT:    pushl %ebp
+; X86-NOSSE-NEXT:    movl %esp, %ebp
+; X86-NOSSE-NEXT:    andl $-8, %esp
+; X86-NOSSE-NEXT:    subl $8, %esp
+; X86-NOSSE-NEXT:    flds 8(%ebp)
+; X86-NOSSE-NEXT:    fistpll (%esp)
 ; X86-NOSSE-NEXT:    wait
-; X86-NOSSE-NEXT:    calll llrintf
-; X86-NOSSE-NEXT:    popl %ecx
+; X86-NOSSE-NEXT:    movl (%esp), %eax
+; X86-NOSSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NOSSE-NEXT:    movl %ebp, %esp
+; X86-NOSSE-NEXT:    popl %ebp
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_llrint_i64_f32_strict:
@@ -372,28 +379,47 @@ define i64 @test_llrint_i64_f32_strict(float %x) nounwind strictfp {
 ;
 ; X86-SSE2-LABEL: test_llrint_i64_f32_strict:
 ; X86-SSE2:       # %bb.0: # %entry
-; X86-SSE2-NEXT:    pushl %eax
+; X86-SSE2-NEXT:    pushl %ebp
+; X86-SSE2-NEXT:    movl %esp, %ebp
+; X86-SSE2-NEXT:    andl $-8, %esp
+; X86-SSE2-NEXT:    subl $8, %esp
 ; X86-SSE2-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; X86-SSE2-NEXT:    movss %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll llrintf
-; X86-SSE2-NEXT:    popl %ecx
+; X86-SSE2-NEXT:    flds (%esp)
+; X86-SSE2-NEXT:    fistpll (%esp)
+; X86-SSE2-NEXT:    wait
+; X86-SSE2-NEXT:    movl (%esp), %eax
+; X86-SSE2-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-SSE2-NEXT:    movl %ebp, %esp
+; X86-SSE2-NEXT:    popl %ebp
 ; X86-SSE2-NEXT:    retl
 ;
 ; X86-AVX-LABEL: test_llrint_i64_f32_strict:
 ; X86-AVX:       # %bb.0: # %entry
-; X86-AVX-NEXT:    pushl %eax
+; X86-AVX-NEXT:    pushl %ebp
+; X86-AVX-NEXT:    movl %esp, %ebp
+; X86-AVX-NEXT:    andl $-8, %esp
+; X86-AVX-NEXT:    subl $8, %esp
 ; X86-AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; X86-AVX-NEXT:    vmovss %xmm0, (%esp)
-; X86-AVX-NEXT:    calll llrintf
-; X86-AVX-NEXT:    popl %ecx
+; X86-AVX-NEXT:    flds (%esp)
+; X86-AVX-NEXT:    fistpll (%esp)
+; X86-AVX-NEXT:    wait
+; X86-AVX-NEXT:    movl (%esp), %eax
+; X86-AVX-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-AVX-NEXT:    movl %ebp, %esp
+; X86-AVX-NEXT:    popl %ebp
 ; X86-AVX-NEXT:    retl
 ;
-; X64-LABEL: test_llrint_i64_f32_strict:
-; X64:       # %bb.0: # %entry
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq llrintf at PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-SSE-LABEL: test_llrint_i64_f32_strict:
+; X64-SSE:       # %bb.0: # %entry
+; X64-SSE-NEXT:    cvtss2si %xmm0, %rax
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX-LABEL: test_llrint_i64_f32_strict:
+; X64-AVX:       # %bb.0: # %entry
+; X64-AVX-NEXT:    vcvtss2si %xmm0, %rax
+; X64-AVX-NEXT:    retq
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint.i64.f32(float %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %0
@@ -402,12 +428,17 @@ entry:
 define i64 @test_llrint_i64_f64_strict(double %x) nounwind strictfp {
 ; X86-NOSSE-LABEL: test_llrint_i64_f64_strict:
 ; X86-NOSSE:       # %bb.0: # %entry
+; X86-NOSSE-NEXT:    pushl %ebp
+; X86-NOSSE-NEXT:    movl %esp, %ebp
+; X86-NOSSE-NEXT:    andl $-8, %esp
 ; X86-NOSSE-NEXT:    subl $8, %esp
-; X86-NOSSE-NEXT:    fldl {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstpl (%esp)
+; X86-NOSSE-NEXT:    fldl 8(%ebp)
+; X86-NOSSE-NEXT:    fistpll (%esp)
 ; X86-NOSSE-NEXT:    wait
-; X86-NOSSE-NEXT:    calll llrint
-; X86-NOSSE-NEXT:    addl $8, %esp
+; X86-NOSSE-NEXT:    movl (%esp), %eax
+; X86-NOSSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NOSSE-NEXT:    movl %ebp, %esp
+; X86-NOSSE-NEXT:    popl %ebp
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_llrint_i64_f64_strict:
@@ -420,28 +451,47 @@ define i64 @test_llrint_i64_f64_strict(double %x) nounwind strictfp {
 ;
 ; X86-SSE2-LABEL: test_llrint_i64_f64_strict:
 ; X86-SSE2:       # %bb.0: # %entry
+; X86-SSE2-NEXT:    pushl %ebp
+; X86-SSE2-NEXT:    movl %esp, %ebp
+; X86-SSE2-NEXT:    andl $-8, %esp
 ; X86-SSE2-NEXT:    subl $8, %esp
 ; X86-SSE2-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
 ; X86-SSE2-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll llrint
-; X86-SSE2-NEXT:    addl $8, %esp
+; X86-SSE2-NEXT:    fldl (%esp)
+; X86-SSE2-NEXT:    fistpll (%esp)
+; X86-SSE2-NEXT:    wait
+; X86-SSE2-NEXT:    movl (%esp), %eax
+; X86-SSE2-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-SSE2-NEXT:    movl %ebp, %esp
+; X86-SSE2-NEXT:    popl %ebp
 ; X86-SSE2-NEXT:    retl
 ;
 ; X86-AVX-LABEL: test_llrint_i64_f64_strict:
 ; X86-AVX:       # %bb.0: # %entry
+; X86-AVX-NEXT:    pushl %ebp
+; X86-AVX-NEXT:    movl %esp, %ebp
+; X86-AVX-NEXT:    andl $-8, %esp
 ; X86-AVX-NEXT:    subl $8, %esp
 ; X86-AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
 ; X86-AVX-NEXT:    vmovsd %xmm0, (%esp)
-; X86-AVX-NEXT:    calll llrint
-; X86-AVX-NEXT:    addl $8, %esp
+; X86-AVX-NEXT:    fldl (%esp)
+; X86-AVX-NEXT:    fistpll (%esp)
+; X86-AVX-NEXT:    wait
+; X86-AVX-NEXT:    movl (%esp), %eax
+; X86-AVX-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-AVX-NEXT:    movl %ebp, %esp
+; X86-AVX-NEXT:    popl %ebp
 ; X86-AVX-NEXT:    retl
 ;
-; X64-LABEL: test_llrint_i64_f64_strict:
-; X64:       # %bb.0: # %entry
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq llrint at PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-SSE-LABEL: test_llrint_i64_f64_strict:
+; X64-SSE:       # %bb.0: # %entry
+; X64-SSE-NEXT:    cvtsd2si %xmm0, %rax
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX-LABEL: test_llrint_i64_f64_strict:
+; X64-AVX:       # %bb.0: # %entry
+; X64-AVX-NEXT:    vcvtsd2si %xmm0, %rax
+; X64-AVX-NEXT:    retq
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint.i64.f64(double %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %0
@@ -450,12 +500,17 @@ entry:
 define i64 @test_llrint_i64_f80_strict(x86_fp80 %x) nounwind strictfp {
 ; X86-LABEL: test_llrint_i64_f80_strict:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
+; X86-NEXT:    andl $-8, %esp
+; X86-NEXT:    subl $8, %esp
+; X86-NEXT:    fldt 8(%ebp)
+; X86-NEXT:    fistpll (%esp)
 ; X86-NEXT:    wait
-; X86-NEXT:    calll llrintl
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    movl %ebp, %esp
+; X86-NEXT:    popl %ebp
 ; X86-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_llrint_i64_f80_strict:
@@ -470,12 +525,10 @@ define i64 @test_llrint_i64_f80_strict(x86_fp80 %x) nounwind strictfp {
 ;
 ; X64-LABEL: test_llrint_i64_f80_strict:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
+; X64-NEXT:    fistpll -{{[0-9]+}}(%rsp)
 ; X64-NEXT:    wait
-; X64-NEXT:    callq llrintl@PLT
-; X64-NEXT:    addq $24, %rsp
+; X64-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
 ; X64-NEXT:    retq
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint.i64.f80(x86_fp80 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
@@ -547,10 +600,7 @@ define i64 @test_llrint_i64_f128_strict(fp128 %x) nounwind strictfp {
 ;
 ; X64-LABEL: test_llrint_i64_f128_strict:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq llrintl@PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-NEXT:    jmp llrintl@PLT # TAILCALL
 entry:
   %0 = tail call i64 @llvm.experimental.constrained.llrint.i64.f128(fp128 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %0
diff --git a/llvm/test/CodeGen/X86/lrint-conv-i32.ll b/llvm/test/CodeGen/X86/lrint-conv-i32.ll
index f4cb0d3ff87e6..a982de0a1fa52 100644
--- a/llvm/test/CodeGen/X86/lrint-conv-i32.ll
+++ b/llvm/test/CodeGen/X86/lrint-conv-i32.ll
@@ -198,11 +198,12 @@ define i32 @test_lrint_i32_f16_strict(half %x) nounwind strictfp {
 ; X86-NOSSE:       # %bb.0:
 ; X86-NOSSE-NEXT:    pushl %eax
 ; X86-NOSSE-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
-; X86-NOSSE-NEXT:    movl %eax, (%esp)
+; X86-NOSSE-NEXT:    pushl %eax
 ; X86-NOSSE-NEXT:    calll __extendhfsf2
-; X86-NOSSE-NEXT:    fstps (%esp)
+; X86-NOSSE-NEXT:    addl $4, %esp
+; X86-NOSSE-NEXT:    fistpl (%esp)
 ; X86-NOSSE-NEXT:    wait
-; X86-NOSSE-NEXT:    calll lrintf
+; X86-NOSSE-NEXT:    movl (%esp), %eax
 ; X86-NOSSE-NEXT:    popl %ecx
 ; X86-NOSSE-NEXT:    retl
 ;
@@ -215,11 +216,16 @@ define i32 @test_lrint_i32_f16_strict(half %x) nounwind strictfp {
 ; X86-SSE2-NEXT:    calll __extendhfsf2
 ; X86-SSE2-NEXT:    fstps (%esp)
 ; X86-SSE2-NEXT:    wait
-; X86-SSE2-NEXT:    calll lrintf
+; X86-SSE2-NEXT:    calll rintf
+; X86-SSE2-NEXT:    fstps (%esp)
+; X86-SSE2-NEXT:    wait
+; X86-SSE2-NEXT:    calll __truncsfhf2
+; X86-SSE2-NEXT:    pextrw $0, %xmm0, %eax
+; X86-SSE2-NEXT:    movw %ax, (%esp)
+; X86-SSE2-NEXT:    calll __extendhfsf2
 ; X86-SSE2-NEXT:    fstps {{[0-9]+}}(%esp)
 ; X86-SSE2-NEXT:    wait
-; X86-SSE2-NEXT:    movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE2-NEXT:    movd %xmm0, %eax
+; X86-SSE2-NEXT:    cvttss2si {{[0-9]+}}(%esp), %eax
 ; X86-SSE2-NEXT:    addl $8, %esp
 ; X86-SSE2-NEXT:    retl
 ;
@@ -227,8 +233,10 @@ define i32 @test_lrint_i32_f16_strict(half %x) nounwind strictfp {
 ; X64-SSE:       # %bb.0:
 ; X64-SSE-NEXT:    pushq %rax
 ; X64-SSE-NEXT:    callq __extendhfsf2@PLT
-; X64-SSE-NEXT:    callq lrintf@PLT
-; X64-SSE-NEXT:    movd %xmm0, %eax
+; X64-SSE-NEXT:    callq rintf@PLT
+; X64-SSE-NEXT:    callq __truncsfhf2@PLT
+; X64-SSE-NEXT:    callq __extendhfsf2@PLT
+; X64-SSE-NEXT:    cvttss2si %xmm0, %eax
 ; X64-SSE-NEXT:    popq %rcx
 ; X64-SSE-NEXT:    retq
   %conv = tail call i32 @llvm.experimental.constrained.lrint.i32.f16(half %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
@@ -240,36 +248,31 @@ define i32 @test_lrint_i32_f32_strict(float %x) nounwind strictfp {
 ; X86-NOSSE:       # %bb.0:
 ; X86-NOSSE-NEXT:    pushl %eax
 ; X86-NOSSE-NEXT:    flds {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstps (%esp)
+; X86-NOSSE-NEXT:    fistpl (%esp)
 ; X86-NOSSE-NEXT:    wait
-; X86-NOSSE-NEXT:    calll lrintf
+; X86-NOSSE-NEXT:    movl (%esp), %eax
 ; X86-NOSSE-NEXT:    popl %ecx
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-SSE2-LABEL: test_lrint_i32_f32_strict:
 ; X86-SSE2:       # %bb.0:
-; X86-SSE2-NEXT:    pushl %eax
-; X86-SSE2-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-SSE2-NEXT:    movss %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll lrintf
-; X86-SSE2-NEXT:    popl %ecx
+; X86-SSE2-NEXT:    cvtss2si {{[0-9]+}}(%esp), %eax
 ; X86-SSE2-NEXT:    retl
 ;
 ; X86-AVX-LABEL: test_lrint_i32_f32_strict:
 ; X86-AVX:       # %bb.0:
-; X86-AVX-NEXT:    pushl %eax
-; X86-AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-AVX-NEXT:    vmovss %xmm0, (%esp)
-; X86-AVX-NEXT:    calll lrintf
-; X86-AVX-NEXT:    popl %ecx
+; X86-AVX-NEXT:    vcvtss2si {{[0-9]+}}(%esp), %eax
 ; X86-AVX-NEXT:    retl
 ;
-; X64-LABEL: test_lrint_i32_f32_strict:
-; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq lrintf@PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-SSE-LABEL: test_lrint_i32_f32_strict:
+; X64-SSE:       # %bb.0:
+; X64-SSE-NEXT:    cvtss2si %xmm0, %eax
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX-LABEL: test_lrint_i32_f32_strict:
+; X64-AVX:       # %bb.0:
+; X64-AVX-NEXT:    vcvtss2si %xmm0, %eax
+; X64-AVX-NEXT:    retq
   %conv = tail call i32 @llvm.experimental.constrained.lrint.i32.f32(float %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i32 %conv
 }
@@ -277,38 +280,33 @@ define i32 @test_lrint_i32_f32_strict(float %x) nounwind strictfp {
 define i32 @test_lrint_i32_f64_strict(double %x) nounwind strictfp {
 ; X86-NOSSE-LABEL: test_lrint_i32_f64_strict:
 ; X86-NOSSE:       # %bb.0:
-; X86-NOSSE-NEXT:    subl $8, %esp
+; X86-NOSSE-NEXT:    pushl %eax
 ; X86-NOSSE-NEXT:    fldl {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstpl (%esp)
+; X86-NOSSE-NEXT:    fistpl (%esp)
 ; X86-NOSSE-NEXT:    wait
-; X86-NOSSE-NEXT:    calll lrint
-; X86-NOSSE-NEXT:    addl $8, %esp
+; X86-NOSSE-NEXT:    movl (%esp), %eax
+; X86-NOSSE-NEXT:    popl %ecx
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-SSE2-LABEL: test_lrint_i32_f64_strict:
 ; X86-SSE2:       # %bb.0:
-; X86-SSE2-NEXT:    subl $8, %esp
-; X86-SSE2-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-SSE2-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll lrint
-; X86-SSE2-NEXT:    addl $8, %esp
+; X86-SSE2-NEXT:    cvtsd2si {{[0-9]+}}(%esp), %eax
 ; X86-SSE2-NEXT:    retl
 ;
 ; X86-AVX-LABEL: test_lrint_i32_f64_strict:
 ; X86-AVX:       # %bb.0:
-; X86-AVX-NEXT:    subl $8, %esp
-; X86-AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; X86-AVX-NEXT:    vmovsd %xmm0, (%esp)
-; X86-AVX-NEXT:    calll lrint
-; X86-AVX-NEXT:    addl $8, %esp
+; X86-AVX-NEXT:    vcvtsd2si {{[0-9]+}}(%esp), %eax
 ; X86-AVX-NEXT:    retl
 ;
-; X64-LABEL: test_lrint_i32_f64_strict:
-; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq lrint@PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-SSE-LABEL: test_lrint_i32_f64_strict:
+; X64-SSE:       # %bb.0:
+; X64-SSE-NEXT:    cvtsd2si %xmm0, %eax
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX-LABEL: test_lrint_i32_f64_strict:
+; X64-AVX:       # %bb.0:
+; X64-AVX-NEXT:    vcvtsd2si %xmm0, %eax
+; X64-AVX-NEXT:    retq
   %conv = tail call i32 @llvm.experimental.constrained.lrint.i32.f64(double %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i32 %conv
 }
@@ -316,22 +314,20 @@ define i32 @test_lrint_i32_f64_strict(double %x) nounwind strictfp {
 define i32 @test_lrint_i32_f80_strict(x86_fp80 %x) nounwind strictfp {
 ; X86-LABEL: test_lrint_i32_f80_strict:
 ; X86:       # %bb.0:
-; X86-NEXT:    subl $12, %esp
+; X86-NEXT:    pushl %eax
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
+; X86-NEXT:    fistpl (%esp)
 ; X86-NEXT:    wait
-; X86-NEXT:    calll lrintl
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    popl %ecx
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lrint_i32_f80_strict:
 ; X64:       # %bb.0:
-; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
-; X64-NEXT:    fstpt (%rsp)
+; X64-NEXT:    fistpl -{{[0-9]+}}(%rsp)
 ; X64-NEXT:    wait
-; X64-NEXT:    callq lrintl@PLT
-; X64-NEXT:    addq $24, %rsp
+; X64-NEXT:    movl -{{[0-9]+}}(%rsp), %eax
 ; X64-NEXT:    retq
   %conv = tail call i32 @llvm.experimental.constrained.lrint.i32.f80(x86_fp80 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i32 %conv
@@ -386,10 +382,7 @@ define i32 @test_lrint_i32_f128_strict(fp128 %x) nounwind strictfp {
 ;
 ; X64-LABEL: test_lrint_i32_f128_strict:
 ; X64:       # %bb.0:
-; X64-NEXT:    pushq %rax
-; X64-NEXT:    callq lrintl@PLT
-; X64-NEXT:    popq %rcx
-; X64-NEXT:    retq
+; X64-NEXT:    jmp lrintl@PLT # TAILCALL
   %conv = tail call i32 @llvm.experimental.constrained.lrint.i32.f128(fp128 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i32 %conv
 }
diff --git a/llvm/test/CodeGen/X86/lrint-conv-i64.ll b/llvm/test/CodeGen/X86/lrint-conv-i64.ll
index c45918ea4d5ee..a20ca1a59fb6a 100644
--- a/llvm/test/CodeGen/X86/lrint-conv-i64.ll
+++ b/llvm/test/CodeGen/X86/lrint-conv-i64.ll
@@ -273,8 +273,10 @@ define i64 @test_lrint_i64_f16_strict(half %x) nounwind {
 ; SSE:       # %bb.0:
 ; SSE-NEXT:    pushq %rax
 ; SSE-NEXT:    callq __extendhfsf2@PLT
-; SSE-NEXT:    callq lrintf@PLT
-; SSE-NEXT:    movq %xmm0, %rax
+; SSE-NEXT:    callq rintf@PLT
+; SSE-NEXT:    callq __truncsfhf2@PLT
+; SSE-NEXT:    callq __extendhfsf2@PLT
+; SSE-NEXT:    cvttss2si %xmm0, %rax
 ; SSE-NEXT:    popq %rcx
 ; SSE-NEXT:    retq
   %conv = tail call i64 @llvm.experimental.constrained.lrint.i64.f16(half %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
@@ -284,11 +286,16 @@ define i64 @test_lrint_i64_f16_strict(half %x) nounwind {
 define i64 @test_lrint_i64_f32_strict(float %x) nounwind {
 ; X86-NOSSE-LABEL: test_lrint_i64_f32_strict:
 ; X86-NOSSE:       # %bb.0:
-; X86-NOSSE-NEXT:    pushl %eax
-; X86-NOSSE-NEXT:    flds {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstps (%esp)
-; X86-NOSSE-NEXT:    calll lrintf
-; X86-NOSSE-NEXT:    popl %ecx
+; X86-NOSSE-NEXT:    pushl %ebp
+; X86-NOSSE-NEXT:    movl %esp, %ebp
+; X86-NOSSE-NEXT:    andl $-8, %esp
+; X86-NOSSE-NEXT:    subl $8, %esp
+; X86-NOSSE-NEXT:    flds 8(%ebp)
+; X86-NOSSE-NEXT:    fistpll (%esp)
+; X86-NOSSE-NEXT:    movl (%esp), %eax
+; X86-NOSSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NOSSE-NEXT:    movl %ebp, %esp
+; X86-NOSSE-NEXT:    popl %ebp
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_lrint_i64_f32_strict:
@@ -300,19 +307,29 @@ define i64 @test_lrint_i64_f32_strict(float %x) nounwind {
 ;
 ; X86-SSE2-LABEL: test_lrint_i64_f32_strict:
 ; X86-SSE2:       # %bb.0:
-; X86-SSE2-NEXT:    pushl %eax
+; X86-SSE2-NEXT:    pushl %ebp
+; X86-SSE2-NEXT:    movl %esp, %ebp
+; X86-SSE2-NEXT:    andl $-8, %esp
+; X86-SSE2-NEXT:    subl $8, %esp
 ; X86-SSE2-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; X86-SSE2-NEXT:    movss %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll lrintf
-; X86-SSE2-NEXT:    popl %ecx
+; X86-SSE2-NEXT:    flds (%esp)
+; X86-SSE2-NEXT:    fistpll (%esp)
+; X86-SSE2-NEXT:    movl (%esp), %eax
+; X86-SSE2-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-SSE2-NEXT:    movl %ebp, %esp
+; X86-SSE2-NEXT:    popl %ebp
 ; X86-SSE2-NEXT:    retl
 ;
-; CHECK-LABEL: test_lrint_i64_f32_strict:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    callq lrintf@PLT
-; CHECK-NEXT:    popq %rcx
-; CHECK-NEXT:    retq
+; SSE-LABEL: test_lrint_i64_f32_strict:
+; SSE:       # %bb.0:
+; SSE-NEXT:    cvtss2si %xmm0, %rax
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: test_lrint_i64_f32_strict:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vcvtss2si %xmm0, %rax
+; AVX-NEXT:    retq
   %conv = tail call i64 @llvm.experimental.constrained.lrint.i64.f32(float %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %conv
 }
@@ -320,11 +337,16 @@ define i64 @test_lrint_i64_f32_strict(float %x) nounwind {
 define i64 @test_lrint_i64_f64_strict(double %x) nounwind {
 ; X86-NOSSE-LABEL: test_lrint_i64_f64_strict:
 ; X86-NOSSE:       # %bb.0:
+; X86-NOSSE-NEXT:    pushl %ebp
+; X86-NOSSE-NEXT:    movl %esp, %ebp
+; X86-NOSSE-NEXT:    andl $-8, %esp
 ; X86-NOSSE-NEXT:    subl $8, %esp
-; X86-NOSSE-NEXT:    fldl {{[0-9]+}}(%esp)
-; X86-NOSSE-NEXT:    fstpl (%esp)
-; X86-NOSSE-NEXT:    calll lrint
-; X86-NOSSE-NEXT:    addl $8, %esp
+; X86-NOSSE-NEXT:    fldl 8(%ebp)
+; X86-NOSSE-NEXT:    fistpll (%esp)
+; X86-NOSSE-NEXT:    movl (%esp), %eax
+; X86-NOSSE-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NOSSE-NEXT:    movl %ebp, %esp
+; X86-NOSSE-NEXT:    popl %ebp
 ; X86-NOSSE-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_lrint_i64_f64_strict:
@@ -337,19 +359,29 @@ define i64 @test_lrint_i64_f64_strict(double %x) nounwind {
 ;
 ; X86-SSE2-LABEL: test_lrint_i64_f64_strict:
 ; X86-SSE2:       # %bb.0:
+; X86-SSE2-NEXT:    pushl %ebp
+; X86-SSE2-NEXT:    movl %esp, %ebp
+; X86-SSE2-NEXT:    andl $-8, %esp
 ; X86-SSE2-NEXT:    subl $8, %esp
 ; X86-SSE2-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
 ; X86-SSE2-NEXT:    movsd %xmm0, (%esp)
-; X86-SSE2-NEXT:    calll lrint
-; X86-SSE2-NEXT:    addl $8, %esp
+; X86-SSE2-NEXT:    fldl (%esp)
+; X86-SSE2-NEXT:    fistpll (%esp)
+; X86-SSE2-NEXT:    movl (%esp), %eax
+; X86-SSE2-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-SSE2-NEXT:    movl %ebp, %esp
+; X86-SSE2-NEXT:    popl %ebp
 ; X86-SSE2-NEXT:    retl
 ;
-; CHECK-LABEL: test_lrint_i64_f64_strict:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    callq lrint@PLT
-; CHECK-NEXT:    popq %rcx
-; CHECK-NEXT:    retq
+; SSE-LABEL: test_lrint_i64_f64_strict:
+; SSE:       # %bb.0:
+; SSE-NEXT:    cvtsd2si %xmm0, %rax
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: test_lrint_i64_f64_strict:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vcvtsd2si %xmm0, %rax
+; AVX-NEXT:    retq
   %conv = tail call i64 @llvm.experimental.constrained.lrint.i64.f64(double %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %conv
 }
@@ -357,11 +389,16 @@ define i64 @test_lrint_i64_f64_strict(double %x) nounwind {
 define i64 @test_lrint_i64_f80_strict(x86_fp80 %x) nounwind {
 ; X86-LABEL: test_lrint_i64_f80_strict:
 ; X86:       # %bb.0:
-; X86-NEXT:    subl $12, %esp
-; X86-NEXT:    fldt {{[0-9]+}}(%esp)
-; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    calll lrintl
-; X86-NEXT:    addl $12, %esp
+; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
+; X86-NEXT:    andl $-8, %esp
+; X86-NEXT:    subl $8, %esp
+; X86-NEXT:    fldt 8(%ebp)
+; X86-NEXT:    fistpll (%esp)
+; X86-NEXT:    movl (%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
+; X86-NEXT:    movl %ebp, %esp
+; X86-NEXT:    popl %ebp
 ; X86-NEXT:    retl
 ;
 ; X86-NOX87-LABEL: test_lrint_i64_f80_strict:
@@ -376,11 +413,9 @@ define i64 @test_lrint_i64_f80_strict(x86_fp80 %x) nounwind {
 ;
 ; CHECK-LABEL: test_lrint_i64_f80_strict:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    fldt {{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fstpt (%rsp)
-; CHECK-NEXT:    callq lrintl@PLT
-; CHECK-NEXT:    addq $24, %rsp
+; CHECK-NEXT:    fistpll -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movq -{{[0-9]+}}(%rsp), %rax
 ; CHECK-NEXT:    retq
   %conv = tail call i64 @llvm.experimental.constrained.lrint.i64.f80(x86_fp80 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %conv
@@ -422,10 +457,7 @@ define i64 @test_lrint_i64_f128_strict(fp128 %x) nounwind {
 ;
 ; CHECK-LABEL: test_lrint_i64_f128_strict:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    callq lrintl@PLT
-; CHECK-NEXT:    popq %rcx
-; CHECK-NEXT:    retq
+; CHECK-NEXT:    jmp lrintl@PLT # TAILCALL
   %conv = tail call i64 @llvm.experimental.constrained.lrint.i64.f128(fp128 %x, metadata!"round.dynamic", metadata!"fpexcept.strict")
   ret i64 %conv
 }
diff --git a/llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll b/llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
index 325f735b09cd9..8d5787182abfe 100644
--- a/llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
+++ b/llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
@@ -1441,3 +1441,228 @@ define float @PR26515(<4 x float> %0) nounwind {
   %4 = extractelement <4 x float> %3, i64 0
   ret float %4
 }
+
+; llvm.fadd/fsub/fmul intrinsics lower to the same SSE/AVX scalar instructions
+; as plain fadd/fsub/fmul — the fp.control operand bundle only affects backends
+; that support per-instruction FTZ (e.g. NVPTX), not X86.
+
+define <4 x float> @test_fadd_ss_intrinsic(<4 x float> %a, <4 x float> %b) {
+; SSE-LABEL: test_fadd_ss_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    addss %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fadd_ss_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vaddss %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <4 x float> %b, i32 0
+  %2 = extractelement <4 x float> %a, i32 0
+  %add = call float @llvm.fadd.f32(float %2, float %1)
+  %3 = insertelement <4 x float> %a, float %add, i32 0
+  ret <4 x float> %3
+}
+
+define <4 x float> @test_fsub_ss_intrinsic(<4 x float> %a, <4 x float> %b) {
+; SSE-LABEL: test_fsub_ss_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    subss %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fsub_ss_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vsubss %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <4 x float> %b, i32 0
+  %2 = extractelement <4 x float> %a, i32 0
+  %sub = call float @llvm.fsub.f32(float %2, float %1)
+  %3 = insertelement <4 x float> %a, float %sub, i32 0
+  ret <4 x float> %3
+}
+
+define <4 x float> @test_fmul_ss_intrinsic(<4 x float> %a, <4 x float> %b) {
+; SSE-LABEL: test_fmul_ss_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    mulss %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fmul_ss_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vmulss %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <4 x float> %b, i32 0
+  %2 = extractelement <4 x float> %a, i32 0
+  %mul = call float @llvm.fmul.f32(float %2, float %1)
+  %3 = insertelement <4 x float> %a, float %mul, i32 0
+  ret <4 x float> %3
+}
+
+define <2 x double> @test_fadd_sd_intrinsic(<2 x double> %a, <2 x double> %b) {
+; SSE-LABEL: test_fadd_sd_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    addsd %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fadd_sd_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vaddsd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <2 x double> %b, i32 0
+  %2 = extractelement <2 x double> %a, i32 0
+  %add = call double @llvm.fadd.f64(double %2, double %1)
+  %3 = insertelement <2 x double> %a, double %add, i32 0
+  ret <2 x double> %3
+}
+
+define <2 x double> @test_fsub_sd_intrinsic(<2 x double> %a, <2 x double> %b) {
+; SSE-LABEL: test_fsub_sd_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    subsd %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fsub_sd_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vsubsd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <2 x double> %b, i32 0
+  %2 = extractelement <2 x double> %a, i32 0
+  %sub = call double @llvm.fsub.f64(double %2, double %1)
+  %3 = insertelement <2 x double> %a, double %sub, i32 0
+  ret <2 x double> %3
+}
+
+define <2 x double> @test_fmul_sd_intrinsic(<2 x double> %a, <2 x double> %b) {
+; SSE-LABEL: test_fmul_sd_intrinsic:
+; SSE:       # %bb.0:
+; SSE-NEXT:    mulsd %xmm1, %xmm0
+; SSE-NEXT:    ret{{[l|q]}}
+;
+; AVX-LABEL: test_fmul_sd_intrinsic:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vmulsd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    ret{{[l|q]}}
+  %1 = extractelement <2 x double> %b, i32 0
+  %2 = extractelement <2 x double> %a, i32 0
+  %mul = call double @llvm.fmul.f64(double %2, double %1)
+  %3 = insertelement <2 x double> %a, double %mul, i32 0
+  ret <2 x double> %3
+}
+
+declare float @llvm.fadd.f32(float, float)
+declare float @llvm.fsub.f32(float, float)
+declare float @llvm.fmul.f32(float, float)
+declare double @llvm.fadd.f64(double, double)
+declare double @llvm.fsub.f64(double, double)
+declare double @llvm.fmul.f64(double, double)
+declare float @llvm.fneg.f32(float)
+declare double @llvm.fneg.f64(double)
+
+; llvm.fneg intrinsic lowers to the same negation as plain 'fneg' IR —
+; the fp.control operand bundle only affects backends that support
+; per-instruction FTZ (e.g. NVPTX), not X86.
+define float @test_fneg_f32_intrinsic(float %a) {
+; X86-SSE-LABEL: test_fneg_f32_intrinsic:
+; X86-SSE:       # %bb.0:
+; X86-SSE-NEXT:    pushl %eax
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 8
+; X86-SSE-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X86-SSE-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    movss %xmm0, (%esp)
+; X86-SSE-NEXT:    flds (%esp)
+; X86-SSE-NEXT:    popl %eax
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 4
+; X86-SSE-NEXT:    retl
+;
+; X86-AVX1-LABEL: test_fneg_f32_intrinsic:
+; X86-AVX1:       # %bb.0:
+; X86-AVX1-NEXT:    pushl %eax
+; X86-AVX1-NEXT:    .cfi_def_cfa_offset 8
+; X86-AVX1-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; X86-AVX1-NEXT:    vxorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; X86-AVX1-NEXT:    vmovss %xmm0, (%esp)
+; X86-AVX1-NEXT:    flds (%esp)
+; X86-AVX1-NEXT:    popl %eax
+; X86-AVX1-NEXT:    .cfi_def_cfa_offset 4
+; X86-AVX1-NEXT:    retl
+;
+; X86-AVX512-LABEL: test_fneg_f32_intrinsic:
+; X86-AVX512:       # %bb.0:
+; X86-AVX512-NEXT:    pushl %eax
+; X86-AVX512-NEXT:    .cfi_def_cfa_offset 8
+; X86-AVX512-NEXT:    vbroadcastss {{.*#+}} xmm0 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; X86-AVX512-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; X86-AVX512-NEXT:    vxorps %xmm0, %xmm1, %xmm0
+; X86-AVX512-NEXT:    vmovss %xmm0, (%esp)
+; X86-AVX512-NEXT:    flds (%esp)
+; X86-AVX512-NEXT:    popl %eax
+; X86-AVX512-NEXT:    .cfi_def_cfa_offset 4
+; X86-AVX512-NEXT:    retl
+;
+; X64-SSE-LABEL: test_fneg_f32_intrinsic:
+; X64-SSE:       # %bb.0:
+; X64-SSE-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX1-LABEL: test_fneg_f32_intrinsic:
+; X64-AVX1:       # %bb.0:
+; X64-AVX1-NEXT:    vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; X64-AVX1-NEXT:    retq
+;
+; X64-AVX512-LABEL: test_fneg_f32_intrinsic:
+; X64-AVX512:       # %bb.0:
+; X64-AVX512-NEXT:    vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; X64-AVX512-NEXT:    vxorps %xmm1, %xmm0, %xmm0
+; X64-AVX512-NEXT:    retq
+  %r = call float @llvm.fneg.f32(float %a)
+  ret float %r
+}
+
+define double @test_fneg_f64_intrinsic(double %a) {
+; X86-SSE-LABEL: test_fneg_f64_intrinsic:
+; X86-SSE:       # %bb.0:
+; X86-SSE-NEXT:    pushl %ebp
+; X86-SSE-NEXT:    .cfi_def_cfa_offset 8
+; X86-SSE-NEXT:    .cfi_offset %ebp, -8
+; X86-SSE-NEXT:    movl %esp, %ebp
+; X86-SSE-NEXT:    .cfi_def_cfa_register %ebp
+; X86-SSE-NEXT:    andl $-8, %esp
+; X86-SSE-NEXT:    subl $8, %esp
+; X86-SSE-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; X86-SSE-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-SSE-NEXT:    movlps %xmm0, (%esp)
+; X86-SSE-NEXT:    fldl (%esp)
+; X86-SSE-NEXT:    movl %ebp, %esp
+; X86-SSE-NEXT:    popl %ebp
+; X86-SSE-NEXT:    .cfi_def_cfa %esp, 4
+; X86-SSE-NEXT:    retl
+;
+; X86-AVX-LABEL: test_fneg_f64_intrinsic:
+; X86-AVX:       # %bb.0:
+; X86-AVX-NEXT:    pushl %ebp
+; X86-AVX-NEXT:    .cfi_def_cfa_offset 8
+; X86-AVX-NEXT:    .cfi_offset %ebp, -8
+; X86-AVX-NEXT:    movl %esp, %ebp
+; X86-AVX-NEXT:    .cfi_def_cfa_register %ebp
+; X86-AVX-NEXT:    andl $-8, %esp
+; X86-AVX-NEXT:    subl $8, %esp
+; X86-AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; X86-AVX-NEXT:    vxorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; X86-AVX-NEXT:    vmovlps %xmm0, (%esp)
+; X86-AVX-NEXT:    fldl (%esp)
+; X86-AVX-NEXT:    movl %ebp, %esp
+; X86-AVX-NEXT:    popl %ebp
+; X86-AVX-NEXT:    .cfi_def_cfa %esp, 4
+; X86-AVX-NEXT:    retl
+;
+; X64-SSE-LABEL: test_fneg_f64_intrinsic:
+; X64-SSE:       # %bb.0:
+; X64-SSE-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-SSE-NEXT:    retq
+;
+; X64-AVX-LABEL: test_fneg_f64_intrinsic:
+; X64-AVX:       # %bb.0:
+; X64-AVX-NEXT:    vxorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; X64-AVX-NEXT:    retq
+  %r = call double @llvm.fneg.f64(double %a)
+  ret double %r
+}
diff --git a/llvm/test/CodeGen/X86/strict-fsub-combines.ll b/llvm/test/CodeGen/X86/strict-fsub-combines.ll
index 774ea02ccd87a..ce8eb35fa2706 100644
--- a/llvm/test/CodeGen/X86/strict-fsub-combines.ll
+++ b/llvm/test/CodeGen/X86/strict-fsub-combines.ll
@@ -8,9 +8,7 @@ define float @fneg_strict_fsub_to_strict_fadd(float %x, float %y) nounwind stric
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %eax
 ; X86-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; X86-NEXT:    movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; X86-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
-; X86-NEXT:    subss %xmm1, %xmm0
+; X86-NEXT:    addss {{[0-9]+}}(%esp), %xmm0
 ; X86-NEXT:    movss %xmm0, (%esp)
 ; X86-NEXT:    flds (%esp)
 ; X86-NEXT:    wait
@@ -19,8 +17,7 @@ define float @fneg_strict_fsub_to_strict_fadd(float %x, float %y) nounwind stric
 ;
 ; X64-LABEL: fneg_strict_fsub_to_strict_fadd:
 ; X64:       # %bb.0:
-; X64-NEXT:    xorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; X64-NEXT:    subss %xmm1, %xmm0
+; X64-NEXT:    addss %xmm1, %xmm0
 ; X64-NEXT:    retq
   %neg = fneg float %y
   %sub = call float @llvm.experimental.constrained.fsub.f32(float %x, float %neg, metadata!"round.dynamic", metadata!"fpexcept.strict")
@@ -36,9 +33,7 @@ define double @fneg_strict_fsub_to_strict_fadd_d(double %x, double %y) nounwind
 ; X86-NEXT:    andl $-8, %esp
 ; X86-NEXT:    subl $8, %esp
 ; X86-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; X86-NEXT:    movsd {{.*#+}} xmm1 = mem[0],zero
-; X86-NEXT:    xorpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
-; X86-NEXT:    subsd %xmm1, %xmm0
+; X86-NEXT:    addsd 16(%ebp), %xmm0
 ; X86-NEXT:    movsd %xmm0, (%esp)
 ; X86-NEXT:    fldl (%esp)
 ; X86-NEXT:    wait
@@ -48,8 +43,7 @@ define double @fneg_strict_fsub_to_strict_fadd_d(double %x, double %y) nounwind
 ;
 ; X64-LABEL: fneg_strict_fsub_to_strict_fadd_d:
 ; X64:       # %bb.0:
-; X64-NEXT:    xorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; X64-NEXT:    subsd %xmm1, %xmm0
+; X64-NEXT:    addsd %xmm1, %xmm0
 ; X64-NEXT:    retq
   %neg = fneg double %y
   %sub = call double @llvm.experimental.constrained.fsub.f64(double %x, double %neg, metadata!"round.dynamic", metadata!"fpexcept.strict")
diff --git a/llvm/test/CodeGen/X86/vec-strict-128-fp16.ll b/llvm/test/CodeGen/X86/vec-strict-128-fp16.ll
index 766ccdbada539..4455dc748ea93 100644
--- a/llvm/test/CodeGen/X86/vec-strict-128-fp16.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-128-fp16.ll
@@ -166,8 +166,6 @@ define <4 x float> @f18(<4 x float> %a0, <8 x half> %a1) #0 {
 define <2 x float> @f19(<2 x half> %a) #0 {
 ; CHECK-LABEL: f19:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
 ; CHECK-NEXT:    vcvtph2psx %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
   %ret = call <2 x float> @llvm.experimental.constrained.fpext.v2f32.v2f16(
@@ -190,7 +188,6 @@ define <4 x float> @f20(<4 x half> %a) #0 {
 define <2 x half> @f21(<2 x float> %a) #0 {
 ; CHECK-LABEL: f21:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvtps2phx %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
   %ret = call <2 x half> @llvm.experimental.constrained.fptrunc.v2f16.v2f32(
diff --git a/llvm/test/CodeGen/X86/vec-strict-fptoint-128-fp16.ll b/llvm/test/CodeGen/X86/vec-strict-fptoint-128-fp16.ll
index 6aad9a6d82c73..bd597eefa9644 100644
--- a/llvm/test/CodeGen/X86/vec-strict-fptoint-128-fp16.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-fptoint-128-fp16.ll
@@ -31,19 +31,14 @@ declare <8 x i1> @llvm.experimental.constrained.fptoui.v8i1.v8f16(<8 x half>, me
 define <2 x i64> @strict_vector_fptosi_v2f16_to_v2i64(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v2f16_to_v2i64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2qq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v2f16_to_v2i64:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2si %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm1
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm0
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; NOVL-NEXT:    vcvttph2qq %xmm0, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f16(<2 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -53,19 +48,14 @@ define <2 x i64> @strict_vector_fptosi_v2f16_to_v2i64(<2 x half> %a) #0 {
 define <2 x i64> @strict_vector_fptoui_v2f16_to_v2i64(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v2f16_to_v2i64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2uqq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v2f16_to_v2i64:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm1
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm0
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; NOVL-NEXT:    vcvttph2uqq %xmm0, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f16(<2 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -75,8 +65,6 @@ define <2 x i64> @strict_vector_fptoui_v2f16_to_v2i64(<2 x half> %a) #0 {
 define <2 x i32> @strict_vector_fptosi_v2f16_to_v2i32(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v2f16_to_v2i32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2dq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
@@ -96,8 +84,6 @@ define <2 x i32> @strict_vector_fptosi_v2f16_to_v2i32(<2 x half> %a) #0 {
 define <2 x i32> @strict_vector_fptoui_v2f16_to_v2i32(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v2f16_to_v2i32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2udq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
@@ -117,8 +103,6 @@ define <2 x i32> @strict_vector_fptoui_v2f16_to_v2i32(<2 x half> %a) #0 {
 define <2 x i16> @strict_vector_fptosi_v2f16_to_v2i16(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v2f16_to_v2i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
@@ -139,8 +123,6 @@ define <2 x i16> @strict_vector_fptosi_v2f16_to_v2i16(<2 x half> %a) #0 {
 define <2 x i16> @strict_vector_fptoui_v2f16_to_v2i16(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v2f16_to_v2i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
@@ -161,8 +143,6 @@ define <2 x i16> @strict_vector_fptoui_v2f16_to_v2i16(<2 x half> %a) #0 {
 define <2 x i8> @strict_vector_fptosi_v2f16_to_v2i8(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v2f16_to_v2i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovwb %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
@@ -185,8 +165,6 @@ define <2 x i8> @strict_vector_fptosi_v2f16_to_v2i8(<2 x half> %a) #0 {
 define <2 x i8> @strict_vector_fptoui_v2f16_to_v2i8(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v2f16_to_v2i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovwb %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
@@ -209,8 +187,6 @@ define <2 x i8> @strict_vector_fptoui_v2f16_to_v2i8(<2 x half> %a) #0 {
 define <2 x i1> @strict_vector_fptosi_v2f16_to_v2i1(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v2f16_to_v2i1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    vpsllw $15, %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovw2m %xmm0, %k1
@@ -220,14 +196,9 @@ define <2 x i1> @strict_vector_fptosi_v2f16_to_v2i1(<2 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v2f16_to_v2i1:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    andl $1, %eax
-; NOVL-NEXT:    kmovw %eax, %k0
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $1, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k1
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
+; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; NOVL-NEXT:    vpternlogq {{.*#+}} zmm0 {%k1} {z} = -1
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -240,8 +211,6 @@ define <2 x i1> @strict_vector_fptosi_v2f16_to_v2i1(<2 x half> %a) #0 {
 define <2 x i1> @strict_vector_fptoui_v2f16_to_v2i1(<2 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v2f16_to_v2i1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    vpsllw $15, %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovw2m %xmm0, %k1
@@ -251,14 +220,10 @@ define <2 x i1> @strict_vector_fptoui_v2f16_to_v2i1(<2 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v2f16_to_v2i1:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    andl $1, %eax
-; NOVL-NEXT:    kmovw %eax, %k0
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $1, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k1
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
+; NOVL-NEXT:    vpslld $31, %ymm0, %ymm0
+; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; NOVL-NEXT:    vpternlogq {{.*#+}} zmm0 {%k1} {z} = -1
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -271,23 +236,15 @@ define <2 x i1> @strict_vector_fptoui_v2f16_to_v2i1(<2 x half> %a) #0 {
 define <4 x i32> @strict_vector_fptosi_v4f16_to_v4i32(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v4f16_to_v4i32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2dq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v4f16_to_v4i32:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2si %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -297,23 +254,15 @@ define <4 x i32> @strict_vector_fptosi_v4f16_to_v4i32(<4 x half> %a) #0 {
 define <4 x i32> @strict_vector_fptoui_v4f16_to_v4i32(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v4f16_to_v4i32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2udq %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v4f16_to_v4i32:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2usi %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2usi %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2udq %ymm0, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -323,24 +272,15 @@ define <4 x i32> @strict_vector_fptoui_v4f16_to_v4i32(<4 x half> %a) #0 {
 define <4 x i16> @strict_vector_fptosi_v4f16_to_v4i16(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v4f16_to_v4i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v4f16_to_v4i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2si %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i16> @llvm.experimental.constrained.fptosi.v4i16.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -350,24 +290,15 @@ define <4 x i16> @strict_vector_fptosi_v4f16_to_v4i16(<4 x half> %a) #0 {
 define <4 x i16> @strict_vector_fptoui_v4f16_to_v4i16(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v4f16_to_v4i16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v4f16_to_v4i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2si %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i16> @llvm.experimental.constrained.fptoui.v4i16.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -377,26 +308,17 @@ define <4 x i16> @strict_vector_fptoui_v4f16_to_v4i16(<4 x half> %a) #0 {
 define <4 x i8> @strict_vector_fptosi_v4f16_to_v4i8(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v4f16_to_v4i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovwb %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v4f16_to_v4i8:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2si %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; NOVL-NEXT:    vpacksswb %xmm0, %xmm0, %xmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i8> @llvm.experimental.constrained.fptosi.v4i8.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -406,26 +328,17 @@ define <4 x i8> @strict_vector_fptosi_v4f16_to_v4i8(<4 x half> %a) #0 {
 define <4 x i8> @strict_vector_fptoui_v4f16_to_v4i8(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v4f16_to_v4i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovwb %xmm0, %xmm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v4f16_to_v4i8:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    vcvttsh2si %xmm0, %ecx
-; NOVL-NEXT:    vmovd %ecx, %xmm1
-; NOVL-NEXT:    vpinsrd $1, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %eax
-; NOVL-NEXT:    vpinsrd $2, %eax, %xmm1, %xmm1
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    vpinsrd $3, %eax, %xmm1, %xmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; NOVL-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
+; NOVL-NEXT:    vzeroupper
 ; NOVL-NEXT:    retq
   %ret = call <4 x i8> @llvm.experimental.constrained.fptoui.v4i8.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -435,7 +348,6 @@ define <4 x i8> @strict_vector_fptoui_v4f16_to_v4i8(<4 x half> %a) #0 {
 define <4 x i1> @strict_vector_fptosi_v4f16_to_v4i1(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v4f16_to_v4i1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2w %xmm0, %xmm0
 ; CHECK-NEXT:    vpsllw $15, %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovw2m %xmm0, %k1
@@ -445,30 +357,9 @@ define <4 x i1> @strict_vector_fptosi_v4f16_to_v4i1(<4 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v4f16_to_v4i1:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    andl $1, %eax
-; NOVL-NEXT:    kmovw %eax, %k0
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $15, %k1, %k1
-; NOVL-NEXT:    kshiftrw $14, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k0
-; NOVL-NEXT:    movw $-5, %ax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kandw %k1, %k0, %k0
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $2, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k0
-; NOVL-NEXT:    kshiftlw $13, %k0, %k0
-; NOVL-NEXT:    kshiftrw $13, %k0, %k0
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $3, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k1
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
+; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; NOVL-NEXT:    vpternlogd {{.*#+}} zmm0 {%k1} {z} = -1
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -481,7 +372,6 @@ define <4 x i1> @strict_vector_fptosi_v4f16_to_v4i1(<4 x half> %a) #0 {
 define <4 x i1> @strict_vector_fptoui_v4f16_to_v4i1(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v4f16_to_v4i1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2uw %xmm0, %xmm0
 ; CHECK-NEXT:    vpsllw $15, %xmm0, %xmm0
 ; CHECK-NEXT:    vpmovw2m %xmm0, %k1
@@ -491,30 +381,10 @@ define <4 x i1> @strict_vector_fptoui_v4f16_to_v4i1(<4 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v4f16_to_v4i1:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    andl $1, %eax
-; NOVL-NEXT:    kmovw %eax, %k0
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $15, %k1, %k1
-; NOVL-NEXT:    kshiftrw $14, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k0
-; NOVL-NEXT:    movw $-5, %ax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kandw %k1, %k0, %k0
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm1, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $2, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k0
-; NOVL-NEXT:    kshiftlw $13, %k0, %k0
-; NOVL-NEXT:    kshiftrw $13, %k0, %k0
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %eax
-; NOVL-NEXT:    kmovd %eax, %k1
-; NOVL-NEXT:    kshiftlw $3, %k1, %k1
-; NOVL-NEXT:    korw %k1, %k0, %k1
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
+; NOVL-NEXT:    vpslld $31, %ymm0, %ymm0
+; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; NOVL-NEXT:    vpternlogd {{.*#+}} zmm0 {%k1} {z} = -1
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -532,8 +402,7 @@ define <8 x i16> @strict_vector_fptosi_v8f16_to_v8i16(<8 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v8f16_to_v8i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf32x4 $0, %xmm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2w %zmm0, %zmm0
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -551,8 +420,7 @@ define <8 x i16> @strict_vector_fptoui_v8f16_to_v8i16(<8 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v8f16_to_v8i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf32x4 $0, %xmm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2uw %zmm0, %zmm0
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; NOVL-NEXT:    vzeroupper
@@ -571,8 +439,7 @@ define <8 x i8> @strict_vector_fptosi_v8f16_to_v8i8(<8 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v8f16_to_v8i8:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf32x4 $0, %xmm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2w %zmm0, %zmm0
 ; NOVL-NEXT:    vpacksswb %xmm0, %xmm0, %xmm0
 ; NOVL-NEXT:    vzeroupper
@@ -591,8 +458,7 @@ define <8 x i8> @strict_vector_fptoui_v8f16_to_v8i8(<8 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v8f16_to_v8i8:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf32x4 $0, %xmm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2w %zmm0, %zmm0
 ; NOVL-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
 ; NOVL-NEXT:    vzeroupper
@@ -614,8 +480,6 @@ define <8 x i1> @strict_vector_fptosi_v8f16_to_v8i1(<8 x half> %a) #0 {
 ; NOVL-LABEL: strict_vector_fptosi_v8f16_to_v8i1:
 ; NOVL:       # %bb.0:
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
 ; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k0
 ; NOVL-NEXT:    vpmovm2w %k0, %zmm0
@@ -640,8 +504,6 @@ define <8 x i1> @strict_vector_fptoui_v8f16_to_v8i1(<8 x half> %a) #0 {
 ; NOVL-LABEL: strict_vector_fptoui_v8f16_to_v8i1:
 ; NOVL:       # %bb.0:
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
 ; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    vpslld $31, %ymm0, %ymm0
 ; NOVL-NEXT:    vptestmd %zmm0, %zmm0, %k0
diff --git a/llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll b/llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
index 48a0b27a207f3..db978c1ede1eb 100644
--- a/llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
@@ -197,7 +197,7 @@ define <2 x i64> @strict_vector_fptosi_v2f64_to_v2i64(<2 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f64_to_v2i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2qq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -223,7 +223,7 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; SSE-32-NEXT:    andl $-8, %esp
 ; SSE-32-NEXT:    subl $24, %esp
 ; SSE-32-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; SSE-32-NEXT:    comisd %xmm1, %xmm0
+; SSE-32-NEXT:    ucomisd %xmm1, %xmm0
 ; SSE-32-NEXT:    movapd %xmm1, %xmm2
 ; SSE-32-NEXT:    jae .LBB1_2
 ; SSE-32-NEXT:  # %bb.1:
@@ -243,7 +243,7 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; SSE-32-NEXT:    fistpll {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    fldcw {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-32-NEXT:    comisd %xmm1, %xmm0
+; SSE-32-NEXT:    ucomisd %xmm1, %xmm0
 ; SSE-32-NEXT:    jae .LBB1_4
 ; SSE-32-NEXT:  # %bb.3:
 ; SSE-32-NEXT:    xorpd %xmm1, %xmm1
@@ -280,35 +280,25 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f64_to_v2i64:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movsd {{.*#+}} xmm3 = [9.2233720368547758E+18,0.0E+0]
-; SSE-64-NEXT:    comisd %xmm3, %xmm0
-; SSE-64-NEXT:    xorpd %xmm2, %xmm2
-; SSE-64-NEXT:    xorpd %xmm1, %xmm1
-; SSE-64-NEXT:    jb .LBB1_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    movapd %xmm3, %xmm1
-; SSE-64-NEXT:  .LBB1_2:
-; SSE-64-NEXT:    movapd %xmm0, %xmm4
-; SSE-64-NEXT:    subsd %xmm1, %xmm4
-; SSE-64-NEXT:    cvttsd2si %xmm4, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm1
+; SSE-64-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
+; SSE-64-NEXT:    movapd %xmm0, %xmm1
+; SSE-64-NEXT:    subsd %xmm2, %xmm1
+; SSE-64-NEXT:    cvttsd2si %xmm1, %rax
+; SSE-64-NEXT:    cvttsd2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rcx, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rax, %rdx
+; SSE-64-NEXT:    orq %rcx, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm1
 ; SSE-64-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-64-NEXT:    comisd %xmm3, %xmm0
-; SSE-64-NEXT:    jb .LBB1_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    movapd %xmm3, %xmm2
-; SSE-64-NEXT:  .LBB1_4:
-; SSE-64-NEXT:    subsd %xmm2, %xmm0
 ; SSE-64-NEXT:    cvttsd2si %xmm0, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm0
+; SSE-64-NEXT:    subsd %xmm2, %xmm0
+; SSE-64-NEXT:    cvttsd2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rax, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rcx, %rdx
+; SSE-64-NEXT:    orq %rax, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm0
 ; SSE-64-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; SSE-64-NEXT:    movdqa %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
@@ -324,7 +314,7 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; AVX-32-NEXT:    subl $16, %esp
 ; AVX-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovapd %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB1_2
 ; AVX-32-NEXT:  # %bb.1:
@@ -339,7 +329,7 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; AVX-32-NEXT:    movzbl %al, %eax
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm0
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB1_4
 ; AVX-32-NEXT:  # %bb.3:
 ; AVX-32-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
@@ -365,34 +355,24 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; AVX-64-LABEL: strict_vector_fptoui_v2f64_to_v2i64:
 ; AVX-64:       # %bb.0:
 ; AVX-64-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX-64-NEXT:    jb .LBB1_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm3
-; AVX-64-NEXT:  .LBB1_2:
-; AVX-64-NEXT:    vsubsd %xmm3, %xmm0, %xmm3
-; AVX-64-NEXT:    vcvttsd2si %xmm3, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm3
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm2
+; AVX-64-NEXT:    vcvttsd2si %xmm2, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
 ; AVX-64-NEXT:    vshufpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB1_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB1_4:
-; AVX-64-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttsd2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttsd2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-32-LABEL: strict_vector_fptoui_v2f64_to_v2i64:
@@ -407,14 +387,14 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; AVX512F-32-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
 ; AVX512F-32-NEXT:    vmovsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512F-32-NEXT:    xorl %eax, %eax
-; AVX512F-32-NEXT:    vcomisd %xmm2, %xmm1
+; AVX512F-32-NEXT:    vucomisd %xmm2, %xmm1
 ; AVX512F-32-NEXT:    setae %al
 ; AVX512F-32-NEXT:    kmovw %eax, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubsd %xmm3, %xmm1, %xmm1
 ; AVX512F-32-NEXT:    vmovsd %xmm1, (%esp)
 ; AVX512F-32-NEXT:    xorl %ecx, %ecx
-; AVX512F-32-NEXT:    vcomisd %xmm2, %xmm0
+; AVX512F-32-NEXT:    vucomisd %xmm2, %xmm0
 ; AVX512F-32-NEXT:    setae %cl
 ; AVX512F-32-NEXT:    kmovw %ecx, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm2, %xmm2, %xmm1 {%k1} {z}
@@ -460,14 +440,14 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
 ; AVX512VL-32-NEXT:    vmovsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm2, %xmm1
+; AVX512VL-32-NEXT:    vucomisd %xmm2, %xmm1
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm1, %xmm1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, (%esp)
 ; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomisd %xmm2, %xmm0
+; AVX512VL-32-NEXT:    vucomisd %xmm2, %xmm0
 ; AVX512VL-32-NEXT:    setae %cl
 ; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, %xmm2, %xmm1 {%k1} {z}
@@ -503,7 +483,7 @@ define <2 x i64> @strict_vector_fptoui_v2f64_to_v2i64(<2 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f64_to_v2i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2uqq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -576,17 +556,17 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    subl $32, %esp
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -611,17 +591,17 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512F-32-NEXT:    movl %esp, %ebp
 ; AVX512F-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512F-32-NEXT:    andl $-8, %esp
-; AVX512F-32-NEXT:    subl $16, %esp
-; AVX512F-32-NEXT:    vmovd %xmm0, (%esp)
-; AVX512F-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX512F-32-NEXT:    flds (%esp)
-; AVX512F-32-NEXT:    fisttpll (%esp)
+; AVX512F-32-NEXT:    subl $32, %esp
+; AVX512F-32-NEXT:    vmovd %xmm0, {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    flds (%esp)
+; AVX512F-32-NEXT:    fisttpll (%esp)
 ; AVX512F-32-NEXT:    wait
 ; AVX512F-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX512F-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512F-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    movl %ebp, %esp
 ; AVX512F-32-NEXT:    popl %ebp
@@ -646,17 +626,17 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    movl %esp, %ebp
 ; AVX512VL-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512VL-32-NEXT:    andl $-8, %esp
-; AVX512VL-32-NEXT:    subl $16, %esp
-; AVX512VL-32-NEXT:    vmovd %xmm0, (%esp)
-; AVX512VL-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    flds (%esp)
-; AVX512VL-32-NEXT:    fisttpll (%esp)
+; AVX512VL-32-NEXT:    subl $32, %esp
+; AVX512VL-32-NEXT:    vmovd %xmm0, {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    flds (%esp)
+; AVX512VL-32-NEXT:    fisttpll (%esp)
 ; AVX512VL-32-NEXT:    wait
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    movl %ebp, %esp
 ; AVX512VL-32-NEXT:    popl %ebp
@@ -675,7 +655,7 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64(<2 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f32_to_v2i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvttps2qq %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -750,19 +730,19 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
+; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    movl 8(%ebp), %eax
 ; AVX-32-NEXT:    vmovaps (%eax), %xmm0
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -786,19 +766,19 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512F-32-NEXT:    movl %esp, %ebp
 ; AVX512F-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512F-32-NEXT:    andl $-8, %esp
-; AVX512F-32-NEXT:    subl $16, %esp
+; AVX512F-32-NEXT:    subl $32, %esp
 ; AVX512F-32-NEXT:    movl 8(%ebp), %eax
 ; AVX512F-32-NEXT:    vmovdqa (%eax), %xmm0
-; AVX512F-32-NEXT:    vmovd %xmm0, (%esp)
-; AVX512F-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX512F-32-NEXT:    flds (%esp)
-; AVX512F-32-NEXT:    fisttpll (%esp)
+; AVX512F-32-NEXT:    vmovd %xmm0, {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    flds (%esp)
+; AVX512F-32-NEXT:    fisttpll (%esp)
 ; AVX512F-32-NEXT:    wait
 ; AVX512F-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX512F-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512F-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    movl %ebp, %esp
 ; AVX512F-32-NEXT:    popl %ebp
@@ -822,19 +802,19 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512VL-32-NEXT:    movl %esp, %ebp
 ; AVX512VL-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512VL-32-NEXT:    andl $-8, %esp
-; AVX512VL-32-NEXT:    subl $16, %esp
+; AVX512VL-32-NEXT:    subl $32, %esp
 ; AVX512VL-32-NEXT:    movl 8(%ebp), %eax
 ; AVX512VL-32-NEXT:    vmovdqa (%eax), %xmm0
-; AVX512VL-32-NEXT:    vmovd %xmm0, (%esp)
-; AVX512VL-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    flds (%esp)
-; AVX512VL-32-NEXT:    fisttpll (%esp)
+; AVX512VL-32-NEXT:    vmovd %xmm0, {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    flds (%esp)
+; AVX512VL-32-NEXT:    fisttpll (%esp)
 ; AVX512VL-32-NEXT:    wait
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    movl %ebp, %esp
 ; AVX512VL-32-NEXT:    popl %ebp
@@ -853,7 +833,7 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512DQ-32-LABEL: strict_vector_fptosi_v2f32_to_v2i64_load128:
 ; AVX512DQ-32:       # %bb.0:
 ; AVX512DQ-32-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; AVX512DQ-32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQ-32-NEXT:    vmovaps (%eax), %xmm0
 ; AVX512DQ-32-NEXT:    vcvttps2qq %ymm0, %zmm0
 ; AVX512DQ-32-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-32-NEXT:    vzeroupper
@@ -861,7 +841,7 @@ define <2 x i64> @strict_vector_fptosi_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ;
 ; AVX512DQ-64-LABEL: strict_vector_fptosi_v2f32_to_v2i64_load128:
 ; AVX512DQ-64:       # %bb.0:
-; AVX512DQ-64-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQ-64-NEXT:    vmovaps (%rdi), %xmm0
 ; AVX512DQ-64-NEXT:    vcvttps2qq %ymm0, %zmm0
 ; AVX512DQ-64-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-64-NEXT:    vzeroupper
@@ -894,7 +874,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; SSE-32-NEXT:    andl $-8, %esp
 ; SSE-32-NEXT:    subl $24, %esp
 ; SSE-32-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    movaps %xmm1, %xmm2
 ; SSE-32-NEXT:    jae .LBB4_2
 ; SSE-32-NEXT:  # %bb.1:
@@ -914,7 +894,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; SSE-32-NEXT:    fistpll {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    fldcw {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    jae .LBB4_4
 ; SSE-32-NEXT:  # %bb.3:
 ; SSE-32-NEXT:    xorps %xmm1, %xmm1
@@ -951,35 +931,25 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i64:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movss {{.*#+}} xmm3 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-64-NEXT:    comiss %xmm3, %xmm0
-; SSE-64-NEXT:    xorps %xmm2, %xmm2
-; SSE-64-NEXT:    xorps %xmm1, %xmm1
-; SSE-64-NEXT:    jb .LBB4_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    movaps %xmm3, %xmm1
-; SSE-64-NEXT:  .LBB4_2:
-; SSE-64-NEXT:    movaps %xmm0, %xmm4
-; SSE-64-NEXT:    subss %xmm1, %xmm4
-; SSE-64-NEXT:    cvttss2si %xmm4, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm1
+; SSE-64-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
+; SSE-64-NEXT:    movaps %xmm0, %xmm1
+; SSE-64-NEXT:    subss %xmm2, %xmm1
+; SSE-64-NEXT:    cvttss2si %xmm1, %rax
+; SSE-64-NEXT:    cvttss2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rcx, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rax, %rdx
+; SSE-64-NEXT:    orq %rcx, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm1
 ; SSE-64-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-64-NEXT:    comiss %xmm3, %xmm0
-; SSE-64-NEXT:    jb .LBB4_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    movaps %xmm3, %xmm2
-; SSE-64-NEXT:  .LBB4_4:
-; SSE-64-NEXT:    subss %xmm2, %xmm0
 ; SSE-64-NEXT:    cvttss2si %xmm0, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm0
+; SSE-64-NEXT:    subss %xmm2, %xmm0
+; SSE-64-NEXT:    cvttss2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rax, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rcx, %rdx
+; SSE-64-NEXT:    orq %rax, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm0
 ; SSE-64-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; SSE-64-NEXT:    movdqa %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
@@ -992,33 +962,33 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
+; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB4_2
 ; AVX-32-NEXT:  # %bb.1:
 ; AVX-32-NEXT:    vxorps %xmm3, %xmm3, %xmm3
 ; AVX-32-NEXT:  .LBB4_2:
 ; AVX-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vmovss %xmm2, (%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %al
 ; AVX-32-NEXT:    movzbl %al, %eax
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB4_4
 ; AVX-32-NEXT:  # %bb.3:
 ; AVX-32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
 ; AVX-32-NEXT:  .LBB4_4:
 ; AVX-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %cl
 ; AVX-32-NEXT:    movzbl %cl, %ecx
@@ -1026,7 +996,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -1036,34 +1006,24 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX-64-LABEL: strict_vector_fptoui_v2f32_to_v2i64:
 ; AVX-64:       # %bb.0:
 ; AVX-64-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX-64-NEXT:    jb .LBB4_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm3
-; AVX-64-NEXT:  .LBB4_2:
-; AVX-64-NEXT:    vsubss %xmm3, %xmm0, %xmm3
-; AVX-64-NEXT:    vcvttss2si %xmm3, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm3
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm2
+; AVX-64-NEXT:    vcvttss2si %xmm2, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
 ; AVX-64-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB4_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB4_4:
-; AVX-64-NEXT:    vsubss %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttss2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttss2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-32-LABEL: strict_vector_fptoui_v2f32_to_v2i64:
@@ -1074,27 +1034,27 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512F-32-NEXT:    movl %esp, %ebp
 ; AVX512F-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512F-32-NEXT:    andl $-8, %esp
-; AVX512F-32-NEXT:    subl $16, %esp
+; AVX512F-32-NEXT:    subl $32, %esp
 ; AVX512F-32-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
 ; AVX512F-32-NEXT:    vmovss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512F-32-NEXT:    xorl %eax, %eax
-; AVX512F-32-NEXT:    vcomiss %xmm2, %xmm1
+; AVX512F-32-NEXT:    vucomiss %xmm2, %xmm1
 ; AVX512F-32-NEXT:    setae %al
 ; AVX512F-32-NEXT:    kmovw %eax, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubss %xmm3, %xmm1, %xmm1
-; AVX512F-32-NEXT:    vmovss %xmm1, {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vmovss %xmm1, (%esp)
 ; AVX512F-32-NEXT:    xorl %ecx, %ecx
-; AVX512F-32-NEXT:    vcomiss %xmm2, %xmm0
+; AVX512F-32-NEXT:    vucomiss %xmm2, %xmm0
 ; AVX512F-32-NEXT:    setae %cl
 ; AVX512F-32-NEXT:    kmovw %ecx, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm2, %xmm2, %xmm1 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX512F-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    flds (%esp)
 ; AVX512F-32-NEXT:    fisttpll (%esp)
+; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    wait
 ; AVX512F-32-NEXT:    shll $31, %eax
 ; AVX512F-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
@@ -1102,7 +1062,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512F-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512F-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512F-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512F-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX512F-32-NEXT:    movl %ebp, %esp
 ; AVX512F-32-NEXT:    popl %ebp
@@ -1127,27 +1087,27 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    movl %esp, %ebp
 ; AVX512VL-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512VL-32-NEXT:    andl $-8, %esp
-; AVX512VL-32-NEXT:    subl $16, %esp
+; AVX512VL-32-NEXT:    subl $32, %esp
 ; AVX512VL-32-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm2, %xmm1
+; AVX512VL-32-NEXT:    vucomiss %xmm2, %xmm1
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    vmovss %xmm1, {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vmovss %xmm1, (%esp)
 ; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomiss %xmm2, %xmm0
+; AVX512VL-32-NEXT:    vucomiss %xmm2, %xmm0
 ; AVX512VL-32-NEXT:    setae %cl
 ; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm2, %xmm2, %xmm1 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    flds (%esp)
 ; AVX512VL-32-NEXT:    fisttpll (%esp)
+; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    wait
 ; AVX512VL-32-NEXT:    shll $31, %eax
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
@@ -1155,7 +1115,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    movl %ebp, %esp
 ; AVX512VL-32-NEXT:    popl %ebp
@@ -1174,7 +1134,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64(<2 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f32_to_v2i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvttps2uqq %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1202,7 +1162,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; SSE-32-NEXT:    movl 8(%ebp), %eax
 ; SSE-32-NEXT:    movaps (%eax), %xmm0
 ; SSE-32-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    movaps %xmm1, %xmm2
 ; SSE-32-NEXT:    jae .LBB5_2
 ; SSE-32-NEXT:  # %bb.1:
@@ -1222,7 +1182,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; SSE-32-NEXT:    fistpll {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    fldcw {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    jae .LBB5_4
 ; SSE-32-NEXT:  # %bb.3:
 ; SSE-32-NEXT:    xorps %xmm1, %xmm1
@@ -1260,35 +1220,25 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i64_load128:
 ; SSE-64:       # %bb.0:
 ; SSE-64-NEXT:    movaps (%rdi), %xmm1
-; SSE-64-NEXT:    movss {{.*#+}} xmm3 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-64-NEXT:    comiss %xmm3, %xmm1
-; SSE-64-NEXT:    xorps %xmm2, %xmm2
-; SSE-64-NEXT:    xorps %xmm0, %xmm0
-; SSE-64-NEXT:    jb .LBB5_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    movaps %xmm3, %xmm0
-; SSE-64-NEXT:  .LBB5_2:
-; SSE-64-NEXT:    movaps %xmm1, %xmm4
-; SSE-64-NEXT:    subss %xmm0, %xmm4
-; SSE-64-NEXT:    cvttss2si %xmm4, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm0
+; SSE-64-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
+; SSE-64-NEXT:    movaps %xmm1, %xmm0
+; SSE-64-NEXT:    subss %xmm2, %xmm0
+; SSE-64-NEXT:    cvttss2si %xmm0, %rax
+; SSE-64-NEXT:    cvttss2si %xmm1, %rcx
+; SSE-64-NEXT:    movq %rcx, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rax, %rdx
+; SSE-64-NEXT:    orq %rcx, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm0
 ; SSE-64-NEXT:    shufps {{.*#+}} xmm1 = xmm1[1,1,1,1]
-; SSE-64-NEXT:    comiss %xmm3, %xmm1
-; SSE-64-NEXT:    jb .LBB5_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    movaps %xmm3, %xmm2
-; SSE-64-NEXT:  .LBB5_4:
-; SSE-64-NEXT:    subss %xmm2, %xmm1
 ; SSE-64-NEXT:    cvttss2si %xmm1, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm1
+; SSE-64-NEXT:    subss %xmm2, %xmm1
+; SSE-64-NEXT:    cvttss2si %xmm1, %rcx
+; SSE-64-NEXT:    movq %rax, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rcx, %rdx
+; SSE-64-NEXT:    orq %rax, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm1
 ; SSE-64-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
 ; SSE-64-NEXT:    retq
 ;
@@ -1300,35 +1250,35 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
+; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    movl 8(%ebp), %eax
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB5_2
 ; AVX-32-NEXT:  # %bb.1:
 ; AVX-32-NEXT:    vxorps %xmm3, %xmm3, %xmm3
 ; AVX-32-NEXT:  .LBB5_2:
 ; AVX-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vmovss %xmm2, (%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %al
 ; AVX-32-NEXT:    movzbl %al, %eax
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB5_4
 ; AVX-32-NEXT:  # %bb.3:
 ; AVX-32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
 ; AVX-32-NEXT:  .LBB5_4:
 ; AVX-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %cl
 ; AVX-32-NEXT:    movzbl %cl, %ecx
@@ -1336,7 +1286,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -1345,36 +1295,26 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ;
 ; AVX-64-LABEL: strict_vector_fptoui_v2f32_to_v2i64_load128:
 ; AVX-64:       # %bb.0:
-; AVX-64-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-64-NEXT:    vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; AVX-64-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm3
-; AVX-64-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX-64-NEXT:    jb .LBB5_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm4
-; AVX-64-NEXT:  .LBB5_2:
-; AVX-64-NEXT:    vsubss %xmm4, %xmm3, %xmm3
+; AVX-64-NEXT:    vmovaps (%rdi), %xmm0
+; AVX-64-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; AVX-64-NEXT:    vmovss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
+; AVX-64-NEXT:    vsubss %xmm2, %xmm1, %xmm3
 ; AVX-64-NEXT:    vcvttss2si %xmm3, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm3
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB5_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB5_4:
-; AVX-64-NEXT:    vsubss %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttss2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
+; AVX-64-NEXT:    vcvttss2si %xmm1, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm1
+; AVX-64-NEXT:    vsubss %xmm2, %xmm0, %xmm2
+; AVX-64-NEXT:    vcvttss2si %xmm2, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-32-LABEL: strict_vector_fptoui_v2f32_to_v2i64_load128:
@@ -1385,29 +1325,29 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512F-32-NEXT:    movl %esp, %ebp
 ; AVX512F-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512F-32-NEXT:    andl $-8, %esp
-; AVX512F-32-NEXT:    subl $16, %esp
+; AVX512F-32-NEXT:    subl $32, %esp
 ; AVX512F-32-NEXT:    movl 8(%ebp), %eax
 ; AVX512F-32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vmovss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512F-32-NEXT:    xorl %eax, %eax
-; AVX512F-32-NEXT:    vcomiss %xmm2, %xmm1
+; AVX512F-32-NEXT:    vucomiss %xmm2, %xmm1
 ; AVX512F-32-NEXT:    setae %al
 ; AVX512F-32-NEXT:    kmovw %eax, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubss %xmm3, %xmm1, %xmm1
-; AVX512F-32-NEXT:    vmovss %xmm1, {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vmovss %xmm1, (%esp)
 ; AVX512F-32-NEXT:    xorl %ecx, %ecx
-; AVX512F-32-NEXT:    vcomiss %xmm2, %xmm0
+; AVX512F-32-NEXT:    vucomiss %xmm2, %xmm0
 ; AVX512F-32-NEXT:    setae %cl
 ; AVX512F-32-NEXT:    kmovw %ecx, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm2, %xmm2, %xmm1 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX512F-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    flds (%esp)
 ; AVX512F-32-NEXT:    fisttpll (%esp)
+; AVX512F-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX512F-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    wait
 ; AVX512F-32-NEXT:    shll $31, %eax
 ; AVX512F-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
@@ -1415,7 +1355,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512F-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512F-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512F-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512F-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512F-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512F-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX512F-32-NEXT:    movl %ebp, %esp
 ; AVX512F-32-NEXT:    popl %ebp
@@ -1439,29 +1379,29 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512VL-32-NEXT:    movl %esp, %ebp
 ; AVX512VL-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX512VL-32-NEXT:    andl $-8, %esp
-; AVX512VL-32-NEXT:    subl $16, %esp
+; AVX512VL-32-NEXT:    subl $32, %esp
 ; AVX512VL-32-NEXT:    movl 8(%ebp), %eax
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm2, %xmm1
+; AVX512VL-32-NEXT:    vucomiss %xmm2, %xmm1
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm2, %xmm2, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    vmovss %xmm1, {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vmovss %xmm1, (%esp)
 ; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomiss %xmm2, %xmm0
+; AVX512VL-32-NEXT:    vucomiss %xmm2, %xmm0
 ; AVX512VL-32-NEXT:    setae %cl
 ; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm2, %xmm2, %xmm1 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    flds (%esp)
 ; AVX512VL-32-NEXT:    fisttpll (%esp)
+; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    wait
 ; AVX512VL-32-NEXT:    shll $31, %eax
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
@@ -1469,7 +1409,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    movl %ebp, %esp
 ; AVX512VL-32-NEXT:    popl %ebp
@@ -1488,7 +1428,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ; AVX512DQ-32-LABEL: strict_vector_fptoui_v2f32_to_v2i64_load128:
 ; AVX512DQ-32:       # %bb.0:
 ; AVX512DQ-32-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; AVX512DQ-32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQ-32-NEXT:    vmovaps (%eax), %xmm0
 ; AVX512DQ-32-NEXT:    vcvttps2uqq %ymm0, %zmm0
 ; AVX512DQ-32-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-32-NEXT:    vzeroupper
@@ -1496,7 +1436,7 @@ define <2 x i64> @strict_vector_fptoui_v2f32_to_v2i64_load128(ptr %x) strictfp {
 ;
 ; AVX512DQ-64-LABEL: strict_vector_fptoui_v2f32_to_v2i64_load128:
 ; AVX512DQ-64:       # %bb.0:
-; AVX512DQ-64-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQ-64-NEXT:    vmovaps (%rdi), %xmm0
 ; AVX512DQ-64-NEXT:    vcvttps2uqq %ymm0, %zmm0
 ; AVX512DQ-64-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-64-NEXT:    vzeroupper
@@ -1561,85 +1501,49 @@ define <2 x i32> @strict_vector_fptosi_v2f64_to_v2i32(<2 x double> %a) #0 {
 define <2 x i32> @strict_vector_fptoui_v2f64_to_v2i32(<2 x double> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movsd {{.*#+}} xmm3 = [2.147483648E+9,0.0E+0]
-; SSE-32-NEXT:    comisd %xmm3, %xmm0
-; SSE-32-NEXT:    xorpd %xmm2, %xmm2
-; SSE-32-NEXT:    xorpd %xmm1, %xmm1
-; SSE-32-NEXT:    jb .LBB7_2
-; SSE-32-NEXT:  # %bb.1:
-; SSE-32-NEXT:    movapd %xmm3, %xmm1
-; SSE-32-NEXT:  .LBB7_2:
-; SSE-32-NEXT:    setae %al
-; SSE-32-NEXT:    movzbl %al, %eax
-; SSE-32-NEXT:    shll $31, %eax
-; SSE-32-NEXT:    movapd %xmm0, %xmm4
-; SSE-32-NEXT:    subsd %xmm1, %xmm4
-; SSE-32-NEXT:    cvttsd2si %xmm4, %ecx
-; SSE-32-NEXT:    xorl %eax, %ecx
-; SSE-32-NEXT:    movd %ecx, %xmm1
-; SSE-32-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-32-NEXT:    comisd %xmm3, %xmm0
-; SSE-32-NEXT:    jb .LBB7_4
-; SSE-32-NEXT:  # %bb.3:
-; SSE-32-NEXT:    movapd %xmm3, %xmm2
-; SSE-32-NEXT:  .LBB7_4:
-; SSE-32-NEXT:    setae %al
-; SSE-32-NEXT:    movzbl %al, %eax
-; SSE-32-NEXT:    shll $31, %eax
-; SSE-32-NEXT:    subsd %xmm2, %xmm0
-; SSE-32-NEXT:    cvttsd2si %xmm0, %ecx
-; SSE-32-NEXT:    xorl %eax, %ecx
-; SSE-32-NEXT:    movd %ecx, %xmm0
-; SSE-32-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-32-NEXT:    movdqa %xmm1, %xmm0
+; SSE-32-NEXT:    cvttpd2dq %xmm0, %xmm1
+; SSE-32-NEXT:    movapd %xmm1, %xmm2
+; SSE-32-NEXT:    psrad $31, %xmm2
+; SSE-32-NEXT:    addpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-32-NEXT:    cvttpd2dq %xmm0, %xmm0
+; SSE-32-NEXT:    andpd %xmm2, %xmm0
+; SSE-32-NEXT:    orpd %xmm1, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    cvttsd2si %xmm0, %rax
-; SSE-64-NEXT:    movd %eax, %xmm1
-; SSE-64-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-64-NEXT:    cvttsd2si %xmm0, %rax
-; SSE-64-NEXT:    movd %eax, %xmm0
-; SSE-64-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-64-NEXT:    movdqa %xmm1, %xmm0
+; SSE-64-NEXT:    cvttpd2dq %xmm0, %xmm1
+; SSE-64-NEXT:    movapd %xmm1, %xmm2
+; SSE-64-NEXT:    psrad $31, %xmm2
+; SSE-64-NEXT:    addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-64-NEXT:    cvttpd2dq %xmm0, %xmm0
+; SSE-64-NEXT:    andpd %xmm2, %xmm0
+; SSE-64-NEXT:    orpd %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
 ;
 ; AVX-32-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; AVX-32:       # %bb.0:
-; AVX-32-NEXT:    pushl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_offset 8
-; AVX-32-NEXT:    .cfi_offset %ebp, -8
-; AVX-32-NEXT:    movl %esp, %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
-; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
-; AVX-32-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vmovhps %xmm0, (%esp)
-; AVX-32-NEXT:    fldl {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fldl (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-32-NEXT:    vpinsrd $1, (%esp), %xmm0, %xmm0
-; AVX-32-NEXT:    movl %ebp, %esp
-; AVX-32-NEXT:    popl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
+; AVX-32-NEXT:    vcvttpd2dq %xmm0, %xmm1
+; AVX-32-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-32-NEXT:    vaddpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX-32-NEXT:    vcvttpd2dq %xmm0, %xmm0
+; AVX-32-NEXT:    vandpd %xmm2, %xmm0, %xmm0
+; AVX-32-NEXT:    vorpd %xmm0, %xmm1, %xmm0
 ; AVX-32-NEXT:    retl
 ;
 ; AVX-64-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; AVX-64:       # %bb.0:
-; AVX-64-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
-; AVX-64-NEXT:    vcvttsd2si %xmm1, %rax
-; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
-; AVX-64-NEXT:    vmovd %ecx, %xmm0
-; AVX-64-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
+; AVX-64-NEXT:    vcvttpd2dq %xmm0, %xmm1
+; AVX-64-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-64-NEXT:    vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-64-NEXT:    vcvttpd2dq %xmm0, %xmm0
+; AVX-64-NEXT:    vandpd %xmm2, %xmm0, %xmm0
+; AVX-64-NEXT:    vorpd %xmm0, %xmm1, %xmm0
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512F-NEXT:    vzeroupper
@@ -1652,7 +1556,7 @@ define <2 x i32> @strict_vector_fptoui_v2f64_to_v2i32(<2 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f64_to_v2i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1670,43 +1574,36 @@ define <2 x i32> @strict_vector_fptoui_v2f64_to_v2i32(<2 x double> %a) #0 {
 define <2 x i32> @strict_vector_fptosi_v2f32_to_v2i32(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-64-NEXT:    retq
 ;
 ; AVX-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512F-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VL-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptosi_v2f32_to_v2i32:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
   %ret = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(<2 x float> %a,
@@ -1717,85 +1614,49 @@ define <2 x i32> @strict_vector_fptosi_v2f32_to_v2i32(<2 x float> %a) #0 {
 define <2 x i32> @strict_vector_fptoui_v2f32_to_v2i32(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movss {{.*#+}} xmm3 = [2.14748365E+9,0.0E+0,0.0E+0,0.0E+0]
-; SSE-32-NEXT:    comiss %xmm3, %xmm0
-; SSE-32-NEXT:    xorps %xmm2, %xmm2
-; SSE-32-NEXT:    xorps %xmm1, %xmm1
-; SSE-32-NEXT:    jb .LBB9_2
-; SSE-32-NEXT:  # %bb.1:
-; SSE-32-NEXT:    movaps %xmm3, %xmm1
-; SSE-32-NEXT:  .LBB9_2:
-; SSE-32-NEXT:    setae %al
-; SSE-32-NEXT:    movzbl %al, %eax
-; SSE-32-NEXT:    shll $31, %eax
-; SSE-32-NEXT:    movaps %xmm0, %xmm4
-; SSE-32-NEXT:    subss %xmm1, %xmm4
-; SSE-32-NEXT:    cvttss2si %xmm4, %ecx
-; SSE-32-NEXT:    xorl %eax, %ecx
-; SSE-32-NEXT:    movd %ecx, %xmm1
-; SSE-32-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-32-NEXT:    comiss %xmm3, %xmm0
-; SSE-32-NEXT:    jb .LBB9_4
-; SSE-32-NEXT:  # %bb.3:
-; SSE-32-NEXT:    movaps %xmm3, %xmm2
-; SSE-32-NEXT:  .LBB9_4:
-; SSE-32-NEXT:    setae %al
-; SSE-32-NEXT:    movzbl %al, %eax
-; SSE-32-NEXT:    shll $31, %eax
-; SSE-32-NEXT:    subss %xmm2, %xmm0
-; SSE-32-NEXT:    cvttss2si %xmm0, %ecx
-; SSE-32-NEXT:    xorl %eax, %ecx
-; SSE-32-NEXT:    movd %ecx, %xmm0
-; SSE-32-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-32-NEXT:    movdqa %xmm1, %xmm0
+; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm1
+; SSE-32-NEXT:    movdqa %xmm1, %xmm2
+; SSE-32-NEXT:    psrad $31, %xmm2
+; SSE-32-NEXT:    subps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
+; SSE-32-NEXT:    pand %xmm2, %xmm0
+; SSE-32-NEXT:    por %xmm1, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    cvttss2si %xmm0, %rax
-; SSE-64-NEXT:    movd %eax, %xmm1
-; SSE-64-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-64-NEXT:    cvttss2si %xmm0, %rax
-; SSE-64-NEXT:    movd %eax, %xmm0
-; SSE-64-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-64-NEXT:    movdqa %xmm1, %xmm0
+; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm1
+; SSE-64-NEXT:    movdqa %xmm1, %xmm2
+; SSE-64-NEXT:    psrad $31, %xmm2
+; SSE-64-NEXT:    subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
+; SSE-64-NEXT:    pand %xmm2, %xmm0
+; SSE-64-NEXT:    por %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
 ;
 ; AVX-32-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX-32:       # %bb.0:
-; AVX-32-NEXT:    pushl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_offset 8
-; AVX-32-NEXT:    .cfi_offset %ebp, -8
-; AVX-32-NEXT:    movl %esp, %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
-; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
-; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX-32-NEXT:    movl %ebp, %esp
-; AVX-32-NEXT:    popl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
+; AVX-32-NEXT:    vcvttps2dq %xmm0, %xmm1
+; AVX-32-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-32-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX-32-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX-32-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX-32-NEXT:    vpor %xmm0, %xmm1, %xmm0
 ; AVX-32-NEXT:    retl
 ;
 ; AVX-64-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX-64:       # %bb.0:
-; AVX-64-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; AVX-64-NEXT:    vcvttss2si %xmm1, %rax
-; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
-; AVX-64-NEXT:    vmovd %ecx, %xmm0
-; AVX-64-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
+; AVX-64-NEXT:    vcvttps2dq %xmm0, %xmm1
+; AVX-64-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-64-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-64-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX-64-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX-64-NEXT:    vpor %xmm0, %xmm1, %xmm0
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -1803,13 +1664,12 @@ define <2 x i32> @strict_vector_fptoui_v2f32_to_v2i32(<2 x float> %a) #0 {
 ;
 ; AVX512VL-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2udq %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1817,7 +1677,6 @@ define <2 x i32> @strict_vector_fptoui_v2f32_to_v2i32(<2 x float> %a) #0 {
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptoui_v2f32_to_v2i32:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2udq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
   %ret = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f32(<2 x float> %a,
@@ -1922,49 +1781,42 @@ define <2 x i16> @strict_vector_fptoui_v2f64_to_v2i16(<2 x double> %a) #0 {
 define <2 x i16> @strict_vector_fptosi_v2f32_to_v2i16(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-32-NEXT:    packssdw %xmm0, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-64-NEXT:    packssdw %xmm0, %xmm0
 ; SSE-64-NEXT:    retq
 ;
 ; AVX-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512F-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512F-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VL-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VL-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptosi_v2f32_to_v2i16:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
@@ -1976,49 +1828,42 @@ define <2 x i16> @strict_vector_fptosi_v2f32_to_v2i16(<2 x float> %a) #0 {
 define <2 x i16> @strict_vector_fptoui_v2f32_to_v2i16(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-32-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-64-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]
 ; SSE-64-NEXT:    retq
 ;
 ; AVX-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512F-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VL-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VL-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptoui_v2f32_to_v2i16:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
@@ -2134,7 +1979,6 @@ define <2 x i8> @strict_vector_fptoui_v2f64_to_v2i8(<2 x double> %a) #0 {
 define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-32-NEXT:    packssdw %xmm0, %xmm0
 ; SSE-32-NEXT:    packsswb %xmm0, %xmm0
@@ -2142,7 +1986,6 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-64-NEXT:    packssdw %xmm0, %xmm0
 ; SSE-64-NEXT:    packsswb %xmm0, %xmm0
@@ -2150,7 +1993,6 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    vpacksswb %xmm0, %xmm0, %xmm0
@@ -2158,7 +2000,6 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512F-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512F-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512F-NEXT:    vpacksswb %xmm0, %xmm0, %xmm0
@@ -2166,14 +2007,12 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512VL-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VL-NEXT:    vpmovdb %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpackssdw %xmm0, %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpacksswb %xmm0, %xmm0, %xmm0
@@ -2181,7 +2020,6 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptosi_v2f32_to_v2i8:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    vpmovdb %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
@@ -2193,7 +2031,6 @@ define <2 x i8> @strict_vector_fptosi_v2f32_to_v2i8(<2 x float> %a) #0 {
 define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-32-NEXT:    packuswb %xmm0, %xmm0
 ; SSE-32-NEXT:    packuswb %xmm0, %xmm0
@@ -2201,7 +2038,6 @@ define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
 ; SSE-64-NEXT:    packuswb %xmm0, %xmm0
 ; SSE-64-NEXT:    packuswb %xmm0, %xmm0
@@ -2209,7 +2045,6 @@ define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
@@ -2217,7 +2052,6 @@ define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512F-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512F-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
@@ -2225,14 +2059,12 @@ define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512VL-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VL-NEXT:    vpmovdb %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpackusdw %xmm0, %xmm0, %xmm0
 ; AVX512DQ-NEXT:    vpackuswb %xmm0, %xmm0, %xmm0
@@ -2240,7 +2072,6 @@ define <2 x i8> @strict_vector_fptoui_v2f32_to_v2i8(<2 x float> %a) #0 {
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptoui_v2f32_to_v2i8:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    vpmovdb %xmm0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
@@ -2385,7 +2216,7 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ; SSE-32-NEXT:    andl $-8, %esp
 ; SSE-32-NEXT:    subl $24, %esp
 ; SSE-32-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; SSE-32-NEXT:    comisd %xmm1, %xmm0
+; SSE-32-NEXT:    ucomisd %xmm1, %xmm0
 ; SSE-32-NEXT:    movapd %xmm1, %xmm2
 ; SSE-32-NEXT:    jae .LBB19_2
 ; SSE-32-NEXT:  # %bb.1:
@@ -2405,7 +2236,7 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ; SSE-32-NEXT:    fistpll {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    fldcw {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-32-NEXT:    comisd %xmm1, %xmm0
+; SSE-32-NEXT:    ucomisd %xmm1, %xmm0
 ; SSE-32-NEXT:    jae .LBB19_4
 ; SSE-32-NEXT:  # %bb.3:
 ; SSE-32-NEXT:    xorpd %xmm1, %xmm1
@@ -2442,35 +2273,25 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f64_to_v2i1:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movsd {{.*#+}} xmm3 = [9.2233720368547758E+18,0.0E+0]
-; SSE-64-NEXT:    comisd %xmm3, %xmm0
-; SSE-64-NEXT:    xorpd %xmm2, %xmm2
-; SSE-64-NEXT:    xorpd %xmm1, %xmm1
-; SSE-64-NEXT:    jb .LBB19_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    movapd %xmm3, %xmm1
-; SSE-64-NEXT:  .LBB19_2:
-; SSE-64-NEXT:    movapd %xmm0, %xmm4
-; SSE-64-NEXT:    subsd %xmm1, %xmm4
-; SSE-64-NEXT:    cvttsd2si %xmm4, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm1
+; SSE-64-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
+; SSE-64-NEXT:    movapd %xmm0, %xmm1
+; SSE-64-NEXT:    subsd %xmm2, %xmm1
+; SSE-64-NEXT:    cvttsd2si %xmm1, %rax
+; SSE-64-NEXT:    cvttsd2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rcx, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rax, %rdx
+; SSE-64-NEXT:    orq %rcx, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm1
 ; SSE-64-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1,1]
-; SSE-64-NEXT:    comisd %xmm3, %xmm0
-; SSE-64-NEXT:    jb .LBB19_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    movapd %xmm3, %xmm2
-; SSE-64-NEXT:  .LBB19_4:
-; SSE-64-NEXT:    subsd %xmm2, %xmm0
 ; SSE-64-NEXT:    cvttsd2si %xmm0, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm0
+; SSE-64-NEXT:    subsd %xmm2, %xmm0
+; SSE-64-NEXT:    cvttsd2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rax, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rcx, %rdx
+; SSE-64-NEXT:    orq %rax, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm0
 ; SSE-64-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; SSE-64-NEXT:    movdqa %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
@@ -2486,7 +2307,7 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ; AVX-32-NEXT:    subl $16, %esp
 ; AVX-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovapd %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB19_2
 ; AVX-32-NEXT:  # %bb.1:
@@ -2501,7 +2322,7 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ; AVX-32-NEXT:    movzbl %al, %eax
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm0
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB19_4
 ; AVX-32-NEXT:  # %bb.3:
 ; AVX-32-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
@@ -2527,39 +2348,29 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ; AVX-64-LABEL: strict_vector_fptoui_v2f64_to_v2i1:
 ; AVX-64:       # %bb.0:
 ; AVX-64-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX-64-NEXT:    jb .LBB19_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm3
-; AVX-64-NEXT:  .LBB19_2:
-; AVX-64-NEXT:    vsubsd %xmm3, %xmm0, %xmm3
-; AVX-64-NEXT:    vcvttsd2si %xmm3, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm3
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm2
+; AVX-64-NEXT:    vcvttsd2si %xmm2, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
 ; AVX-64-NEXT:    vshufpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB19_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB19_4:
-; AVX-64-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttsd2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttsd2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f64_to_v2i1:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512F-NEXT:    vpslld $31, %ymm0, %ymm0
 ; AVX512F-NEXT:    vptestmd %zmm0, %zmm0, %k1
@@ -2579,7 +2390,7 @@ define <2 x i1> @strict_vector_fptoui_v2f64_to_v2i1(<2 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f64_to_v2i1:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512DQ-NEXT:    vpslld $31, %ymm0, %ymm0
 ; AVX512DQ-NEXT:    vpmovd2m %zmm0, %k0
@@ -2658,17 +2469,17 @@ define <2 x i1> @strict_vector_fptosi_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    subl $32, %esp
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vextractps $1, %xmm0, (%esp)
 ; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -2687,14 +2498,8 @@ define <2 x i1> @strict_vector_fptosi_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; AVX512F-LABEL: strict_vector_fptosi_v2f32_to_v2i1:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512F-NEXT:    andl $1, %eax
-; AVX512F-NEXT:    kmovw %eax, %k0
-; AVX512F-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX512F-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512F-NEXT:    kmovw %eax, %k1
-; AVX512F-NEXT:    kshiftlw $1, %k1, %k1
-; AVX512F-NEXT:    korw %k1, %k0, %k1
+; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512F-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; AVX512F-NEXT:    vpternlogq {{.*#+}} zmm0 {%k1} {z} = -1
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -2702,29 +2507,16 @@ define <2 x i1> @strict_vector_fptosi_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; AVX512VL-LABEL: strict_vector_fptosi_v2f32_to_v2i1:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VL-NEXT:    andl $1, %eax
-; AVX512VL-NEXT:    kmovw %eax, %k0
-; AVX512VL-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX512VL-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VL-NEXT:    kmovw %eax, %k1
-; AVX512VL-NEXT:    kshiftlw $1, %k1, %k1
-; AVX512VL-NEXT:    korw %k1, %k0, %k1
+; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512VL-NEXT:    vptestmd %xmm0, %xmm0, %k1
 ; AVX512VL-NEXT:    vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX512VL-NEXT:    vmovdqa64 %xmm0, %xmm0 {%k1} {z}
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v2f32_to_v2i1:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; AVX512DQ-NEXT:    vcvttss2si %xmm1, %eax
-; AVX512DQ-NEXT:    kmovw %eax, %k0
-; AVX512DQ-NEXT:    kshiftlb $1, %k0, %k0
-; AVX512DQ-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512DQ-NEXT:    kmovw %eax, %k1
-; AVX512DQ-NEXT:    kshiftlb $7, %k1, %k1
-; AVX512DQ-NEXT:    kshiftrb $7, %k1, %k1
-; AVX512DQ-NEXT:    korw %k0, %k1, %k0
+; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512DQ-NEXT:    vpmovd2m %zmm0, %k0
 ; AVX512DQ-NEXT:    vpmovm2q %k0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -2732,15 +2524,8 @@ define <2 x i1> @strict_vector_fptosi_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptosi_v2f32_to_v2i1:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; AVX512VLDQ-NEXT:    vcvttss2si %xmm1, %eax
-; AVX512VLDQ-NEXT:    kmovw %eax, %k0
-; AVX512VLDQ-NEXT:    kshiftlb $1, %k0, %k0
-; AVX512VLDQ-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VLDQ-NEXT:    kmovw %eax, %k1
-; AVX512VLDQ-NEXT:    kshiftlb $7, %k1, %k1
-; AVX512VLDQ-NEXT:    kshiftrb $7, %k1, %k1
-; AVX512VLDQ-NEXT:    korw %k0, %k1, %k0
+; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512VLDQ-NEXT:    vpmovd2m %xmm0, %k0
 ; AVX512VLDQ-NEXT:    vpmovm2q %k0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
   %ret = call <2 x i1> @llvm.experimental.constrained.fptosi.v2i1.v2f32(<2 x float> %a,
@@ -2759,7 +2544,7 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; SSE-32-NEXT:    andl $-8, %esp
 ; SSE-32-NEXT:    subl $24, %esp
 ; SSE-32-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    movaps %xmm1, %xmm2
 ; SSE-32-NEXT:    jae .LBB21_2
 ; SSE-32-NEXT:  # %bb.1:
@@ -2779,7 +2564,7 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; SSE-32-NEXT:    fistpll {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    fldcw {{[0-9]+}}(%esp)
 ; SSE-32-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-32-NEXT:    comiss %xmm1, %xmm0
+; SSE-32-NEXT:    ucomiss %xmm1, %xmm0
 ; SSE-32-NEXT:    jae .LBB21_4
 ; SSE-32-NEXT:  # %bb.3:
 ; SSE-32-NEXT:    xorps %xmm1, %xmm1
@@ -2816,35 +2601,25 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movss {{.*#+}} xmm3 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; SSE-64-NEXT:    comiss %xmm3, %xmm0
-; SSE-64-NEXT:    xorps %xmm2, %xmm2
-; SSE-64-NEXT:    xorps %xmm1, %xmm1
-; SSE-64-NEXT:    jb .LBB21_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    movaps %xmm3, %xmm1
-; SSE-64-NEXT:  .LBB21_2:
-; SSE-64-NEXT:    movaps %xmm0, %xmm4
-; SSE-64-NEXT:    subss %xmm1, %xmm4
-; SSE-64-NEXT:    cvttss2si %xmm4, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm1
+; SSE-64-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
+; SSE-64-NEXT:    movaps %xmm0, %xmm1
+; SSE-64-NEXT:    subss %xmm2, %xmm1
+; SSE-64-NEXT:    cvttss2si %xmm1, %rax
+; SSE-64-NEXT:    cvttss2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rcx, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rax, %rdx
+; SSE-64-NEXT:    orq %rcx, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm1
 ; SSE-64-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
-; SSE-64-NEXT:    comiss %xmm3, %xmm0
-; SSE-64-NEXT:    jb .LBB21_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    movaps %xmm3, %xmm2
-; SSE-64-NEXT:  .LBB21_4:
-; SSE-64-NEXT:    subss %xmm2, %xmm0
 ; SSE-64-NEXT:    cvttss2si %xmm0, %rax
-; SSE-64-NEXT:    setae %cl
-; SSE-64-NEXT:    movzbl %cl, %ecx
-; SSE-64-NEXT:    shlq $63, %rcx
-; SSE-64-NEXT:    xorq %rax, %rcx
-; SSE-64-NEXT:    movq %rcx, %xmm0
+; SSE-64-NEXT:    subss %xmm2, %xmm0
+; SSE-64-NEXT:    cvttss2si %xmm0, %rcx
+; SSE-64-NEXT:    movq %rax, %rdx
+; SSE-64-NEXT:    sarq $63, %rdx
+; SSE-64-NEXT:    andq %rcx, %rdx
+; SSE-64-NEXT:    orq %rax, %rdx
+; SSE-64-NEXT:    movq %rdx, %xmm0
 ; SSE-64-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; SSE-64-NEXT:    movdqa %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
@@ -2857,33 +2632,33 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    movl %esp, %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
 ; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $16, %esp
+; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB21_2
 ; AVX-32-NEXT:  # %bb.1:
 ; AVX-32-NEXT:    vxorps %xmm3, %xmm3, %xmm3
 ; AVX-32-NEXT:  .LBB21_2:
 ; AVX-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    vmovss %xmm2, (%esp)
+; AVX-32-NEXT:    flds (%esp)
+; AVX-32-NEXT:    fisttpll (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %al
 ; AVX-32-NEXT:    movzbl %al, %eax
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB21_4
 ; AVX-32-NEXT:  # %bb.3:
 ; AVX-32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
 ; AVX-32-NEXT:  .LBB21_4:
 ; AVX-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
-; AVX-32-NEXT:    vmovss %xmm0, (%esp)
-; AVX-32-NEXT:    flds (%esp)
-; AVX-32-NEXT:    fisttpll (%esp)
+; AVX-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    flds {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    setae %cl
 ; AVX-32-NEXT:    movzbl %cl, %ecx
@@ -2891,7 +2666,7 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; AVX-32-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
+; AVX-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX-32-NEXT:    vpinsrd $3, %eax, %xmm0, %xmm0
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
@@ -2901,46 +2676,31 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ; AVX-64-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; AVX-64:       # %bb.0:
 ; AVX-64-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX-64-NEXT:    jb .LBB21_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm3
-; AVX-64-NEXT:  .LBB21_2:
-; AVX-64-NEXT:    vsubss %xmm3, %xmm0, %xmm3
-; AVX-64-NEXT:    vcvttss2si %xmm3, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm3
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm2
+; AVX-64-NEXT:    vcvttss2si %xmm2, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
 ; AVX-64-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB21_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB21_4:
-; AVX-64-NEXT:    vsubss %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttss2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttss2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512F-NEXT:    andl $1, %eax
-; AVX512F-NEXT:    kmovw %eax, %k0
-; AVX512F-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX512F-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512F-NEXT:    kmovw %eax, %k1
-; AVX512F-NEXT:    kshiftlw $1, %k1, %k1
-; AVX512F-NEXT:    korw %k1, %k0, %k1
+; AVX512F-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512F-NEXT:    vpslld $31, %xmm0, %xmm0
+; AVX512F-NEXT:    vptestmd %zmm0, %zmm0, %k1
 ; AVX512F-NEXT:    vpternlogq {{.*#+}} zmm0 {%k1} {z} = -1
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -2948,29 +2708,18 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; AVX512VL-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VL-NEXT:    andl $1, %eax
-; AVX512VL-NEXT:    kmovw %eax, %k0
-; AVX512VL-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX512VL-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VL-NEXT:    kmovw %eax, %k1
-; AVX512VL-NEXT:    kshiftlw $1, %k1, %k1
-; AVX512VL-NEXT:    korw %k1, %k0, %k1
+; AVX512VL-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512VL-NEXT:    vpslld $31, %xmm0, %xmm0
+; AVX512VL-NEXT:    vptestmd %xmm0, %xmm0, %k1
 ; AVX512VL-NEXT:    vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX512VL-NEXT:    vmovdqa64 %xmm0, %xmm0 {%k1} {z}
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; AVX512DQ-NEXT:    vcvttss2si %xmm1, %eax
-; AVX512DQ-NEXT:    kmovw %eax, %k0
-; AVX512DQ-NEXT:    kshiftlb $1, %k0, %k0
-; AVX512DQ-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512DQ-NEXT:    kmovw %eax, %k1
-; AVX512DQ-NEXT:    kshiftlb $7, %k1, %k1
-; AVX512DQ-NEXT:    kshiftrb $7, %k1, %k1
-; AVX512DQ-NEXT:    korw %k0, %k1, %k0
+; AVX512DQ-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512DQ-NEXT:    vpslld $31, %xmm0, %xmm0
+; AVX512DQ-NEXT:    vpmovd2m %zmm0, %k0
 ; AVX512DQ-NEXT:    vpmovm2q %k0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -2978,15 +2727,9 @@ define <2 x i1> @strict_vector_fptoui_v2f32_to_v2i1(<2 x float> %a) #0 {
 ;
 ; AVX512VLDQ-LABEL: strict_vector_fptoui_v2f32_to_v2i1:
 ; AVX512VLDQ:       # %bb.0:
-; AVX512VLDQ-NEXT:    vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
-; AVX512VLDQ-NEXT:    vcvttss2si %xmm1, %eax
-; AVX512VLDQ-NEXT:    kmovw %eax, %k0
-; AVX512VLDQ-NEXT:    kshiftlb $1, %k0, %k0
-; AVX512VLDQ-NEXT:    vcvttss2si %xmm0, %eax
-; AVX512VLDQ-NEXT:    kmovw %eax, %k1
-; AVX512VLDQ-NEXT:    kshiftlb $7, %k1, %k1
-; AVX512VLDQ-NEXT:    kshiftrb $7, %k1, %k1
-; AVX512VLDQ-NEXT:    korw %k0, %k1, %k0
+; AVX512VLDQ-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX512VLDQ-NEXT:    vpslld $31, %xmm0, %xmm0
+; AVX512VLDQ-NEXT:    vpmovd2m %xmm0, %k0
 ; AVX512VLDQ-NEXT:    vpmovm2q %k0, %xmm0
 ; AVX512VLDQ-NEXT:    ret{{[l|q]}}
   %ret = call <2 x i1> @llvm.experimental.constrained.fptoui.v2i1.v2f32(<2 x float> %a,
@@ -3037,46 +2780,49 @@ define <4 x i32> @strict_vector_fptosi_v4f32_to_v4i32(<4 x float> %a) #0 {
 define <4 x i32> @strict_vector_fptoui_v4f32_to_v4i32(<4 x float> %a) #0 {
 ; SSE-32-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    movaps {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; SSE-32-NEXT:    movaps %xmm0, %xmm2
-; SSE-32-NEXT:    cmpltps %xmm1, %xmm2
-; SSE-32-NEXT:    movaps %xmm2, %xmm3
-; SSE-32-NEXT:    andnps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm3
-; SSE-32-NEXT:    andnps %xmm1, %xmm2
-; SSE-32-NEXT:    subps %xmm2, %xmm0
+; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm1
+; SSE-32-NEXT:    movdqa %xmm1, %xmm2
+; SSE-32-NEXT:    psrad $31, %xmm2
+; SSE-32-NEXT:    subps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
 ; SSE-32-NEXT:    cvttps2dq %xmm0, %xmm0
-; SSE-32-NEXT:    xorps %xmm3, %xmm0
+; SSE-32-NEXT:    pand %xmm2, %xmm0
+; SSE-32-NEXT:    por %xmm1, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movaps {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; SSE-64-NEXT:    movaps %xmm0, %xmm2
-; SSE-64-NEXT:    cmpltps %xmm1, %xmm2
-; SSE-64-NEXT:    movaps %xmm2, %xmm3
-; SSE-64-NEXT:    andnps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3
-; SSE-64-NEXT:    andnps %xmm1, %xmm2
-; SSE-64-NEXT:    subps %xmm2, %xmm0
+; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm1
+; SSE-64-NEXT:    movdqa %xmm1, %xmm2
+; SSE-64-NEXT:    psrad $31, %xmm2
+; SSE-64-NEXT:    subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; SSE-64-NEXT:    cvttps2dq %xmm0, %xmm0
-; SSE-64-NEXT:    xorps %xmm3, %xmm0
+; SSE-64-NEXT:    pand %xmm2, %xmm0
+; SSE-64-NEXT:    por %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
 ;
-; AVX-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vbroadcastss {{.*#+}} xmm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; AVX-NEXT:    vcmpltps %xmm1, %xmm0, %xmm2
-; AVX-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX-NEXT:    vbroadcastss {{.*#+}} xmm4 = [2147483648,2147483648,2147483648,2147483648]
-; AVX-NEXT:    vblendvps %xmm2, %xmm3, %xmm4, %xmm4
-; AVX-NEXT:    vblendvps %xmm2, %xmm3, %xmm1, %xmm1
-; AVX-NEXT:    vsubps %xmm1, %xmm0, %xmm0
-; AVX-NEXT:    vcvttps2dq %xmm0, %xmm0
-; AVX-NEXT:    vxorps %xmm4, %xmm0, %xmm0
-; AVX-NEXT:    ret{{[l|q]}}
+; AVX-32-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
+; AVX-32:       # %bb.0:
+; AVX-32-NEXT:    vcvttps2dq %xmm0, %xmm1
+; AVX-32-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-32-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX-32-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX-32-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX-32-NEXT:    vpor %xmm0, %xmm1, %xmm0
+; AVX-32-NEXT:    retl
+;
+; AVX-64-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
+; AVX-64:       # %bb.0:
+; AVX-64-NEXT:    vcvttps2dq %xmm0, %xmm1
+; AVX-64-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-64-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-64-NEXT:    vcvttps2dq %xmm0, %xmm0
+; AVX-64-NEXT:    vpand %xmm2, %xmm0, %xmm0
+; AVX-64-NEXT:    vpor %xmm0, %xmm1, %xmm0
+; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -3089,7 +2835,7 @@ define <4 x i32> @strict_vector_fptoui_v4f32_to_v4i32(<4 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v4f32_to_v4i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
diff --git a/llvm/test/CodeGen/X86/vec-strict-fptoint-256-fp16.ll b/llvm/test/CodeGen/X86/vec-strict-fptoint-256-fp16.ll
index a232122e9c707..93209f0da8af8 100644
--- a/llvm/test/CodeGen/X86/vec-strict-fptoint-256-fp16.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-fptoint-256-fp16.ll
@@ -18,26 +18,13 @@ declare <16 x i1> @llvm.experimental.constrained.fptoui.v16i1.v16f16(<16 x half>
 define <4 x i64> @strict_vector_fptosi_v4f16_to_v4i64(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptosi_v4f16_to_v4i64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2qq %xmm0, %ymm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v4f16_to_v4i64:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2si %xmm1, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2si %xmm2, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm2
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; NOVL-NEXT:    vcvttsh2si %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm2
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2si %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm0
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; NOVL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
+; NOVL-NEXT:    vcvttph2qq %xmm0, %zmm0
+; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
   %ret = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -47,26 +34,13 @@ define <4 x i64> @strict_vector_fptosi_v4f16_to_v4i64(<4 x half> %a) #0 {
 define <4 x i64> @strict_vector_fptoui_v4f16_to_v4i64(<4 x half> %a) #0 {
 ; CHECK-LABEL: strict_vector_fptoui_v4f16_to_v4i64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    vcvttph2uqq %xmm0, %ymm0
 ; CHECK-NEXT:    ret{{[l|q]}}
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v4f16_to_v4i64:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vpsrlq $48, %xmm0, %xmm1
-; NOVL-NEXT:    vcvttsh2usi %xmm1, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm1
-; NOVL-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; NOVL-NEXT:    vcvttsh2usi %xmm2, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm2
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm2
-; NOVL-NEXT:    vpsrld $16, %xmm0, %xmm0
-; NOVL-NEXT:    vcvttsh2usi %xmm0, %rax
-; NOVL-NEXT:    vmovq %rax, %xmm0
-; NOVL-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; NOVL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
+; NOVL-NEXT:    vcvttph2uqq %xmm0, %zmm0
+; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
   %ret = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f16(<4 x half> %a,
                                               metadata !"fpexcept.strict") #0
@@ -82,8 +56,6 @@ define <8 x i32> @strict_vector_fptosi_v8f16_to_v8i32(<8 x half> %a) #0 {
 ; NOVL-LABEL: strict_vector_fptosi_v8f16_to_v8i32:
 ; NOVL:       # %bb.0:
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
 ; NOVL-NEXT:    vcvttph2dq %ymm0, %zmm0
 ; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
@@ -101,8 +73,6 @@ define <8 x i32> @strict_vector_fptoui_v8f16_to_v8i32(<8 x half> %a) #0 {
 ; NOVL-LABEL: strict_vector_fptoui_v8f16_to_v8i32:
 ; NOVL:       # %bb.0:
 ; NOVL-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vblendps {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
 ; NOVL-NEXT:    vcvttph2udq %ymm0, %zmm0
 ; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
@@ -119,8 +89,7 @@ define <16 x i16> @strict_vector_fptosi_v16f16_to_v16i16(<16 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptosi_v16f16_to_v16i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf64x4 $0, %ymm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2w %zmm0, %zmm0
 ; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
@@ -137,8 +106,7 @@ define <16 x i16> @strict_vector_fptoui_v16f16_to_v16i16(<16 x half> %a) #0 {
 ;
 ; NOVL-LABEL: strict_vector_fptoui_v16f16_to_v16i16:
 ; NOVL:       # %bb.0:
-; NOVL-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; NOVL-NEXT:    vinsertf64x4 $0, %ymm0, %zmm1, %zmm0
+; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; NOVL-NEXT:    vcvttph2uw %zmm0, %zmm0
 ; NOVL-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; NOVL-NEXT:    retq
diff --git a/llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll b/llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
index 179e8ad69672b..f31a4208020c0 100644
--- a/llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
@@ -202,7 +202,7 @@ define <4 x i64> @strict_vector_fptosi_v4f64_to_v4i64(<4 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v4f64_to_v4i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2qq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -228,7 +228,7 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovapd %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB1_2
 ; AVX-32-NEXT:  # %bb.1:
@@ -245,7 +245,7 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX-32-NEXT:    vextractf128 $1, %ymm0, %xmm2
 ; AVX-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm3
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm3
 ; AVX-32-NEXT:    vmovapd %xmm1, %xmm4
 ; AVX-32-NEXT:    jae .LBB1_4
 ; AVX-32-NEXT:  # %bb.3:
@@ -260,7 +260,7 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX-32-NEXT:    movzbl %cl, %ecx
 ; AVX-32-NEXT:    shll $31, %ecx
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovapd %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB1_6
 ; AVX-32-NEXT:  # %bb.5:
@@ -275,7 +275,7 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX-32-NEXT:    movzbl %dl, %edx
 ; AVX-32-NEXT:    shll $31, %edx
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
-; AVX-32-NEXT:    vcomisd %xmm1, %xmm0
+; AVX-32-NEXT:    vucomisd %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB1_8
 ; AVX-32-NEXT:  # %bb.7:
 ; AVX-32-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
@@ -305,65 +305,45 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ;
 ; AVX-64-LABEL: strict_vector_fptoui_v4f64_to_v4i64:
 ; AVX-64:       # %bb.0:
-; AVX-64-NEXT:    vextractf128 $1, %ymm0, %xmm3
+; AVX-64-NEXT:    vextractf128 $1, %ymm0, %xmm2
 ; AVX-64-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm3
-; AVX-64-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorpd %xmm4, %xmm4, %xmm4
-; AVX-64-NEXT:    jb .LBB1_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm4
-; AVX-64-NEXT:  .LBB1_2:
-; AVX-64-NEXT:    vsubsd %xmm4, %xmm3, %xmm4
-; AVX-64-NEXT:    vcvttsd2si %xmm4, %rcx
-; AVX-64-NEXT:    setae %al
-; AVX-64-NEXT:    movzbl %al, %eax
-; AVX-64-NEXT:    shlq $63, %rax
-; AVX-64-NEXT:    xorq %rcx, %rax
-; AVX-64-NEXT:    vshufpd {{.*#+}} xmm4 = xmm3[1,0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm4
-; AVX-64-NEXT:    vxorpd %xmm5, %xmm5, %xmm5
-; AVX-64-NEXT:    jb .LBB1_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm5
-; AVX-64-NEXT:  .LBB1_4:
-; AVX-64-NEXT:    vmovq %rax, %xmm3
-; AVX-64-NEXT:    vsubsd %xmm5, %xmm4, %xmm4
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm2, %xmm3
+; AVX-64-NEXT:    vcvttsd2si %xmm3, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm2, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm3
+; AVX-64-NEXT:    vshufpd {{.*#+}} xmm2 = xmm2[1,0]
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm2, %xmm4
 ; AVX-64-NEXT:    vcvttsd2si %xmm4, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm4
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    vxorpd %xmm5, %xmm5, %xmm5
-; AVX-64-NEXT:    jb .LBB1_6
-; AVX-64-NEXT:  # %bb.5:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm5
-; AVX-64-NEXT:  .LBB1_6:
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm4[0]
-; AVX-64-NEXT:    vsubsd %xmm5, %xmm0, %xmm4
-; AVX-64-NEXT:    vcvttsd2si %xmm4, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm4
+; AVX-64-NEXT:    vcvttsd2si %xmm2, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm3
+; AVX-64-NEXT:    vcvttsd2si %xmm3, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm3
 ; AVX-64-NEXT:    vshufpd {{.*#+}} xmm0 = xmm0[1,0]
-; AVX-64-NEXT:    vcomisd %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB1_8
-; AVX-64-NEXT:  # %bb.7:
-; AVX-64-NEXT:    vmovapd %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB1_8:
-; AVX-64-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttsd2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm4[0],xmm0[0]
-; AVX-64-NEXT:    vinsertf128 $1, %xmm3, %ymm0, %ymm0
+; AVX-64-NEXT:    vsubsd %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttsd2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttsd2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-32-LABEL: strict_vector_fptoui_v4f64_to_v4i64:
@@ -381,14 +361,14 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX512F-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
 ; AVX512F-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512F-32-NEXT:    xorl %eax, %eax
-; AVX512F-32-NEXT:    vcomisd %xmm1, %xmm3
+; AVX512F-32-NEXT:    vucomisd %xmm1, %xmm3
 ; AVX512F-32-NEXT:    setae %al
 ; AVX512F-32-NEXT:    kmovw %eax, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
 ; AVX512F-32-NEXT:    vmovsd %xmm3, (%esp)
 ; AVX512F-32-NEXT:    xorl %edx, %edx
-; AVX512F-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX512F-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX512F-32-NEXT:    setae %dl
 ; AVX512F-32-NEXT:    kmovw %edx, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -396,14 +376,14 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX512F-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX512F-32-NEXT:    xorl %ecx, %ecx
-; AVX512F-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX512F-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX512F-32-NEXT:    setae %cl
 ; AVX512F-32-NEXT:    kmovw %ecx, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512F-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    xorl %ebx, %ebx
-; AVX512F-32-NEXT:    vcomisd %xmm1, %xmm0
+; AVX512F-32-NEXT:    vucomisd %xmm1, %xmm0
 ; AVX512F-32-NEXT:    setae %bl
 ; AVX512F-32-NEXT:    kmovw %ebx, %k1
 ; AVX512F-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -474,14 +454,14 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
 ; AVX512VL-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm3
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm3
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovsd %xmm3, (%esp)
 ; AVX512VL-32-NEXT:    xorl %edx, %edx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX512VL-32-NEXT:    setae %dl
 ; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -489,14 +469,14 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
 ; AVX512VL-32-NEXT:    setae %cl
 ; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    xorl %ebx, %ebx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm0
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm0
 ; AVX512VL-32-NEXT:    setae %bl
 ; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -554,7 +534,7 @@ define <4 x i64> @strict_vector_fptoui_v4f64_to_v4i64(<4 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v4f64_to_v4i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2uqq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -733,7 +713,7 @@ define <4 x i64> @strict_vector_fptosi_v4f32_to_v4i64(<4 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptosi_v4f32_to_v4i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvttps2qq %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -759,7 +739,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX-32-NEXT:    subl $32, %esp
 ; AVX-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB3_2
 ; AVX-32-NEXT:  # %bb.1:
@@ -775,7 +755,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX-32-NEXT:    shll $31, %eax
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX-32-NEXT:    vshufps {{.*#+}} xmm2 = xmm0[3,3,3,3]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB3_4
 ; AVX-32-NEXT:  # %bb.3:
@@ -791,7 +771,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX-32-NEXT:    shll $31, %ecx
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX-32-NEXT:    vmovaps %xmm1, %xmm3
 ; AVX-32-NEXT:    jae .LBB3_6
 ; AVX-32-NEXT:  # %bb.5:
@@ -806,7 +786,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX-32-NEXT:    movzbl %dl, %edx
 ; AVX-32-NEXT:    shll $31, %edx
 ; AVX-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
-; AVX-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX-32-NEXT:    jae .LBB3_8
 ; AVX-32-NEXT:  # %bb.7:
 ; AVX-32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
@@ -836,65 +816,45 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ;
 ; AVX-64-LABEL: strict_vector_fptoui_v4f32_to_v4i64:
 ; AVX-64:       # %bb.0:
-; AVX-64-NEXT:    vshufps {{.*#+}} xmm3 = xmm0[3,3,3,3]
+; AVX-64-NEXT:    vshufps {{.*#+}} xmm2 = xmm0[3,3,3,3]
 ; AVX-64-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm3
-; AVX-64-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX-64-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX-64-NEXT:    jb .LBB3_2
-; AVX-64-NEXT:  # %bb.1:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm4
-; AVX-64-NEXT:  .LBB3_2:
-; AVX-64-NEXT:    vsubss %xmm4, %xmm3, %xmm3
-; AVX-64-NEXT:    vcvttss2si %xmm3, %rcx
-; AVX-64-NEXT:    setae %al
-; AVX-64-NEXT:    movzbl %al, %eax
-; AVX-64-NEXT:    shlq $63, %rax
-; AVX-64-NEXT:    xorq %rcx, %rax
-; AVX-64-NEXT:    vshufpd {{.*#+}} xmm4 = xmm0[1,0]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm4
-; AVX-64-NEXT:    vxorps %xmm5, %xmm5, %xmm5
-; AVX-64-NEXT:    jb .LBB3_4
-; AVX-64-NEXT:  # %bb.3:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm5
-; AVX-64-NEXT:  .LBB3_4:
-; AVX-64-NEXT:    vmovq %rax, %xmm3
-; AVX-64-NEXT:    vsubss %xmm5, %xmm4, %xmm4
+; AVX-64-NEXT:    vsubss %xmm1, %xmm2, %xmm3
+; AVX-64-NEXT:    vcvttss2si %xmm3, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm2, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm2
+; AVX-64-NEXT:    vshufpd {{.*#+}} xmm3 = xmm0[1,0]
+; AVX-64-NEXT:    vsubss %xmm1, %xmm3, %xmm4
 ; AVX-64-NEXT:    vcvttss2si %xmm4, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm4
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    vxorps %xmm5, %xmm5, %xmm5
-; AVX-64-NEXT:    jb .LBB3_6
-; AVX-64-NEXT:  # %bb.5:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm5
-; AVX-64-NEXT:  .LBB3_6:
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm3 = xmm4[0],xmm3[0]
-; AVX-64-NEXT:    vsubss %xmm5, %xmm0, %xmm4
-; AVX-64-NEXT:    vcvttss2si %xmm4, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm4
+; AVX-64-NEXT:    vcvttss2si %xmm3, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm3
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm3
+; AVX-64-NEXT:    vcvttss2si %xmm3, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm3
 ; AVX-64-NEXT:    vmovshdup {{.*#+}} xmm0 = xmm0[1,1,3,3]
-; AVX-64-NEXT:    vcomiss %xmm1, %xmm0
-; AVX-64-NEXT:    jb .LBB3_8
-; AVX-64-NEXT:  # %bb.7:
-; AVX-64-NEXT:    vmovaps %xmm1, %xmm2
-; AVX-64-NEXT:  .LBB3_8:
-; AVX-64-NEXT:    vsubss %xmm2, %xmm0, %xmm0
-; AVX-64-NEXT:    vcvttss2si %xmm0, %rax
-; AVX-64-NEXT:    setae %cl
-; AVX-64-NEXT:    movzbl %cl, %ecx
-; AVX-64-NEXT:    shlq $63, %rcx
-; AVX-64-NEXT:    xorq %rax, %rcx
-; AVX-64-NEXT:    vmovq %rcx, %xmm0
-; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm4[0],xmm0[0]
-; AVX-64-NEXT:    vinsertf128 $1, %xmm3, %ymm0, %ymm0
+; AVX-64-NEXT:    vsubss %xmm1, %xmm0, %xmm1
+; AVX-64-NEXT:    vcvttss2si %xmm1, %rax
+; AVX-64-NEXT:    vcvttss2si %xmm0, %rcx
+; AVX-64-NEXT:    movq %rcx, %rdx
+; AVX-64-NEXT:    sarq $63, %rdx
+; AVX-64-NEXT:    andq %rax, %rdx
+; AVX-64-NEXT:    orq %rcx, %rdx
+; AVX-64-NEXT:    vmovq %rdx, %xmm0
+; AVX-64-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm3[0],xmm0[0]
+; AVX-64-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
 ; AVX-64-NEXT:    retq
 ;
 ; AVX512F-32-LABEL: strict_vector_fptoui_v4f32_to_v4i64:
@@ -911,7 +871,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512F-32-NEXT:    vshufps {{.*#+}} xmm2 = xmm0[3,3,3,3]
 ; AVX512F-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512F-32-NEXT:    xorl %eax, %eax
-; AVX512F-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512F-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512F-32-NEXT:    setae %al
 ; AVX512F-32-NEXT:    kmovw %eax, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -919,7 +879,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512F-32-NEXT:    vmovss %xmm2, (%esp)
 ; AVX512F-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX512F-32-NEXT:    xorl %edx, %edx
-; AVX512F-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512F-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512F-32-NEXT:    setae %dl
 ; AVX512F-32-NEXT:    kmovw %edx, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -927,14 +887,14 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512F-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
 ; AVX512F-32-NEXT:    xorl %ecx, %ecx
-; AVX512F-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512F-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512F-32-NEXT:    setae %cl
 ; AVX512F-32-NEXT:    kmovw %ecx, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512F-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512F-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512F-32-NEXT:    xorl %ebx, %ebx
-; AVX512F-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX512F-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX512F-32-NEXT:    setae %bl
 ; AVX512F-32-NEXT:    kmovw %ebx, %k1
 ; AVX512F-32-NEXT:    vmovss %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -1004,7 +964,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    vshufps {{.*#+}} xmm2 = xmm0[3,3,3,3]
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -1012,7 +972,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    vmovss %xmm2, (%esp)
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
 ; AVX512VL-32-NEXT:    xorl %edx, %edx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512VL-32-NEXT:    setae %dl
 ; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
@@ -1020,14 +980,14 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
 ; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
 ; AVX512VL-32-NEXT:    setae %cl
 ; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    xorl %ebx, %ebx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm0
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm0
 ; AVX512VL-32-NEXT:    setae %bl
 ; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm1 {%k1} {z}
@@ -1085,7 +1045,7 @@ define <4 x i64> @strict_vector_fptoui_v4f32_to_v4i64(<4 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v4f32_to_v4i64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvttps2uqq %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -1111,26 +1071,31 @@ define <4 x i32> @strict_vector_fptosi_v4f64_to_v4i32(<4 x double> %a) #0 {
 }
 
 define <4 x i32> @strict_vector_fptoui_v4f64_to_v4i32(<4 x double> %a) #0 {
-; AVX-LABEL: strict_vector_fptoui_v4f64_to_v4i32:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm1 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
-; AVX-NEXT:    vcmpltpd %ymm1, %ymm0, %ymm2
-; AVX-NEXT:    vextractf128 $1, %ymm2, %xmm3
-; AVX-NEXT:    vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm3[0,2]
-; AVX-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX-NEXT:    vbroadcastss {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
-; AVX-NEXT:    vblendvps %xmm3, %xmm4, %xmm5, %xmm3
-; AVX-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX-NEXT:    vblendvpd %ymm2, %ymm4, %ymm1, %ymm1
-; AVX-NEXT:    vsubpd %ymm1, %ymm0, %ymm0
-; AVX-NEXT:    vcvttpd2dq %ymm0, %xmm0
-; AVX-NEXT:    vxorpd %xmm3, %xmm0, %xmm0
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    ret{{[l|q]}}
+; AVX-32-LABEL: strict_vector_fptoui_v4f64_to_v4i32:
+; AVX-32:       # %bb.0:
+; AVX-32-NEXT:    vcvttpd2dq %ymm0, %xmm1
+; AVX-32-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX-32-NEXT:    vcvttpd2dq %ymm0, %xmm0
+; AVX-32-NEXT:    vandpd %xmm2, %xmm0, %xmm0
+; AVX-32-NEXT:    vorpd %xmm0, %xmm1, %xmm0
+; AVX-32-NEXT:    vzeroupper
+; AVX-32-NEXT:    retl
+;
+; AVX-64-LABEL: strict_vector_fptoui_v4f64_to_v4i32:
+; AVX-64:       # %bb.0:
+; AVX-64-NEXT:    vcvttpd2dq %ymm0, %xmm1
+; AVX-64-NEXT:    vpsrad $31, %xmm1, %xmm2
+; AVX-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX-64-NEXT:    vcvttpd2dq %ymm0, %xmm0
+; AVX-64-NEXT:    vandpd %xmm2, %xmm0, %xmm0
+; AVX-64-NEXT:    vorpd %xmm0, %xmm1, %xmm0
+; AVX-64-NEXT:    vzeroupper
+; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v4f64_to_v4i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512F-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512F-NEXT:    vzeroupper
@@ -1144,7 +1109,7 @@ define <4 x i32> @strict_vector_fptoui_v4f64_to_v4i32(<4 x double> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v4f64_to_v4i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttpd2udq %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1377,22 +1342,27 @@ define <8 x i32> @strict_vector_fptosi_v8f32_to_v8i32(<8 x float> %a) #0 {
 }
 
 define <8 x i32> @strict_vector_fptoui_v8f32_to_v8i32(<8 x float> %a) #0 {
-; AVX-LABEL: strict_vector_fptoui_v8f32_to_v8i32:
-; AVX:       # %bb.0:
-; AVX-NEXT:    vbroadcastss {{.*#+}} ymm1 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; AVX-NEXT:    vcmpltps %ymm1, %ymm0, %ymm2
-; AVX-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX-NEXT:    vbroadcastss {{.*#+}} ymm4 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
-; AVX-NEXT:    vblendvps %ymm2, %ymm3, %ymm4, %ymm4
-; AVX-NEXT:    vblendvps %ymm2, %ymm3, %ymm1, %ymm1
-; AVX-NEXT:    vsubps %ymm1, %ymm0, %ymm0
-; AVX-NEXT:    vcvttps2dq %ymm0, %ymm0
-; AVX-NEXT:    vxorps %ymm4, %ymm0, %ymm0
-; AVX-NEXT:    ret{{[l|q]}}
+; AVX-32-LABEL: strict_vector_fptoui_v8f32_to_v8i32:
+; AVX-32:       # %bb.0:
+; AVX-32-NEXT:    vcvttps2dq %ymm0, %ymm1
+; AVX-32-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX-32-NEXT:    vcvttps2dq %ymm0, %ymm0
+; AVX-32-NEXT:    vorps %ymm0, %ymm1, %ymm0
+; AVX-32-NEXT:    vblendvps %ymm1, %ymm0, %ymm1, %ymm0
+; AVX-32-NEXT:    retl
+;
+; AVX-64-LABEL: strict_vector_fptoui_v8f32_to_v8i32:
+; AVX-64:       # %bb.0:
+; AVX-64-NEXT:    vcvttps2dq %ymm0, %ymm1
+; AVX-64-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX-64-NEXT:    vcvttps2dq %ymm0, %ymm0
+; AVX-64-NEXT:    vorps %ymm0, %ymm1, %ymm0
+; AVX-64-NEXT:    vblendvps %ymm1, %ymm0, %ymm1, %ymm0
+; AVX-64-NEXT:    retq
 ;
 ; AVX512F-LABEL: strict_vector_fptoui_v8f32_to_v8i32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512F-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
@@ -1404,7 +1374,7 @@ define <8 x i32> @strict_vector_fptoui_v8f32_to_v8i32(<8 x float> %a) #0 {
 ;
 ; AVX512DQ-LABEL: strict_vector_fptoui_v8f32_to_v8i32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvttps2udq %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
diff --git a/llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll b/llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
index ce5db5b246775..08215f8c5241e 100644
--- a/llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
@@ -153,67 +153,67 @@ define <8 x i64> @strict_vector_fptoui_v8f64_to_v8i64(<8 x double> %a) #0 {
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
 ; AVX512VL-32-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm3
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm3
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    movl %eax, %esi
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovsd %xmm3, (%esp)
-; AVX512VL-32-NEXT:    xorl %ebx, %ebx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %bl
-; AVX512VL-32-NEXT:    kmovw %ebx, %k1
+; AVX512VL-32-NEXT:    xorl %eax, %eax
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %al
+; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vextractf32x4 $2, %zmm0, %xmm2
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm3
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %ecx, %ecx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm3
+; AVX512VL-32-NEXT:    setae %cl
+; AVX512VL-32-NEXT:    kmovw %ecx, %k1
+; AVX512VL-32-NEXT:    movl %ecx, %edi
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovsd %xmm3, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    xorl %edx, %edx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %dl
-; AVX512VL-32-NEXT:    kmovw %edx, %k1
+; AVX512VL-32-NEXT:    xorl %ecx, %ecx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %cl
+; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vextractf128 $1, %ymm0, %xmm2
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm3
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %edx, %edx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm3
+; AVX512VL-32-NEXT:    setae %dl
+; AVX512VL-32-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovsd %xmm3, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %cl
-; AVX512VL-32-NEXT:    kmovw %ecx, %k1
+; AVX512VL-32-NEXT:    xorl %edx, %edx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %dl
+; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
-; AVX512VL-32-NEXT:    movl %eax, %edi
+; AVX512VL-32-NEXT:    xorl %ebx, %ebx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %bl
+; AVX512VL-32-NEXT:    movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovsd %xmm2, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomisd %xmm1, %xmm0
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %ebx, %ebx
+; AVX512VL-32-NEXT:    vucomisd %xmm1, %xmm0
+; AVX512VL-32-NEXT:    setae %bl
+; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovsd %xmm1, %xmm1, %xmm1 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubsd %xmm1, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vmovsd %xmm0, {{[0-9]+}}(%esp)
@@ -234,40 +234,40 @@ define <8 x i64> @strict_vector_fptoui_v8f64_to_v8i64(<8 x double> %a) #0 {
 ; AVX512VL-32-NEXT:    fldl {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    wait
-; AVX512VL-32-NEXT:    shll $31, %ebx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ebx
+; AVX512VL-32-NEXT:    shll $31, %eax
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %ebx, %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    shll $31, %esi
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %esi
 ; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, %esi, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    shll $31, %edx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
+; AVX512VL-32-NEXT:    shll $31, %ecx
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %edx, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
+; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm1, %xmm1
+; AVX512VL-32-NEXT:    shll $31, %edi
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edi
+; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm1, %xmm1
+; AVX512VL-32-NEXT:    vpinsrd $3, %edi, %xmm1, %xmm1
 ; AVX512VL-32-NEXT:    shll $31, %edx
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm1, %xmm1
-; AVX512VL-32-NEXT:    vpinsrd $3, %edx, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    shll $31, %ecx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm2, %xmm2
-; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; AVX512VL-32-NEXT:    shll $31, %ecx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm2, %xmm2
-; AVX512VL-32-NEXT:    vpinsrd $3, %ecx, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    vpinsrd $1, %edx, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
 ; AVX512VL-32-NEXT:    shll $31, %eax
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
+; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm2, %xmm2
+; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    shll $31, %ebx
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ebx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %eax, %xmm3, %xmm3
-; AVX512VL-32-NEXT:    shll $31, %edi
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edi
+; AVX512VL-32-NEXT:    vpinsrd $1, %ebx, %xmm3, %xmm3
+; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; AVX512VL-32-NEXT:    shll $31, %eax
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm3, %xmm3
-; AVX512VL-32-NEXT:    vpinsrd $3, %edi, %xmm3, %xmm3
+; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; AVX512VL-32-NEXT:    vinserti128 $1, %xmm2, %ymm3, %ymm1
 ; AVX512VL-32-NEXT:    vinserti64x4 $1, %ymm0, %zmm1, %zmm0
@@ -445,7 +445,7 @@ define <8 x i64> @strict_vector_fptoui_v8f32_to_v8i64(<8 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    vshufps {{.*#+}} xmm3 = xmm2[3,3,3,3]
 ; AVX512VL-32-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
 ; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm3
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm3
 ; AVX512VL-32-NEXT:    setae %al
 ; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    movl %eax, %esi
@@ -453,59 +453,59 @@ define <8 x i64> @strict_vector_fptoui_v8f32_to_v8i64(<8 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    vsubss %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovss %xmm3, (%esp)
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm3 = xmm2[1,0]
-; AVX512VL-32-NEXT:    xorl %ebx, %ebx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm3
-; AVX512VL-32-NEXT:    setae %bl
-; AVX512VL-32-NEXT:    kmovw %ebx, %k1
+; AVX512VL-32-NEXT:    xorl %eax, %eax
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm3
+; AVX512VL-32-NEXT:    setae %al
+; AVX512VL-32-NEXT:    kmovw %eax, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovss %xmm3, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vmovshdup {{.*#+}} xmm3 = xmm2[1,1,3,3]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm3
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %ecx, %ecx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm3
+; AVX512VL-32-NEXT:    setae %cl
+; AVX512VL-32-NEXT:    kmovw %ecx, %k1
+; AVX512VL-32-NEXT:    movl %ecx, %edi
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm4 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm4, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vmovss %xmm3, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    xorl %edx, %edx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %dl
-; AVX512VL-32-NEXT:    kmovw %edx, %k1
+; AVX512VL-32-NEXT:    xorl %ecx, %ecx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %cl
+; AVX512VL-32-NEXT:    kmovw %ecx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vshufps {{.*#+}} xmm2 = xmm0[3,3,3,3]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %edx, %edx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %dl
+; AVX512VL-32-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vshufpd {{.*#+}} xmm2 = xmm0[1,0]
-; AVX512VL-32-NEXT:    xorl %ecx, %ecx
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %cl
-; AVX512VL-32-NEXT:    kmovw %ecx, %k1
+; AVX512VL-32-NEXT:    xorl %edx, %edx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %dl
+; AVX512VL-32-NEXT:    kmovw %edx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm2
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
-; AVX512VL-32-NEXT:    movl %eax, %edi
+; AVX512VL-32-NEXT:    xorl %ebx, %ebx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm2
+; AVX512VL-32-NEXT:    setae %bl
+; AVX512VL-32-NEXT:    movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm3 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm3, %xmm2, %xmm2
 ; AVX512VL-32-NEXT:    vmovss %xmm2, {{[0-9]+}}(%esp)
-; AVX512VL-32-NEXT:    xorl %eax, %eax
-; AVX512VL-32-NEXT:    vcomiss %xmm1, %xmm0
-; AVX512VL-32-NEXT:    setae %al
-; AVX512VL-32-NEXT:    kmovw %eax, %k1
+; AVX512VL-32-NEXT:    xorl %ebx, %ebx
+; AVX512VL-32-NEXT:    vucomiss %xmm1, %xmm0
+; AVX512VL-32-NEXT:    setae %bl
+; AVX512VL-32-NEXT:    kmovw %ebx, %k1
 ; AVX512VL-32-NEXT:    vmovss %xmm1, %xmm1, %xmm1 {%k1} {z}
 ; AVX512VL-32-NEXT:    vsubss %xmm1, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vmovss %xmm0, {{[0-9]+}}(%esp)
@@ -526,40 +526,40 @@ define <8 x i64> @strict_vector_fptoui_v8f32_to_v8i64(<8 x float> %a) #0 {
 ; AVX512VL-32-NEXT:    flds {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    fisttpll {{[0-9]+}}(%esp)
 ; AVX512VL-32-NEXT:    wait
-; AVX512VL-32-NEXT:    shll $31, %ebx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ebx
+; AVX512VL-32-NEXT:    shll $31, %eax
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %ebx, %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    shll $31, %esi
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %esi
 ; AVX512VL-32-NEXT:    vpinsrd $2, (%esp), %xmm0, %xmm0
 ; AVX512VL-32-NEXT:    vpinsrd $3, %esi, %xmm0, %xmm0
-; AVX512VL-32-NEXT:    shll $31, %edx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
+; AVX512VL-32-NEXT:    shll $31, %ecx
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %edx, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
+; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm1, %xmm1
+; AVX512VL-32-NEXT:    shll $31, %edi
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edi
+; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm1, %xmm1
+; AVX512VL-32-NEXT:    vpinsrd $3, %edi, %xmm1, %xmm1
 ; AVX512VL-32-NEXT:    shll $31, %edx
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edx
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm1, %xmm1
-; AVX512VL-32-NEXT:    vpinsrd $3, %edx, %xmm1, %xmm1
-; AVX512VL-32-NEXT:    shll $31, %ecx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %ecx, %xmm2, %xmm2
-; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
-; AVX512VL-32-NEXT:    shll $31, %ecx
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ecx
-; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm2, %xmm2
-; AVX512VL-32-NEXT:    vpinsrd $3, %ecx, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    vpinsrd $1, %edx, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
 ; AVX512VL-32-NEXT:    shll $31, %eax
 ; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
+; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm2, %xmm2
+; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm2, %xmm2
+; AVX512VL-32-NEXT:    shll $31, %ebx
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %ebx
 ; AVX512VL-32-NEXT:    vmovd {{.*#+}} xmm3 = mem[0],zero,zero,zero
-; AVX512VL-32-NEXT:    vpinsrd $1, %eax, %xmm3, %xmm3
-; AVX512VL-32-NEXT:    shll $31, %edi
-; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %edi
+; AVX512VL-32-NEXT:    vpinsrd $1, %ebx, %xmm3, %xmm3
+; AVX512VL-32-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
+; AVX512VL-32-NEXT:    shll $31, %eax
+; AVX512VL-32-NEXT:    xorl {{[0-9]+}}(%esp), %eax
 ; AVX512VL-32-NEXT:    vpinsrd $2, {{[0-9]+}}(%esp), %xmm3, %xmm3
-; AVX512VL-32-NEXT:    vpinsrd $3, %edi, %xmm3, %xmm3
+; AVX512VL-32-NEXT:    vpinsrd $3, %eax, %xmm3, %xmm3
 ; AVX512VL-32-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; AVX512VL-32-NEXT:    vinserti128 $1, %xmm2, %ymm3, %ymm1
 ; AVX512VL-32-NEXT:    vinserti64x4 $1, %ymm0, %zmm1, %zmm0
diff --git a/llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll b/llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll
index cd4ceca6716b1..d9f79603317e9 100644
--- a/llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll
@@ -40,19 +40,16 @@ declare <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64>
 define <2 x float> @sitofp_v2i32_v2f32(<2 x i32> %x) #0 {
 ; SSE-LABEL: sitofp_v2i32_v2f32:
 ; SSE:       # %bb.0:
-; SSE-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE-NEXT:    cvtdq2ps %xmm0, %xmm0
 ; SSE-NEXT:    ret{{[l|q]}}
 ;
 ; SSE41-LABEL: sitofp_v2i32_v2f32:
 ; SSE41:       # %bb.0:
-; SSE41-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; SSE41-NEXT:    cvtdq2ps %xmm0, %xmm0
 ; SSE41-NEXT:    ret{{[l|q]}}
 ;
 ; AVX-LABEL: sitofp_v2i32_v2f32:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; AVX-NEXT:    ret{{[l|q]}}
  %result = call <2 x float> @llvm.experimental.constrained.sitofp.v2f32.v2i32(<2 x i32> %x,
@@ -94,7 +91,7 @@ define <2 x float> @uitofp_v2i32_v2f32(<2 x i32> %x) #0 {
 ;
 ; AVX512F-LABEL: uitofp_v2i32_v2f32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -102,13 +99,12 @@ define <2 x float> @uitofp_v2i32_v2f32(<2 x i32> %x) #0 {
 ;
 ; AVX512VL-LABEL: uitofp_v2i32_v2f32:
 ; AVX512VL:       # %bb.0:
-; AVX512VL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512VL-NEXT:    vcvtudq2ps %xmm0, %xmm0
 ; AVX512VL-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQ-LABEL: uitofp_v2i32_v2f32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -116,7 +112,6 @@ define <2 x float> @uitofp_v2i32_v2f32(<2 x i32> %x) #0 {
 ;
 ; AVX512DQVL-LABEL: uitofp_v2i32_v2f32:
 ; AVX512DQVL:       # %bb.0:
-; AVX512DQVL-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX512DQVL-NEXT:    vcvtudq2ps %xmm0, %xmm0
 ; AVX512DQVL-NEXT:    ret{{[l|q]}}
  %result = call <2 x float> @llvm.experimental.constrained.uitofp.v2f32.v2i32(<2 x i32> %x,
@@ -213,12 +208,12 @@ define <2 x float> @sitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; AVX-32-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[2,3,2,3]
 ; AVX-32-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fstps (%esp)
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
 ; AVX-32-NEXT:    fstps {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
+; AVX-32-NEXT:    fstps (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
+; AVX-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],zero,zero
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
@@ -230,27 +225,16 @@ define <2 x float> @sitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; AVX-64-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
 ; AVX-64-NEXT:    vmovq %xmm0, %rax
 ; AVX-64-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
+; AVX-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
 ; AVX-64-NEXT:    retq
 ;
-; AVX512DQ-32-LABEL: sitofp_v2i64_v2f32:
-; AVX512DQ-32:       # %bb.0:
-; AVX512DQ-32-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
-; AVX512DQ-32-NEXT:    vcvtqq2ps %zmm0, %ymm1
-; AVX512DQ-32-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[2,3,2,3]
-; AVX512DQ-32-NEXT:    vcvtqq2ps %zmm0, %ymm0
-; AVX512DQ-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],zero,zero
-; AVX512DQ-32-NEXT:    vzeroupper
-; AVX512DQ-32-NEXT:    retl
-;
-; AVX512DQ-64-LABEL: sitofp_v2i64_v2f32:
-; AVX512DQ-64:       # %bb.0:
-; AVX512DQ-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512DQ-64-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
-; AVX512DQ-64-NEXT:    vmovq %xmm0, %rax
-; AVX512DQ-64-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX512DQ-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX512DQ-64-NEXT:    retq
+; AVX512DQ-LABEL: sitofp_v2i64_v2f32:
+; AVX512DQ:       # %bb.0:
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQVL-LABEL: sitofp_v2i64_v2f32:
 ; AVX512DQVL:       # %bb.0:
@@ -301,34 +285,40 @@ define <2 x float> @uitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; SSE-64:       # %bb.0:
 ; SSE-64-NEXT:    movdqa %xmm0, %xmm1
 ; SSE-64-NEXT:    movq %xmm0, %rax
-; SSE-64-NEXT:    movq %rax, %rcx
-; SSE-64-NEXT:    shrq %rcx
-; SSE-64-NEXT:    movl %eax, %edx
-; SSE-64-NEXT:    andl $1, %edx
-; SSE-64-NEXT:    orq %rcx, %rdx
 ; SSE-64-NEXT:    testq %rax, %rax
-; SSE-64-NEXT:    cmovnsq %rax, %rdx
+; SSE-64-NEXT:    js .LBB3_1
+; SSE-64-NEXT:  # %bb.2:
 ; SSE-64-NEXT:    xorps %xmm0, %xmm0
-; SSE-64-NEXT:    cvtsi2ss %rdx, %xmm0
-; SSE-64-NEXT:    jns .LBB3_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    addss %xmm0, %xmm0
-; SSE-64-NEXT:  .LBB3_2:
+; SSE-64-NEXT:    cvtsi2ss %rax, %xmm0
 ; SSE-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
 ; SSE-64-NEXT:    movq %xmm1, %rax
+; SSE-64-NEXT:    testq %rax, %rax
+; SSE-64-NEXT:    jns .LBB3_5
+; SSE-64-NEXT:  .LBB3_4:
 ; SSE-64-NEXT:    movq %rax, %rcx
 ; SSE-64-NEXT:    shrq %rcx
-; SSE-64-NEXT:    movl %eax, %edx
-; SSE-64-NEXT:    andl $1, %edx
-; SSE-64-NEXT:    orq %rcx, %rdx
-; SSE-64-NEXT:    testq %rax, %rax
-; SSE-64-NEXT:    cmovnsq %rax, %rdx
+; SSE-64-NEXT:    andl $1, %eax
+; SSE-64-NEXT:    orq %rcx, %rax
 ; SSE-64-NEXT:    xorps %xmm1, %xmm1
-; SSE-64-NEXT:    cvtsi2ss %rdx, %xmm1
-; SSE-64-NEXT:    jns .LBB3_4
-; SSE-64-NEXT:  # %bb.3:
+; SSE-64-NEXT:    cvtsi2ss %rax, %xmm1
 ; SSE-64-NEXT:    addss %xmm1, %xmm1
-; SSE-64-NEXT:  .LBB3_4:
+; SSE-64-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE-64-NEXT:    retq
+; SSE-64-NEXT:  .LBB3_1:
+; SSE-64-NEXT:    movq %rax, %rcx
+; SSE-64-NEXT:    shrq %rcx
+; SSE-64-NEXT:    andl $1, %eax
+; SSE-64-NEXT:    orq %rcx, %rax
+; SSE-64-NEXT:    xorps %xmm0, %xmm0
+; SSE-64-NEXT:    cvtsi2ss %rax, %xmm0
+; SSE-64-NEXT:    addss %xmm0, %xmm0
+; SSE-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
+; SSE-64-NEXT:    movq %xmm1, %rax
+; SSE-64-NEXT:    testq %rax, %rax
+; SSE-64-NEXT:    js .LBB3_4
+; SSE-64-NEXT:  .LBB3_5:
+; SSE-64-NEXT:    xorps %xmm1, %xmm1
+; SSE-64-NEXT:    cvtsi2ss %rax, %xmm1
 ; SSE-64-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
 ; SSE-64-NEXT:    retq
 ;
@@ -370,34 +360,40 @@ define <2 x float> @uitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; SSE41-64:       # %bb.0:
 ; SSE41-64-NEXT:    movdqa %xmm0, %xmm1
 ; SSE41-64-NEXT:    movq %xmm0, %rax
-; SSE41-64-NEXT:    movq %rax, %rcx
-; SSE41-64-NEXT:    shrq %rcx
-; SSE41-64-NEXT:    movl %eax, %edx
-; SSE41-64-NEXT:    andl $1, %edx
-; SSE41-64-NEXT:    orq %rcx, %rdx
 ; SSE41-64-NEXT:    testq %rax, %rax
-; SSE41-64-NEXT:    cmovnsq %rax, %rdx
+; SSE41-64-NEXT:    js .LBB3_1
+; SSE41-64-NEXT:  # %bb.2:
 ; SSE41-64-NEXT:    xorps %xmm0, %xmm0
-; SSE41-64-NEXT:    cvtsi2ss %rdx, %xmm0
-; SSE41-64-NEXT:    jns .LBB3_2
-; SSE41-64-NEXT:  # %bb.1:
-; SSE41-64-NEXT:    addss %xmm0, %xmm0
-; SSE41-64-NEXT:  .LBB3_2:
+; SSE41-64-NEXT:    cvtsi2ss %rax, %xmm0
 ; SSE41-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
 ; SSE41-64-NEXT:    movq %xmm1, %rax
+; SSE41-64-NEXT:    testq %rax, %rax
+; SSE41-64-NEXT:    jns .LBB3_5
+; SSE41-64-NEXT:  .LBB3_4:
 ; SSE41-64-NEXT:    movq %rax, %rcx
 ; SSE41-64-NEXT:    shrq %rcx
-; SSE41-64-NEXT:    movl %eax, %edx
-; SSE41-64-NEXT:    andl $1, %edx
-; SSE41-64-NEXT:    orq %rcx, %rdx
-; SSE41-64-NEXT:    testq %rax, %rax
-; SSE41-64-NEXT:    cmovnsq %rax, %rdx
+; SSE41-64-NEXT:    andl $1, %eax
+; SSE41-64-NEXT:    orq %rcx, %rax
 ; SSE41-64-NEXT:    xorps %xmm1, %xmm1
-; SSE41-64-NEXT:    cvtsi2ss %rdx, %xmm1
-; SSE41-64-NEXT:    jns .LBB3_4
-; SSE41-64-NEXT:  # %bb.3:
+; SSE41-64-NEXT:    cvtsi2ss %rax, %xmm1
 ; SSE41-64-NEXT:    addss %xmm1, %xmm1
-; SSE41-64-NEXT:  .LBB3_4:
+; SSE41-64-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE41-64-NEXT:    retq
+; SSE41-64-NEXT:  .LBB3_1:
+; SSE41-64-NEXT:    movq %rax, %rcx
+; SSE41-64-NEXT:    shrq %rcx
+; SSE41-64-NEXT:    andl $1, %eax
+; SSE41-64-NEXT:    orq %rcx, %rax
+; SSE41-64-NEXT:    xorps %xmm0, %xmm0
+; SSE41-64-NEXT:    cvtsi2ss %rax, %xmm0
+; SSE41-64-NEXT:    addss %xmm0, %xmm0
+; SSE41-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
+; SSE41-64-NEXT:    movq %xmm1, %rax
+; SSE41-64-NEXT:    testq %rax, %rax
+; SSE41-64-NEXT:    js .LBB3_4
+; SSE41-64-NEXT:  .LBB3_5:
+; SSE41-64-NEXT:    xorps %xmm1, %xmm1
+; SSE41-64-NEXT:    cvtsi2ss %rax, %xmm1
 ; SSE41-64-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
 ; SSE41-64-NEXT:    retq
 ;
@@ -426,7 +422,7 @@ define <2 x float> @uitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; AVX-32-NEXT:    fstps (%esp)
 ; AVX-32-NEXT:    wait
 ; AVX-32-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
+; AVX-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],zero,zero
 ; AVX-32-NEXT:    movl %ebp, %esp
 ; AVX-32-NEXT:    popl %ebp
 ; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
@@ -456,7 +452,7 @@ define <2 x float> @uitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; AVX512F-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
 ; AVX512F-64-NEXT:    vmovq %xmm0, %rax
 ; AVX512F-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
-; AVX512F-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
+; AVX512F-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
 ; AVX512F-64-NEXT:    retq
 ;
 ; AVX512VL-64-LABEL: uitofp_v2i64_v2f32:
@@ -465,27 +461,16 @@ define <2 x float> @uitofp_v2i64_v2f32(<2 x i64> %x) #0 {
 ; AVX512VL-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
 ; AVX512VL-64-NEXT:    vmovq %xmm0, %rax
 ; AVX512VL-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
-; AVX512VL-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
+; AVX512VL-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
 ; AVX512VL-64-NEXT:    retq
 ;
-; AVX512DQ-32-LABEL: uitofp_v2i64_v2f32:
-; AVX512DQ-32:       # %bb.0:
-; AVX512DQ-32-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
-; AVX512DQ-32-NEXT:    vcvtuqq2ps %zmm0, %ymm1
-; AVX512DQ-32-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[2,3,2,3]
-; AVX512DQ-32-NEXT:    vcvtuqq2ps %zmm0, %ymm0
-; AVX512DQ-32-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],zero,zero
-; AVX512DQ-32-NEXT:    vzeroupper
-; AVX512DQ-32-NEXT:    retl
-;
-; AVX512DQ-64-LABEL: uitofp_v2i64_v2f32:
-; AVX512DQ-64:       # %bb.0:
-; AVX512DQ-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512DQ-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
-; AVX512DQ-64-NEXT:    vmovq %xmm0, %rax
-; AVX512DQ-64-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
-; AVX512DQ-64-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX512DQ-64-NEXT:    retq
+; AVX512DQ-LABEL: uitofp_v2i64_v2f32:
+; AVX512DQ:       # %bb.0:
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtuqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    ret{{[l|q]}}
 ;
 ; AVX512DQVL-LABEL: uitofp_v2i64_v2f32:
 ; AVX512DQVL:       # %bb.0:
@@ -798,7 +783,7 @@ define <4 x float> @uitofp_v4i32_v4f32(<4 x i32> %x) #0 {
 ;
 ; AVX512F-LABEL: uitofp_v4i32_v4f32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512F-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -811,7 +796,7 @@ define <4 x float> @uitofp_v4i32_v4f32(<4 x i32> %x) #0 {
 ;
 ; AVX512DQ-LABEL: uitofp_v4i32_v4f32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -953,7 +938,7 @@ define <2 x double> @sitofp_v2i8_v2f64(<2 x i8> %x) #0 {
 ; SSE-LABEL: sitofp_v2i8_v2f64:
 ; SSE:       # %bb.0:
 ; SSE-NEXT:    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
-; SSE-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
+; SSE-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,0,2,1,4,5,6,7]
 ; SSE-NEXT:    psrad $24, %xmm0
 ; SSE-NEXT:    cvtdq2pd %xmm0, %xmm0
 ; SSE-NEXT:    ret{{[l|q]}}
@@ -961,7 +946,7 @@ define <2 x double> @sitofp_v2i8_v2f64(<2 x i8> %x) #0 {
 ; SSE41-LABEL: sitofp_v2i8_v2f64:
 ; SSE41:       # %bb.0:
 ; SSE41-NEXT:    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
-; SSE41-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
+; SSE41-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,0,2,1,4,5,6,7]
 ; SSE41-NEXT:    psrad $24, %xmm0
 ; SSE41-NEXT:    cvtdq2pd %xmm0, %xmm0
 ; SSE41-NEXT:    ret{{[l|q]}}
@@ -1008,14 +993,14 @@ define <2 x double> @uitofp_v2i8_v2f64(<2 x i8> %x) #0 {
 define <2 x double> @sitofp_v2i16_v2f64(<2 x i16> %x) #0 {
 ; SSE-LABEL: sitofp_v2i16_v2f64:
 ; SSE:       # %bb.0:
-; SSE-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
+; SSE-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,0,2,1,4,5,6,7]
 ; SSE-NEXT:    psrad $16, %xmm0
 ; SSE-NEXT:    cvtdq2pd %xmm0, %xmm0
 ; SSE-NEXT:    ret{{[l|q]}}
 ;
 ; SSE41-LABEL: sitofp_v2i16_v2f64:
 ; SSE41:       # %bb.0:
-; SSE41-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
+; SSE41-NEXT:    pshuflw {{.*#+}} xmm0 = xmm0[0,0,2,1,4,5,6,7]
 ; SSE41-NEXT:    psrad $16, %xmm0
 ; SSE41-NEXT:    cvtdq2pd %xmm0, %xmm0
 ; SSE41-NEXT:    ret{{[l|q]}}
@@ -1108,7 +1093,7 @@ define <2 x double> @uitofp_v2i32_v2f64(<2 x i32> %x) #0 {
 ;
 ; AVX512F-LABEL: uitofp_v2i32_v2f64:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512F-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512F-NEXT:    vzeroupper
@@ -1121,7 +1106,7 @@ define <2 x double> @uitofp_v2i32_v2f64(<2 x i32> %x) #0 {
 ;
 ; AVX512DQ-LABEL: uitofp_v2i32_v2f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1245,7 +1230,7 @@ define <2 x double> @sitofp_v2i64_v2f64(<2 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: sitofp_v2i64_v2f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1264,223 +1249,106 @@ define <2 x double> @sitofp_v2i64_v2f64(<2 x i64> %x) #0 {
 define <2 x double> @uitofp_v2i64_v2f64(<2 x i64> %x) #0 {
 ; SSE-32-LABEL: uitofp_v2i64_v2f64:
 ; SSE-32:       # %bb.0:
-; SSE-32-NEXT:    pushl %ebp
-; SSE-32-NEXT:    .cfi_def_cfa_offset 8
-; SSE-32-NEXT:    .cfi_offset %ebp, -8
-; SSE-32-NEXT:    movl %esp, %ebp
-; SSE-32-NEXT:    .cfi_def_cfa_register %ebp
-; SSE-32-NEXT:    andl $-8, %esp
-; SSE-32-NEXT:    subl $32, %esp
-; SSE-32-NEXT:    movq %xmm0, {{[0-9]+}}(%esp)
-; SSE-32-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; SSE-32-NEXT:    movq %xmm1, {{[0-9]+}}(%esp)
-; SSE-32-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
-; SSE-32-NEXT:    movd %xmm1, %eax
-; SSE-32-NEXT:    shrl $31, %eax
-; SSE-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; SSE-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; SSE-32-NEXT:    wait
-; SSE-32-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; SSE-32-NEXT:    movd %xmm0, %eax
-; SSE-32-NEXT:    shrl $31, %eax
-; SSE-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; SSE-32-NEXT:    fstpl (%esp)
-; SSE-32-NEXT:    wait
-; SSE-32-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE-32-NEXT:    movhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
-; SSE-32-NEXT:    movl %ebp, %esp
-; SSE-32-NEXT:    popl %ebp
-; SSE-32-NEXT:    .cfi_def_cfa %esp, 4
+; SSE-32-NEXT:    movdqa {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; SSE-32-NEXT:    pand %xmm0, %xmm1
+; SSE-32-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
+; SSE-32-NEXT:    psrlq $32, %xmm0
+; SSE-32-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-32-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE-32-NEXT:    addpd %xmm1, %xmm0
 ; SSE-32-NEXT:    retl
 ;
 ; SSE-64-LABEL: uitofp_v2i64_v2f64:
 ; SSE-64:       # %bb.0:
-; SSE-64-NEXT:    movdqa %xmm0, %xmm1
-; SSE-64-NEXT:    movq %xmm0, %rax
-; SSE-64-NEXT:    movq %rax, %rcx
-; SSE-64-NEXT:    shrq %rcx
-; SSE-64-NEXT:    movl %eax, %edx
-; SSE-64-NEXT:    andl $1, %edx
-; SSE-64-NEXT:    orq %rcx, %rdx
-; SSE-64-NEXT:    testq %rax, %rax
-; SSE-64-NEXT:    cmovnsq %rax, %rdx
-; SSE-64-NEXT:    xorps %xmm0, %xmm0
-; SSE-64-NEXT:    cvtsi2sd %rdx, %xmm0
-; SSE-64-NEXT:    jns .LBB21_2
-; SSE-64-NEXT:  # %bb.1:
-; SSE-64-NEXT:    addsd %xmm0, %xmm0
-; SSE-64-NEXT:  .LBB21_2:
-; SSE-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
-; SSE-64-NEXT:    movq %xmm1, %rax
-; SSE-64-NEXT:    movq %rax, %rcx
-; SSE-64-NEXT:    shrq %rcx
-; SSE-64-NEXT:    movl %eax, %edx
-; SSE-64-NEXT:    andl $1, %edx
-; SSE-64-NEXT:    orq %rcx, %rdx
-; SSE-64-NEXT:    testq %rax, %rax
-; SSE-64-NEXT:    cmovnsq %rax, %rdx
-; SSE-64-NEXT:    xorps %xmm1, %xmm1
-; SSE-64-NEXT:    cvtsi2sd %rdx, %xmm1
-; SSE-64-NEXT:    jns .LBB21_4
-; SSE-64-NEXT:  # %bb.3:
-; SSE-64-NEXT:    addsd %xmm1, %xmm1
-; SSE-64-NEXT:  .LBB21_4:
-; SSE-64-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; SSE-64-NEXT:    movdqa {{.*#+}} xmm1 = [4294967295,4294967295]
+; SSE-64-NEXT:    pand %xmm0, %xmm1
+; SSE-64-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE-64-NEXT:    psrlq $32, %xmm0
+; SSE-64-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-64-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE-64-NEXT:    addpd %xmm1, %xmm0
 ; SSE-64-NEXT:    retq
 ;
 ; SSE41-32-LABEL: uitofp_v2i64_v2f64:
 ; SSE41-32:       # %bb.0:
-; SSE41-32-NEXT:    pushl %ebp
-; SSE41-32-NEXT:    .cfi_def_cfa_offset 8
-; SSE41-32-NEXT:    .cfi_offset %ebp, -8
-; SSE41-32-NEXT:    movl %esp, %ebp
-; SSE41-32-NEXT:    .cfi_def_cfa_register %ebp
-; SSE41-32-NEXT:    andl $-8, %esp
-; SSE41-32-NEXT:    subl $32, %esp
-; SSE41-32-NEXT:    movq %xmm0, {{[0-9]+}}(%esp)
-; SSE41-32-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; SSE41-32-NEXT:    movq %xmm1, {{[0-9]+}}(%esp)
-; SSE41-32-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
-; SSE41-32-NEXT:    movd %xmm1, %eax
-; SSE41-32-NEXT:    shrl $31, %eax
-; SSE41-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE41-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; SSE41-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; SSE41-32-NEXT:    wait
-; SSE41-32-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; SSE41-32-NEXT:    movd %xmm0, %eax
-; SSE41-32-NEXT:    shrl $31, %eax
-; SSE41-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; SSE41-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; SSE41-32-NEXT:    fstpl (%esp)
-; SSE41-32-NEXT:    wait
-; SSE41-32-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
-; SSE41-32-NEXT:    movhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
-; SSE41-32-NEXT:    movl %ebp, %esp
-; SSE41-32-NEXT:    popl %ebp
-; SSE41-32-NEXT:    .cfi_def_cfa %esp, 4
+; SSE41-32-NEXT:    movdqa {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; SSE41-32-NEXT:    pand %xmm0, %xmm1
+; SSE41-32-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
+; SSE41-32-NEXT:    psrlq $32, %xmm0
+; SSE41-32-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE41-32-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; SSE41-32-NEXT:    addpd %xmm1, %xmm0
 ; SSE41-32-NEXT:    retl
 ;
 ; SSE41-64-LABEL: uitofp_v2i64_v2f64:
 ; SSE41-64:       # %bb.0:
-; SSE41-64-NEXT:    movdqa %xmm0, %xmm1
-; SSE41-64-NEXT:    movq %xmm0, %rax
-; SSE41-64-NEXT:    movq %rax, %rcx
-; SSE41-64-NEXT:    shrq %rcx
-; SSE41-64-NEXT:    movl %eax, %edx
-; SSE41-64-NEXT:    andl $1, %edx
-; SSE41-64-NEXT:    orq %rcx, %rdx
-; SSE41-64-NEXT:    testq %rax, %rax
-; SSE41-64-NEXT:    cmovnsq %rax, %rdx
-; SSE41-64-NEXT:    xorps %xmm0, %xmm0
-; SSE41-64-NEXT:    cvtsi2sd %rdx, %xmm0
-; SSE41-64-NEXT:    jns .LBB21_2
-; SSE41-64-NEXT:  # %bb.1:
-; SSE41-64-NEXT:    addsd %xmm0, %xmm0
-; SSE41-64-NEXT:  .LBB21_2:
-; SSE41-64-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
-; SSE41-64-NEXT:    movq %xmm1, %rax
-; SSE41-64-NEXT:    movq %rax, %rcx
-; SSE41-64-NEXT:    shrq %rcx
-; SSE41-64-NEXT:    movl %eax, %edx
-; SSE41-64-NEXT:    andl $1, %edx
-; SSE41-64-NEXT:    orq %rcx, %rdx
-; SSE41-64-NEXT:    testq %rax, %rax
-; SSE41-64-NEXT:    cmovnsq %rax, %rdx
-; SSE41-64-NEXT:    xorps %xmm1, %xmm1
-; SSE41-64-NEXT:    cvtsi2sd %rdx, %xmm1
-; SSE41-64-NEXT:    jns .LBB21_4
-; SSE41-64-NEXT:  # %bb.3:
-; SSE41-64-NEXT:    addsd %xmm1, %xmm1
-; SSE41-64-NEXT:  .LBB21_4:
-; SSE41-64-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; SSE41-64-NEXT:    movdqa {{.*#+}} xmm1 = [4294967295,4294967295]
+; SSE41-64-NEXT:    pand %xmm0, %xmm1
+; SSE41-64-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; SSE41-64-NEXT:    psrlq $32, %xmm0
+; SSE41-64-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE41-64-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE41-64-NEXT:    addpd %xmm1, %xmm0
 ; SSE41-64-NEXT:    retq
 ;
-; AVX-32-LABEL: uitofp_v2i64_v2f64:
-; AVX-32:       # %bb.0:
-; AVX-32-NEXT:    pushl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_offset 8
-; AVX-32-NEXT:    .cfi_offset %ebp, -8
-; AVX-32-NEXT:    movl %esp, %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
-; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $32, %esp
-; AVX-32-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vshufps {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; AVX-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vextractps $3, %xmm0, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl (%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-32-NEXT:    vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
-; AVX-32-NEXT:    movl %ebp, %esp
-; AVX-32-NEXT:    popl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
-; AVX-32-NEXT:    retl
+; AVX1-32-LABEL: uitofp_v2i64_v2f64:
+; AVX1-32:       # %bb.0:
+; AVX1-32-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX1-32-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3],xmm0[4,5],xmm1[6,7]
+; AVX1-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1, %xmm1
+; AVX1-32-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX1-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
+; AVX1-32-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
+; AVX1-32-NEXT:    retl
 ;
 ; AVX1-64-LABEL: uitofp_v2i64_v2f64:
 ; AVX1-64:       # %bb.0:
-; AVX1-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX1-64-NEXT:    movq %rax, %rcx
-; AVX1-64-NEXT:    shrq %rcx
-; AVX1-64-NEXT:    movl %eax, %edx
-; AVX1-64-NEXT:    andl $1, %edx
-; AVX1-64-NEXT:    orq %rcx, %rdx
-; AVX1-64-NEXT:    testq %rax, %rax
-; AVX1-64-NEXT:    cmovnsq %rax, %rdx
-; AVX1-64-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm1
-; AVX1-64-NEXT:    jns .LBB21_2
-; AVX1-64-NEXT:  # %bb.1:
-; AVX1-64-NEXT:    vaddsd %xmm1, %xmm1, %xmm1
-; AVX1-64-NEXT:  .LBB21_2:
-; AVX1-64-NEXT:    vmovq %xmm0, %rax
-; AVX1-64-NEXT:    movq %rax, %rcx
-; AVX1-64-NEXT:    shrq %rcx
-; AVX1-64-NEXT:    movl %eax, %edx
-; AVX1-64-NEXT:    andl $1, %edx
-; AVX1-64-NEXT:    orq %rcx, %rdx
-; AVX1-64-NEXT:    testq %rax, %rax
-; AVX1-64-NEXT:    cmovnsq %rax, %rdx
-; AVX1-64-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm0
-; AVX1-64-NEXT:    jns .LBB21_4
-; AVX1-64-NEXT:  # %bb.3:
-; AVX1-64-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-64-NEXT:  .LBB21_4:
-; AVX1-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX1-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX1-64-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3],xmm0[4,5],xmm1[6,7]
+; AVX1-64-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX1-64-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX1-64-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-64-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
 ; AVX1-64-NEXT:    retq
 ;
 ; AVX512F-64-LABEL: uitofp_v2i64_v2f64:
 ; AVX512F-64:       # %bb.0:
-; AVX512F-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512F-64-NEXT:    vmovq %xmm0, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512F-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512F-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512F-64-NEXT:    vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
+; AVX512F-64-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX512F-64-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX512F-64-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX512F-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX512F-64-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
 ; AVX512F-64-NEXT:    retq
 ;
+; AVX512VL-32-LABEL: uitofp_v2i64_v2f64:
+; AVX512VL-32:       # %bb.0:
+; AVX512VL-32-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512VL-32-NEXT:    vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
+; AVX512VL-32-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm1, %xmm1
+; AVX512VL-32-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}{1to2}, %xmm0, %xmm0
+; AVX512VL-32-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
+; AVX512VL-32-NEXT:    retl
+;
 ; AVX512VL-64-LABEL: uitofp_v2i64_v2f64:
 ; AVX512VL-64:       # %bb.0:
-; AVX512VL-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512VL-64-NEXT:    vmovq %xmm0, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512VL-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512VL-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512VL-64-NEXT:    vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
+; AVX512VL-64-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm1, %xmm1
+; AVX512VL-64-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX512VL-64-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
+; AVX512VL-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to2}, %xmm0, %xmm0
+; AVX512VL-64-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
 ; AVX512VL-64-NEXT:    retq
 ;
 ; AVX512DQ-LABEL: uitofp_v2i64_v2f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1497,3 +1365,6 @@ define <2 x double> @uitofp_v2i64_v2f64(<2 x i64> %x) #0 {
 }
 
 attributes #0 = { strictfp }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; AVX512DQ-32: {{.*}}
+; AVX512DQ-64: {{.*}}
diff --git a/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll b/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
index 7e446d5366387..8524afaef7137 100644
--- a/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
@@ -419,7 +419,7 @@ define <8 x float> @uitofp_v8i32_v8f32(<8 x i32> %x) #0 {
 ;
 ; AVX512F-LABEL: uitofp_v8i32_v8f32:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512F-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
@@ -431,7 +431,7 @@ define <8 x float> @uitofp_v8i32_v8f32(<8 x i32> %x) #0 {
 ;
 ; AVX512DQ-LABEL: uitofp_v8i32_v8f32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -603,7 +603,7 @@ define <4 x double> @uitofp_v4i32_v4f64(<4 x i32> %x) #0 {
 ;
 ; AVX512F-LABEL: uitofp_v4i32_v4f64:
 ; AVX512F:       # %bb.0:
-; AVX512F-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512F-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512F-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512F-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512F-NEXT:    ret{{[l|q]}}
@@ -615,7 +615,7 @@ define <4 x double> @uitofp_v4i32_v4f64(<4 x i32> %x) #0 {
 ;
 ; AVX512DQ-LABEL: uitofp_v4i32_v4f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512DQ-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -732,7 +732,7 @@ define <4 x double> @sitofp_v4i64_v4f64(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: sitofp_v4i64_v4f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -748,152 +748,107 @@ define <4 x double> @sitofp_v4i64_v4f64(<4 x i64> %x) #0 {
 }
 
 define <4 x double> @uitofp_v4i64_v4f64(<4 x i64> %x) #0 {
-; AVX-32-LABEL: uitofp_v4i64_v4f64:
-; AVX-32:       # %bb.0:
-; AVX-32-NEXT:    pushl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_offset 8
-; AVX-32-NEXT:    .cfi_offset %ebp, -8
-; AVX-32-NEXT:    movl %esp, %ebp
-; AVX-32-NEXT:    .cfi_def_cfa_register %ebp
-; AVX-32-NEXT:    andl $-8, %esp
-; AVX-32-NEXT:    subl $64, %esp
-; AVX-32-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vshufps {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; AVX-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vshufps {{.*#+}} xmm2 = xmm1[2,3,2,3]
-; AVX-32-NEXT:    vmovlps %xmm2, {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    vextractps $1, %xmm0, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl (%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vextractps $3, %xmm0, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vextractps $1, %xmm1, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vextractps $3, %xmm1, %eax
-; AVX-32-NEXT:    shrl $31, %eax
-; AVX-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; AVX-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; AVX-32-NEXT:    wait
-; AVX-32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-32-NEXT:    vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
-; AVX-32-NEXT:    vmovsd {{.*#+}} xmm1 = mem[0],zero
-; AVX-32-NEXT:    vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
-; AVX-32-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX-32-NEXT:    movl %ebp, %esp
-; AVX-32-NEXT:    popl %ebp
-; AVX-32-NEXT:    .cfi_def_cfa %esp, 4
-; AVX-32-NEXT:    retl
+; AVX1-32-LABEL: uitofp_v4i64_v4f64:
+; AVX1-32:       # %bb.0:
+; AVX1-32-NEXT:    vxorps %xmm1, %xmm1, %xmm1
+; AVX1-32-NEXT:    vblendps {{.*#+}} ymm2 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX1-32-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}, %ymm2, %ymm2
+; AVX1-32-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
+; AVX1-32-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[0,2,1,3,4,6,5,7]
+; AVX1-32-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX1-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX1-32-NEXT:    vaddpd %ymm0, %ymm2, %ymm0
+; AVX1-32-NEXT:    retl
 ;
 ; AVX1-64-LABEL: uitofp_v4i64_v4f64:
 ; AVX1-64:       # %bb.0:
-; AVX1-64-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX1-64-NEXT:    vpextrd $2, %xmm1, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
-; AVX1-64-NEXT:    vmovd %xmm1, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-64-NEXT:    vunpcklpd {{.*#+}} xmm2 = xmm3[0],xmm2[0]
-; AVX1-64-NEXT:    vextractps $2, %xmm0, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-64-NEXT:    vmovq %xmm0, %rax
-; AVX1-64-NEXT:    movl %eax, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm4
-; AVX1-64-NEXT:    vunpcklpd {{.*#+}} xmm3 = xmm4[0],xmm3[0]
-; AVX1-64-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
-; AVX1-64-NEXT:    vpextrd $3, %xmm1, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-64-NEXT:    vpextrd $1, %xmm1, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX1-64-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm3[0]
-; AVX1-64-NEXT:    vpextrd $3, %xmm0, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-64-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX1-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX1-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; AVX1-64-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX1-64-NEXT:    vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
-; AVX1-64-NEXT:    vaddpd %ymm2, %ymm0, %ymm0
+; AVX1-64-NEXT:    vxorps %xmm1, %xmm1, %xmm1
+; AVX1-64-NEXT:    vblendps {{.*#+}} ymm2 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX1-64-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
+; AVX1-64-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
+; AVX1-64-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[0,2,1,3,4,6,5,7]
+; AVX1-64-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-64-NEXT:    vaddpd %ymm0, %ymm2, %ymm0
 ; AVX1-64-NEXT:    retq
 ;
+; AVX2-32-LABEL: uitofp_v4i64_v4f64:
+; AVX2-32:       # %bb.0:
+; AVX2-32-NEXT:    vpsrlq $32, %ymm0, %ymm1
+; AVX2-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %ymm1, %ymm1
+; AVX2-32-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX2-32-NEXT:    vsubpd %ymm2, %ymm1, %ymm1
+; AVX2-32-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
+; AVX2-32-NEXT:    vpblendd {{.*#+}} ymm0 = ymm0[0],ymm2[1],ymm0[2],ymm2[3],ymm0[4],ymm2[5],ymm0[6],ymm2[7]
+; AVX2-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX2-32-NEXT:    vaddpd %ymm1, %ymm0, %ymm0
+; AVX2-32-NEXT:    retl
+;
 ; AVX2-64-LABEL: uitofp_v4i64_v4f64:
 ; AVX2-64:       # %bb.0:
-; AVX2-64-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX2-64-NEXT:    vextractps $3, %xmm1, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
-; AVX2-64-NEXT:    vextractps $1, %xmm1, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX2-64-NEXT:    vunpcklpd {{.*#+}} xmm2 = xmm3[0],xmm2[0]
-; AVX2-64-NEXT:    vextractps $3, %xmm0, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX2-64-NEXT:    vextractps $1, %xmm0, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm4
-; AVX2-64-NEXT:    vunpcklpd {{.*#+}} xmm3 = xmm4[0],xmm3[0]
-; AVX2-64-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
-; AVX2-64-NEXT:    vbroadcastsd {{.*#+}} ymm3 = [4.294967296E+9,4.294967296E+9,4.294967296E+9,4.294967296E+9]
-; AVX2-64-NEXT:    vmulpd %ymm3, %ymm2, %ymm2
-; AVX2-64-NEXT:    vextractps $2, %xmm1, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX2-64-NEXT:    vmovd %xmm1, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX2-64-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm3[0]
-; AVX2-64-NEXT:    vextractps $2, %xmm0, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX2-64-NEXT:    vmovq %xmm0, %rax
-; AVX2-64-NEXT:    movl %eax, %eax
-; AVX2-64-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX2-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; AVX2-64-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX2-64-NEXT:    vaddpd %ymm0, %ymm2, %ymm0
+; AVX2-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX2-64-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX2-64-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200]
+; AVX2-64-NEXT:    vpor %ymm2, %ymm1, %ymm1
+; AVX2-64-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX2-64-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4985484787499139072,4985484787499139072,4985484787499139072,4985484787499139072]
+; AVX2-64-NEXT:    vpor %ymm2, %ymm0, %ymm0
+; AVX2-64-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX2-64-NEXT:    vsubpd %ymm2, %ymm0, %ymm0
+; AVX2-64-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
 ; AVX2-64-NEXT:    retq
 ;
+; AVX512F-32-LABEL: uitofp_v4i64_v4f64:
+; AVX512F-32:       # %bb.0:
+; AVX512F-32-NEXT:    vpsrlq $32, %ymm0, %ymm1
+; AVX512F-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %ymm1, %ymm1
+; AVX512F-32-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX512F-32-NEXT:    vsubpd %ymm2, %ymm1, %ymm1
+; AVX512F-32-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
+; AVX512F-32-NEXT:    vpblendd {{.*#+}} ymm0 = ymm0[0],ymm2[1],ymm0[2],ymm2[3],ymm0[4],ymm2[5],ymm0[6],ymm2[7]
+; AVX512F-32-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0
+; AVX512F-32-NEXT:    vaddpd %ymm1, %ymm0, %ymm0
+; AVX512F-32-NEXT:    retl
+;
 ; AVX512F-64-LABEL: uitofp_v4i64_v4f64:
 ; AVX512F-64:       # %bb.0:
-; AVX512F-64-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512F-64-NEXT:    vpextrq $1, %xmm1, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512F-64-NEXT:    vmovq %xmm1, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512F-64-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
-; AVX512F-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512F-64-NEXT:    vmovq %xmm0, %rax
-; AVX512F-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512F-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX512F-64-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512F-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512F-64-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX512F-64-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200]
+; AVX512F-64-NEXT:    vpor %ymm2, %ymm1, %ymm1
+; AVX512F-64-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX512F-64-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4985484787499139072,4985484787499139072,4985484787499139072,4985484787499139072]
+; AVX512F-64-NEXT:    vpor %ymm2, %ymm0, %ymm0
+; AVX512F-64-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX512F-64-NEXT:    vsubpd %ymm2, %ymm0, %ymm0
+; AVX512F-64-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
 ; AVX512F-64-NEXT:    retq
 ;
+; AVX512VL-32-LABEL: uitofp_v4i64_v4f64:
+; AVX512VL-32:       # %bb.0:
+; AVX512VL-32-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512VL-32-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX512VL-32-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %ymm1, %ymm1
+; AVX512VL-32-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX512VL-32-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %ymm0, %ymm0
+; AVX512VL-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}{1to4}, %ymm0, %ymm0
+; AVX512VL-32-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
+; AVX512VL-32-NEXT:    retl
+;
 ; AVX512VL-64-LABEL: uitofp_v4i64_v4f64:
 ; AVX512VL-64:       # %bb.0:
-; AVX512VL-64-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512VL-64-NEXT:    vpextrq $1, %xmm1, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512VL-64-NEXT:    vmovq %xmm1, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512VL-64-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
-; AVX512VL-64-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512VL-64-NEXT:    vmovq %xmm0, %rax
-; AVX512VL-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512VL-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX512VL-64-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512VL-64-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512VL-64-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX512VL-64-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm1, %ymm1
+; AVX512VL-64-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX512VL-64-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm0
+; AVX512VL-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %ymm0, %ymm0
+; AVX512VL-64-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
 ; AVX512VL-64-NEXT:    retq
 ;
 ; AVX512DQ-LABEL: uitofp_v4i64_v4f64:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    ret{{[l|q]}}
@@ -1014,7 +969,7 @@ define <4 x float> @sitofp_v4i64_v4f32(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: sitofp_v4i64_v4f32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2ps %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -1177,7 +1132,7 @@ define <4 x float> @uitofp_v4i64_v4f32(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: uitofp_v4i64_v4f32:
 ; AVX512DQ:       # %bb.0:
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2ps %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
diff --git a/llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll b/llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll
index 59294dd17fbca..ec8690f7f2b89 100644
--- a/llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll
+++ b/llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll
@@ -362,120 +362,22 @@ define <8 x double> @sitofp_v8i64_v8f64(<8 x i64> %x) #0 {
 define <8 x double> @uitofp_v8i64_v8f64(<8 x i64> %x) #0 {
 ; NODQ-32-LABEL: uitofp_v8i64_v8f64:
 ; NODQ-32:       # %bb.0:
-; NODQ-32-NEXT:    pushl %ebp
-; NODQ-32-NEXT:    .cfi_def_cfa_offset 8
-; NODQ-32-NEXT:    .cfi_offset %ebp, -8
-; NODQ-32-NEXT:    movl %esp, %ebp
-; NODQ-32-NEXT:    .cfi_def_cfa_register %ebp
-; NODQ-32-NEXT:    andl $-8, %esp
-; NODQ-32-NEXT:    subl $128, %esp
-; NODQ-32-NEXT:    vextractf32x4 $2, %zmm0, %xmm3
-; NODQ-32-NEXT:    vmovlps %xmm3, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vshufps {{.*#+}} xmm1 = xmm3[2,3,2,3]
-; NODQ-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vextractf32x4 $3, %zmm0, %xmm2
-; NODQ-32-NEXT:    vmovlps %xmm2, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vshufps {{.*#+}} xmm1 = xmm2[2,3,2,3]
-; NODQ-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vmovlps %xmm0, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vshufps {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; NODQ-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; NODQ-32-NEXT:    vmovlps %xmm1, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vshufps {{.*#+}} xmm4 = xmm1[2,3,2,3]
-; NODQ-32-NEXT:    vmovlps %xmm4, {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    vextractps $1, %xmm3, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $3, %xmm3, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $1, %xmm2, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $3, %xmm2, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $1, %xmm0, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl (%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $3, %xmm0, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $1, %xmm1, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vextractps $3, %xmm1, %eax
-; NODQ-32-NEXT:    shrl $31, %eax
-; NODQ-32-NEXT:    fildll {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    fadds {{\.?LCPI[0-9]+_[0-9]+}}(,%eax,4)
-; NODQ-32-NEXT:    fstpl {{[0-9]+}}(%esp)
-; NODQ-32-NEXT:    wait
-; NODQ-32-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; NODQ-32-NEXT:    vmovhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
-; NODQ-32-NEXT:    vmovsd {{.*#+}} xmm1 = mem[0],zero
-; NODQ-32-NEXT:    vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
-; NODQ-32-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; NODQ-32-NEXT:    vmovsd {{.*#+}} xmm1 = mem[0],zero
-; NODQ-32-NEXT:    vmovhps {{.*#+}} xmm1 = xmm1[0,1],mem[0,1]
-; NODQ-32-NEXT:    vmovsd {{.*#+}} xmm2 = mem[0],zero
-; NODQ-32-NEXT:    vmovhps {{.*#+}} xmm2 = xmm2[0,1],mem[0,1]
-; NODQ-32-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
-; NODQ-32-NEXT:    vinsertf64x4 $1, %ymm0, %zmm1, %zmm0
-; NODQ-32-NEXT:    movl %ebp, %esp
-; NODQ-32-NEXT:    popl %ebp
-; NODQ-32-NEXT:    .cfi_def_cfa %esp, 4
+; NODQ-32-NEXT:    vpbroadcastq {{.*#+}} zmm1 = [0,1127219200,0,1127219200,0,1127219200,0,1127219200,0,1127219200,0,1127219200,0,1127219200,0,1127219200]
+; NODQ-32-NEXT:    vpternlogq {{.*#+}} zmm1 = zmm1 | (zmm0 & m64bcst)
+; NODQ-32-NEXT:    vpsrlq $32, %zmm0, %zmm0
+; NODQ-32-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}{1to8}, %zmm0, %zmm0
+; NODQ-32-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}{1to8}, %zmm0, %zmm0
+; NODQ-32-NEXT:    vaddpd %zmm0, %zmm1, %zmm0
 ; NODQ-32-NEXT:    retl
 ;
 ; NODQ-64-LABEL: uitofp_v8i64_v8f64:
 ; NODQ-64:       # %bb.0:
-; NODQ-64-NEXT:    vextracti32x4 $3, %zmm0, %xmm1
-; NODQ-64-NEXT:    vpextrq $1, %xmm1, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; NODQ-64-NEXT:    vmovq %xmm1, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; NODQ-64-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
-; NODQ-64-NEXT:    vextracti32x4 $2, %zmm0, %xmm2
-; NODQ-64-NEXT:    vpextrq $1, %xmm2, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm3
-; NODQ-64-NEXT:    vmovq %xmm2, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; NODQ-64-NEXT:    vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]
-; NODQ-64-NEXT:    vinsertf128 $1, %xmm1, %ymm2, %ymm1
-; NODQ-64-NEXT:    vextracti128 $1, %ymm0, %xmm2
-; NODQ-64-NEXT:    vpextrq $1, %xmm2, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm3
-; NODQ-64-NEXT:    vmovq %xmm2, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; NODQ-64-NEXT:    vunpcklpd {{.*#+}} xmm2 = xmm2[0],xmm3[0]
-; NODQ-64-NEXT:    vpextrq $1, %xmm0, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm3
-; NODQ-64-NEXT:    vmovq %xmm0, %rax
-; NODQ-64-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; NODQ-64-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; NODQ-64-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
-; NODQ-64-NEXT:    vinsertf64x4 $1, %ymm1, %zmm0, %zmm0
+; NODQ-64-NEXT:    vpbroadcastq {{.*#+}} zmm1 = [4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200]
+; NODQ-64-NEXT:    vpternlogq {{.*#+}} zmm1 = zmm1 | (zmm0 & m64bcst)
+; NODQ-64-NEXT:    vpsrlq $32, %zmm0, %zmm0
+; NODQ-64-NEXT:    vporq {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm0
+; NODQ-64-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to8}, %zmm0, %zmm0
+; NODQ-64-NEXT:    vaddpd %zmm0, %zmm1, %zmm0
 ; NODQ-64-NEXT:    retq
 ;
 ; DQ-LABEL: uitofp_v8i64_v8f64:
diff --git a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll
index d77934adf4cd1..54fcbd46f261f 100644
--- a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll
+++ b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll
@@ -1,30 +1,23 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc -O3 -mtriple=x86_64-pc-linux -stop-after=finalize-isel < %s | FileCheck %s
 
 define <1 x float> @constrained_vector_fadd_v1f32() #0 {
-; CHECK-LABEL: name: constrained_vector_fadd_v1f32
-; CHECK: [[MOVSSrm_alt:%[0-9]+]]:fr32 = MOVSSrm_alt $rip, 1, $noreg, %const.0, $noreg :: (load (s32) from constant-pool)
-; CHECK: [[ADDSSrm:%[0-9]+]]:fr32 = ADDSSrm [[MOVSSrm_alt]], $rip, 1, $noreg, %const.1, $noreg, implicit $mxcsr :: (load (s32) from constant-pool)
-; CHECK: $xmm0 = COPY [[ADDSSrm]]
-; CHECK: RET 0, $xmm0
+  ; CHECK-LABEL: name: constrained_vector_fadd_v1f32
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOVSSrm_alt:%[0-9]+]]:fr32 = MOVSSrm_alt $rip, 1, $noreg, %const.0, $noreg :: (load (s32) from constant-pool)
+  ; CHECK-NEXT:   $xmm0 = COPY [[MOVSSrm_alt]]
+  ; CHECK-NEXT:   RET 0, $xmm0
 entry:
   %add = call <1 x float> @llvm.experimental.constrained.fadd.v1f32(<1 x float> <float 0x7FF0000000000000>, <1 x float> <float 1.0>, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret <1 x float> %add
 }
 
 define <3 x float> @constrained_vector_fadd_v3f32() #0 {
-; CHECK-LABEL: name: constrained_vector_fadd_v3f32
-; CHECK: [[FsFLD0SS:%[0-9]+]]:fr32 = FsFLD0SS
-; CHECK: [[MOVSSrm_alt:%[0-9]+]]:fr32 = MOVSSrm_alt $rip, 1, $noreg, %const.0, $noreg :: (load (s32) from constant-pool)
-; CHECK: [[ADDSSrr:%[0-9]+]]:fr32 = ADDSSrr [[MOVSSrm_alt]], killed [[FsFLD0SS]], implicit $mxcsr
-; CHECK: [[ADDSSrm:%[0-9]+]]:fr32 = ADDSSrm [[MOVSSrm_alt]], $rip, 1, $noreg, %const.1, $noreg, implicit $mxcsr :: (load (s32) from constant-pool)
-; CHECK: [[ADDSSrm1:%[0-9]+]]:fr32 = ADDSSrm [[MOVSSrm_alt]], $rip, 1, $noreg, %const.2, $noreg, implicit $mxcsr :: (load (s32) from constant-pool)
-; CHECK: [[COPY:%[0-9]+]]:vr128 = COPY killed [[ADDSSrm1]]
-; CHECK: [[COPY1:%[0-9]+]]:vr128 = COPY killed [[ADDSSrm]]
-; CHECK: [[UNPCKLPSrr:%[0-9]+]]:vr128 = UNPCKLPSrr [[COPY1]], killed [[COPY]]
-; CHECK: [[COPY2:%[0-9]+]]:vr128 = COPY killed [[ADDSSrr]]
-; CHECK: [[UNPCKLPDrr:%[0-9]+]]:vr128 = UNPCKLPDrr [[UNPCKLPSrr]], killed [[COPY2]]
-; CHECK: $xmm0 = COPY [[UNPCKLPDrr]]
-; CHECK: RET 0, $xmm0
+  ; CHECK-LABEL: name: constrained_vector_fadd_v3f32
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[V_SETALLONES:%[0-9]+]]:vr128 = V_SETALLONES
+  ; CHECK-NEXT:   $xmm0 = COPY [[V_SETALLONES]]
+  ; CHECK-NEXT:   RET 0, $xmm0
 entry:
   %add = call <3 x float> @llvm.experimental.constrained.fadd.v3f32(
            <3 x float> <float 0xFFFFFFFFE0000000, float 0xFFFFFFFFE0000000,
@@ -36,13 +29,12 @@ entry:
 }
 
 define <4 x double> @constrained_vector_fadd_v4f64() #0 {
-; CHECK-LABEL: name: constrained_vector_fadd_v4f64
-; CHECK: [[MOVAPDrm:%[0-9]+]]:vr128 = MOVAPDrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
-; CHECK: [[ADDPDrm:%[0-9]+]]:vr128 = ADDPDrm [[MOVAPDrm]], $rip, 1, $noreg, %const.1, $noreg, implicit $mxcsr :: (load (s128) from constant-pool)
-; CHECK: [[ADDPDrm1:%[0-9]+]]:vr128 = ADDPDrm [[MOVAPDrm]], $rip, 1, $noreg, %const.2, $noreg, implicit $mxcsr :: (load (s128) from constant-pool)
-; CHECK: $xmm0 = COPY [[ADDPDrm1]]
-; CHECK: $xmm1 = COPY [[ADDPDrm]]
-; CHECK: RET 0, $xmm0, $xmm1
+  ; CHECK-LABEL: name: constrained_vector_fadd_v4f64
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   [[MOVAPDrm:%[0-9]+]]:vr128 = MOVAPDrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
+  ; CHECK-NEXT:   $xmm0 = COPY [[MOVAPDrm]]
+  ; CHECK-NEXT:   $xmm1 = COPY [[MOVAPDrm]]
+  ; CHECK-NEXT:   RET 0, $xmm0, $xmm1
 entry:
   %add = call <4 x double> @llvm.experimental.constrained.fadd.v4f64(
            <4 x double> <double 0x7FEFFFFFFFFFFFFF, double 0x7FEFFFFFFFFFFFFF,
diff --git a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-fma.ll b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-fma.ll
index ff208678c9bc7..0bf912c3a2cba 100644
--- a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-fma.ll
+++ b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-fma.ll
@@ -4,9 +4,7 @@
 define <1 x float> @constrained_vector_fma_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovss {{.*#+}} xmm1 = [5.0E-1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = [2.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
+; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = [5.75E+0,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <1 x float> @llvm.experimental.constrained.fma.v1f32(
@@ -21,9 +19,7 @@ entry:
 define <2 x double> @constrained_vector_fma_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovapd {{.*#+}} xmm1 = [1.5E+0,5.0E-1]
-; CHECK-NEXT:    vmovapd {{.*#+}} xmm0 = [3.5E+0,2.5E+0]
-; CHECK-NEXT:    vfmadd213pd {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
+; CHECK-NEXT:    vmovaps {{.*#+}} xmm0 = [1.075E+1,5.75E+0]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <2 x double> @llvm.experimental.constrained.fma.v2f64(
@@ -38,17 +34,7 @@ entry:
 define <3 x float> @constrained_vector_fma_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = [5.0E-1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vmovss {{.*#+}} xmm1 = [3.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vfmadd213ss {{.*#+}} xmm1 = (xmm0 * xmm1) + mem
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = [2.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vmovss {{.*#+}} xmm2 = [5.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vfmadd213ss {{.*#+}} xmm2 = (xmm0 * xmm2) + mem
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = [1.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vmovss {{.*#+}} xmm3 = [4.5E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    vfmadd213ss {{.*#+}} xmm3 = (xmm0 * xmm3) + mem
-; CHECK-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm3[0],xmm2[2,3]
-; CHECK-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; CHECK-NEXT:    vmovaps {{.*#+}} xmm0 = [2.225E+1,1.425E+1,8.25E+0,u]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <3 x float> @llvm.experimental.constrained.fma.v3f32(
@@ -63,13 +49,7 @@ entry:
 define <3 x double> @constrained_vector_fma_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovsd {{.*#+}} xmm0 = [5.0E-1,0.0E+0]
-; CHECK-NEXT:    vmovsd {{.*#+}} xmm1 = [3.5E+0,0.0E+0]
-; CHECK-NEXT:    vfmadd213sd {{.*#+}} xmm1 = (xmm0 * xmm1) + mem
-; CHECK-NEXT:    vmovapd {{.*#+}} xmm0 = [2.5E+0,1.5E+0]
-; CHECK-NEXT:    vmovapd {{.*#+}} xmm2 = [5.5E+0,4.5E+0]
-; CHECK-NEXT:    vfmadd213pd {{.*#+}} xmm2 = (xmm0 * xmm2) + mem
-; CHECK-NEXT:    vinsertf128 $1, %xmm1, %ymm2, %ymm0
+; CHECK-NEXT:    vmovaps {{.*#+}} ymm0 = [2.225E+1,1.425E+1,8.25E+0,u]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <3 x double> @llvm.experimental.constrained.fma.v3f64(
@@ -84,9 +64,7 @@ entry:
 define <4 x double> @constrained_vector_fma_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovapd {{.*#+}} ymm1 = [3.5E+0,2.5E+0,1.5E+0,5.0E-1]
-; CHECK-NEXT:    vmovapd {{.*#+}} ymm0 = [7.5E+0,6.5E+0,5.5E+0,4.5E+0]
-; CHECK-NEXT:    vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + mem
+; CHECK-NEXT:    vmovaps {{.*#+}} ymm0 = [3.775E+1,2.675E+1,1.775E+1,1.075E+1]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <4 x double> @llvm.experimental.constrained.fma.v4f64(
@@ -101,9 +79,7 @@ entry:
 define <4 x float> @constrained_vector_fma_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovaps {{.*#+}} xmm1 = [3.5E+0,2.5E+0,1.5E+0,5.0E-1]
-; CHECK-NEXT:    vmovaps {{.*#+}} xmm0 = [7.5E+0,6.5E+0,5.5E+0,4.5E+0]
-; CHECK-NEXT:    vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
+; CHECK-NEXT:    vmovaps {{.*#+}} xmm0 = [3.775E+1,2.675E+1,1.775E+1,1.075E+1]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <4 x float> @llvm.experimental.constrained.fma.v4f32(
@@ -118,9 +94,7 @@ entry:
 define <8 x float> @constrained_vector_fma_v8f32() #0 {
 ; CHECK-LABEL: constrained_vector_fma_v8f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vmovaps {{.*#+}} ymm1 = [3.5E+0,2.5E+0,1.5E+0,5.0E-1,7.5E+0,6.5E+0,5.5E+0,4.5E+0]
-; CHECK-NEXT:    vmovaps {{.*#+}} ymm0 = [7.5E+0,6.5E+0,5.5E+0,4.5E+0,1.15E+1,1.05E+1,9.5E+0,8.5E+0]
-; CHECK-NEXT:    vfmadd213ps {{.*#+}} ymm0 = (ymm1 * ymm0) + mem
+; CHECK-NEXT:    vmovaps {{.*#+}} ymm0 = [3.775E+1,2.675E+1,1.775E+1,1.075E+1,1.0175E+2,8.275E+1,6.575E+1,5.075E+1]
 ; CHECK-NEXT:    retq
 entry:
   %fma = call <8 x float> @llvm.experimental.constrained.fma.v8f32(
diff --git a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
index 6aec5278fa75a..62c918b646a56 100644
--- a/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
@@ -7,14 +7,12 @@
 define <1 x float> @constrained_vector_fdiv_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fdiv_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    divss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.00000001E-1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fdiv_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vdivss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [1.00000001E-1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %div = call <1 x float> @llvm.experimental.constrained.fdiv.v1f32(
@@ -28,14 +26,12 @@ entry:
 define <2 x double> @constrained_vector_fdiv_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fdiv_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
-; CHECK-NEXT:    divpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.0000000000000001E-1,2.0000000000000001E-1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fdiv_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovapd {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
-; AVX-NEXT:    vdivpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [1.0000000000000001E-1,2.0000000000000001E-1]
 ; AVX-NEXT:    retq
 entry:
   %div = call <2 x double> @llvm.experimental.constrained.fdiv.v2f64(
@@ -49,28 +45,12 @@ entry:
 define <3 x float> @constrained_vector_fdiv_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fdiv_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    divss %xmm1, %xmm2
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    divss %xmm1, %xmm0
-; CHECK-NEXT:    movss {{.*#+}} xmm3 = [2.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    divss %xmm1, %xmm3
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.00000001E-1,2.00000003E-1,3.00000012E-1,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fdiv_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vdivss %xmm0, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vdivss %xmm0, %xmm2, %xmm2
-; AVX-NEXT:    vmovss {{.*#+}} xmm3 = [2.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vdivss %xmm0, %xmm3, %xmm0
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [1.00000001E-1,2.00000003E-1,3.00000012E-1,u]
 ; AVX-NEXT:    retq
 entry:
   %div = call <3 x float> @llvm.experimental.constrained.fdiv.v3f32(
@@ -84,24 +64,15 @@ entry:
 define <3 x double> @constrained_vector_fdiv_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fdiv_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
-; CHECK-NEXT:    divpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
-; CHECK-NEXT:    divsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    movsd %xmm1, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
-; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.0000000000000001E-1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.0000000000000001E-1,0.0E+0]
+; CHECK-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fdiv_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [3.0E+0,0.0E+0]
-; AVX-NEXT:    vdivsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; AVX-NEXT:    vmovapd {{.*#+}} xmm1 = [1.0E+0,2.0E+0]
-; AVX-NEXT:    vdivpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [1.0000000000000001E-1,2.0000000000000001E-1,2.9999999999999999E-1,u]
 ; AVX-NEXT:    retq
 entry:
   %div = call <3 x double> @llvm.experimental.constrained.fdiv.v3f64(
@@ -115,25 +86,14 @@ entry:
 define <4 x double> @constrained_vector_fdiv_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fdiv_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm2 = [1.0E+1,1.0E+1]
-; CHECK-NEXT:    movapd {{.*#+}} xmm1 = [3.0E+0,4.0E+0]
-; CHECK-NEXT:    divpd %xmm2, %xmm1
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
-; CHECK-NEXT:    divpd %xmm2, %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.0000000000000001E-1,2.0000000000000001E-1]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [2.9999999999999999E-1,4.0000000000000002E-1]
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fdiv_v4f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovapd {{.*#+}} ymm0 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
-; AVX1-NEXT:    vdivpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fdiv_v4f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [1.0E+1,1.0E+1,1.0E+1,1.0E+1]
-; AVX512-NEXT:    vmovapd {{.*#+}} ymm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
-; AVX512-NEXT:    vdivpd %ymm0, %ymm1, %ymm0
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fdiv_v4f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [1.0000000000000001E-1,2.0000000000000001E-1,2.9999999999999999E-1,4.0000000000000002E-1]
+; AVX-NEXT:    retq
 entry:
   %div = call <4 x double> @llvm.experimental.constrained.fdiv.v4f64(
            <4 x double> <double 1.000000e+00, double 2.000000e+00,
@@ -148,24 +108,12 @@ entry:
 define <1 x float> @constrained_vector_frem_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_frem_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmodf@PLT
-; CHECK-NEXT:    popq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_frem_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmodf@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
   %rem = call <1 x float> @llvm.experimental.constrained.frem.v1f32(
@@ -179,36 +127,12 @@ entry:
 define <2 x double> @constrained_vector_frem_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_frem_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_frem_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 32
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    addq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %rem = call <2 x double> @llvm.experimental.constrained.frem.v2f64(
@@ -222,49 +146,12 @@ entry:
 define <3 x float> @constrained_vector_frem_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_frem_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmodf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmodf@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [2.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmodf@PLT
-; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0,3.0E+0,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_frem_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmodf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmodf@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [2.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [1.0E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmodf@PLT
-; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0,3.0E+0,u]
 ; AVX-NEXT:    retq
 entry:
   %rem = call <3 x float> @llvm.experimental.constrained.frem.v3f32(
@@ -278,52 +165,15 @@ entry:
 define <3 x double> @constrained_vector_frem_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_frem_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [3.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.0E+0,0.0E+0]
+; CHECK-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
-; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
-; CHECK-NEXT:    # xmm1 = mem[0],zero
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_frem_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovups %ymm0, (%rsp) # 32-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [3.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vmovups (%rsp), %ymm1 # 32-byte Reload
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [1.0E+0,2.0E+0,3.0E+0,u]
 ; AVX-NEXT:    retq
 entry:
   %rem = call <3 x double> @llvm.experimental.constrained.frem.v3f64(
@@ -337,59 +187,13 @@ entry:
 define <4 x double> @constrained_vector_frem_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_frem_v4f64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [3.0E+0,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmod@PLT
-; CHECK-NEXT:    movaps %xmm0, %xmm1
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.0E+0,2.0E+0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [3.0E+0,4.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_frem_v4f64:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [3.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [2.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.0E+0,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [1.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmod@PLT
-; AVX-NEXT:    vunpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vinsertf128 $1, (%rsp), %ymm0, %ymm0 # 16-byte Folded Reload
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
 ; AVX-NEXT:    retq
   %rem = call <4 x double> @llvm.experimental.constrained.frem.v4f64(
            <4 x double> <double 1.000000e+00, double 2.000000e+00,
@@ -405,13 +209,11 @@ define <1 x float> @constrained_vector_fmul_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fmul_v1f32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fmul_v1f32:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %mul = call <1 x float> @llvm.experimental.constrained.fmul.v1f32(
@@ -425,15 +227,13 @@ entry:
 define <2 x double> @constrained_vector_fmul_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fmul_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    mulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [+Inf,+Inf]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fmul_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovddup {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
+; AVX-NEXT:    vmovddup {{.*#+}} xmm0 = [+Inf,+Inf]
 ; AVX-NEXT:    # xmm0 = mem[0,0]
-; AVX-NEXT:    vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %mul = call <2 x double> @llvm.experimental.constrained.fmul.v2f64(
@@ -447,24 +247,12 @@ entry:
 define <3 x float> @constrained_vector_fmul_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fmul_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [1.0E+2,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    mulss %xmm1, %xmm2
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    mulss %xmm1, %xmm0
-; CHECK-NEXT:    mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [+Inf,+Inf,+Inf,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fmul_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
-; AVX-NEXT:    vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm2
-; AVX-NEXT:    vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [+Inf,+Inf,+Inf,+Inf]
 ; AVX-NEXT:    retq
 entry:
   %mul = call <3 x float> @llvm.experimental.constrained.fmul.v3f32(
@@ -479,25 +267,15 @@ entry:
 define <3 x double> @constrained_vector_fmul_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fmul_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    mulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [1.7976931348623157E+308,0.0E+0]
-; CHECK-NEXT:    mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    movsd %xmm1, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
-; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.7976931348623157E+308,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [+Inf,0.0E+0]
+; CHECK-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fmul_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [1.7976931348623157E+308,0.0E+0]
-; AVX-NEXT:    vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; AVX-NEXT:    vmovddup {{.*#+}} xmm1 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; AVX-NEXT:    # xmm1 = mem[0,0]
-; AVX-NEXT:    vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [1.7976931348623157E+308,+Inf,+Inf,u]
 ; AVX-NEXT:    retq
 entry:
   %mul = call <3 x double> @llvm.experimental.constrained.fmul.v3f64(
@@ -512,16 +290,13 @@ entry:
 define <4 x double> @constrained_vector_fmul_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fmul_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    movapd {{.*#+}} xmm1 = [4.0E+0,5.0E+0]
-; CHECK-NEXT:    mulpd %xmm0, %xmm1
-; CHECK-NEXT:    mulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [+Inf,+Inf]
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fmul_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308]
-; AVX-NEXT:    vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [+Inf,+Inf,+Inf,+Inf]
 ; AVX-NEXT:    retq
 entry:
   %mul = call <4 x double> @llvm.experimental.constrained.fmul.v4f64(
@@ -538,13 +313,11 @@ define <1 x float> @constrained_vector_fadd_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fadd_v1f32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    addss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fadd_v1f32:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vaddss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %add = call <1 x float> @llvm.experimental.constrained.fadd.v1f32(
@@ -558,15 +331,13 @@ entry:
 define <2 x double> @constrained_vector_fadd_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fadd_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fadd_v2f64:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vmovddup {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
 ; AVX-NEXT:    # xmm0 = mem[0,0]
-; AVX-NEXT:    vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %add = call <2 x double> @llvm.experimental.constrained.fadd.v2f64(
@@ -580,25 +351,12 @@ entry:
 define <3 x float> @constrained_vector_fadd_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fadd_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    addss %xmm2, %xmm1
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [2.0E+0,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    addss %xmm2, %xmm0
-; CHECK-NEXT:    addss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    pcmpeqd %xmm0, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fadd_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vaddss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vaddss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm2
-; AVX-NEXT:    vaddss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %add = call <3 x float> @llvm.experimental.constrained.fadd.v3f32(
@@ -613,25 +371,15 @@ entry:
 define <3 x double> @constrained_vector_fadd_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fadd_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    xorpd %xmm1, %xmm1
-; CHECK-NEXT:    addsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    movsd %xmm1, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
-; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [1.7976931348623157E+308,0.0E+0]
+; CHECK-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fadd_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vaddsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; AVX-NEXT:    vmovddup {{.*#+}} xmm1 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; AVX-NEXT:    # xmm1 = mem[0,0]
-; AVX-NEXT:    vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308]
 ; AVX-NEXT:    retq
 entry:
   %add = call <3 x double> @llvm.experimental.constrained.fadd.v3f64(
@@ -646,16 +394,13 @@ entry:
 define <4 x double> @constrained_vector_fadd_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fadd_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
-; CHECK-NEXT:    movapd {{.*#+}} xmm1 = [2.0E+0,2.0000000000000001E-1]
-; CHECK-NEXT:    addpd %xmm0, %xmm1
-; CHECK-NEXT:    addpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [1.7976931348623157E+308,1.7976931348623157E+308]
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fadd_v4f64:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308,1.7976931348623157E+308]
-; AVX-NEXT:    vaddpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
 ; AVX-NEXT:    retq
 entry:
   %add = call <4 x double> @llvm.experimental.constrained.fadd.v4f64(
@@ -672,13 +417,11 @@ define <1 x float> @constrained_vector_fsub_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fsub_v1f32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fsub_v1f32:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [+Inf,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call <1 x float> @llvm.experimental.constrained.fsub.v1f32(
@@ -692,15 +435,13 @@ entry:
 define <2 x double> @constrained_vector_fsub_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fsub_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
-; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fsub_v2f64:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vmovddup {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
 ; AVX-NEXT:    # xmm0 = mem[0,0]
-; AVX-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call <2 x double> @llvm.experimental.constrained.fsub.v2f64(
@@ -714,26 +455,12 @@ entry:
 define <3 x float> @constrained_vector_fsub_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fsub_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movaps %xmm1, %xmm2
-; CHECK-NEXT:    subss %xmm0, %xmm2
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    subss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; CHECK-NEXT:    pcmpeqd %xmm0, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fsub_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vsubss %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm2
-; AVX-NEXT:    vsubss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vpcmpeqd %xmm0, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call <3 x float> @llvm.experimental.constrained.fsub.v3f32(
@@ -748,27 +475,15 @@ entry:
 define <3 x double> @constrained_vector_fsub_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fsub_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xorpd %xmm0, %xmm0
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [-1.7976931348623157E+308,0.0E+0]
-; CHECK-NEXT:    subsd %xmm0, %xmm1
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
-; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    movsd %xmm1, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
-; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [-1.7976931348623157E+308,0.0E+0]
+; CHECK-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fsub_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [-1.7976931348623157E+308,0.0E+0]
-; AVX-NEXT:    vsubsd %xmm0, %xmm1, %xmm0
-; AVX-NEXT:    vmovddup {{.*#+}} xmm1 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
-; AVX-NEXT:    # xmm1 = mem[0,0]
-; AVX-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308,-1.7976931348623157E+308,-1.7976931348623157E+308]
 ; AVX-NEXT:    retq
 entry:
   %sub = call <3 x double> @llvm.experimental.constrained.fsub.v3f64(
@@ -783,16 +498,13 @@ entry:
 define <4 x double> @constrained_vector_fsub_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fsub_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movapd {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
-; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308]
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fsub_v4f64:
 ; AVX:       # %bb.0: # %entry
 ; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [-1.7976931348623157E+308,-1.7976931348623157E+308,-1.7976931348623157E+308,-1.7976931348623157E+308]
-; AVX-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
 ; AVX-NEXT:    retq
 entry:
   %sub = call <4 x double> @llvm.experimental.constrained.fsub.v4f64(
@@ -846,26 +558,12 @@ entry:
 define <3 x float> @constrained_vector_sqrt_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_sqrt_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    sqrtss %xmm0, %xmm1
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    sqrtss %xmm0, %xmm0
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    sqrtss %xmm2, %xmm2
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    sqrtps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sqrt_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vsqrtss %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vsqrtss %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vsqrtss %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vsqrtps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; AVX-NEXT:    retq
 entry:
   %sqrt = call <3 x float> @llvm.experimental.constrained.sqrt.v3f32(
@@ -878,9 +576,9 @@ entry:
 define <3 x double> @constrained_vector_sqrt_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_sqrt_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    sqrtsd %xmm0, %xmm1
 ; CHECK-NEXT:    sqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.2200000000000003E+1,0.0E+0]
+; CHECK-NEXT:    sqrtsd %xmm1, %xmm1
 ; CHECK-NEXT:    movsd %xmm1, -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    movapd %xmm0, %xmm1
 ; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
@@ -890,10 +588,7 @@ define <3 x double> @constrained_vector_sqrt_v3f64() #0 {
 ;
 ; AVX-LABEL: constrained_vector_sqrt_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    vsqrtsd %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vsqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vsqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %sqrt = call <3 x double> @llvm.experimental.constrained.sqrt.v3f64(
@@ -906,8 +601,8 @@ entry:
 define <4 x double> @constrained_vector_sqrt_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_sqrt_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    sqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
 ; CHECK-NEXT:    sqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    sqrtpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sqrt_v4f64:
@@ -1000,48 +695,48 @@ entry:
 define <3 x float> @constrained_vector_pow_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_pow_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    movss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq powf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    movss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq powf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    movss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq powf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_pow_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq powf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq powf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [3.0E+0,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq powf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -1058,11 +753,11 @@ define <3 x double> @constrained_vector_pow_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq pow@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [3.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq pow@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
@@ -1072,9 +767,9 @@ define <3 x double> @constrained_vector_pow_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -1183,24 +878,12 @@ entry:
 define <1 x float> @constrained_vector_powi_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_powi_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powisf2@PLT
-; CHECK-NEXT:    popq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [7.4088E+4,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_powi_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powisf2@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [7.4088E+4,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %powi = call <1 x float> @llvm.experimental.constrained.powi.v1f32(
@@ -1214,36 +897,12 @@ entry:
 define <2 x double> @constrained_vector_powi_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_powi_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [7.461846100000001E+4,7.5151448000000004E+4]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_powi_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 32
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    addq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [7.461846100000001E+4,7.5151448000000004E+4]
 ; AVX-NEXT:    retq
 entry:
   %powi = call <2 x double> @llvm.experimental.constrained.powi.v2f64(
@@ -1257,49 +916,12 @@ entry:
 define <3 x float> @constrained_vector_powi_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_powi_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powisf2@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powisf2@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powisf2@PLT
-; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [7.4088E+4,7.9507E+4,8.5184E+4,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_powi_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powisf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powisf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powisf2@PLT
-; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [7.4088E+4,7.9507E+4,8.5184E+4,u]
 ; AVX-NEXT:    retq
 entry:
   %powi = call <3 x float> @llvm.experimental.constrained.powi.v3f32(
@@ -1313,52 +935,15 @@ entry:
 define <3 x double> @constrained_vector_powi_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_powi_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [7.4088E+4,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [7.461846100000001E+4,0.0E+0]
+; CHECK-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
-; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
-; CHECK-NEXT:    # xmm1 = mem[0],zero
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_powi_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovups %ymm0, (%rsp) # 32-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vmovups (%rsp), %ymm1 # 32-byte Reload
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [7.4088E+4,7.461846100000001E+4,7.5151448000000004E+4,u]
 ; AVX-NEXT:    retq
 entry:
   %powi = call <3 x double> @llvm.experimental.constrained.powi.v3f64(
@@ -1372,59 +957,13 @@ entry:
 define <4 x double> @constrained_vector_powi_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_powi_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2399999999999999E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2299999999999997E+1,0.0E+0]
-; CHECK-NEXT:    movl $3, %edi
-; CHECK-NEXT:    callq __powidf2@PLT
-; CHECK-NEXT:    movaps %xmm0, %xmm1
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [7.461846100000001E+4,7.5151448000000004E+4]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [7.568696699999999E+4,7.622502399999999E+4]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_powi_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2399999999999999E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2299999999999997E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    movl $3, %edi
-; AVX-NEXT:    callq __powidf2@PLT
-; AVX-NEXT:    vunpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vinsertf128 $1, (%rsp), %ymm0, %ymm0 # 16-byte Folded Reload
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [7.461846100000001E+4,7.5151448000000004E+4,7.568696699999999E+4,7.622502399999999E+4]
 ; AVX-NEXT:    retq
 entry:
   %powi = call <4 x double> @llvm.experimental.constrained.powi.v4f64(
@@ -1505,42 +1044,42 @@ entry:
 define <3 x float> @constrained_vector_sin_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_sin_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sin_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -1556,10 +1095,10 @@ define <3 x double> @constrained_vector_sin_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq sin@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq sin@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -1567,9 +1106,9 @@ define <3 x double> @constrained_vector_sin_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -1730,42 +1269,42 @@ entry:
 define <3 x float> @constrained_vector_cos_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_cos_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq cosf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq cosf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq cosf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_cos_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq cosf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq cosf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq cosf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -1781,10 +1320,10 @@ define <3 x double> @constrained_vector_cos_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq cos@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq cos@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -1792,9 +1331,9 @@ define <3 x double> @constrained_vector_cos_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -1955,42 +1494,42 @@ entry:
 define <3 x float> @constrained_vector_exp_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_exp_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq expf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq expf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq expf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_exp_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq expf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq expf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq expf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -2006,10 +1545,10 @@ define <3 x double> @constrained_vector_exp_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq exp@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq exp@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -2017,9 +1556,9 @@ define <3 x double> @constrained_vector_exp_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -2180,42 +1719,42 @@ entry:
 define <3 x float> @constrained_vector_exp2_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_exp2_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq exp2f@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq exp2f@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq exp2f@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_exp2_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq exp2f@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq exp2f@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq exp2f@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -2231,10 +1770,10 @@ define <3 x double> @constrained_vector_exp2_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq exp2@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq exp2@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -2242,9 +1781,9 @@ define <3 x double> @constrained_vector_exp2_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -2405,42 +1944,42 @@ entry:
 define <3 x float> @constrained_vector_log_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_log_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq logf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq logf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq logf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_log_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq logf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq logf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq logf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -2456,10 +1995,10 @@ define <3 x double> @constrained_vector_log_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq log@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq log@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -2467,9 +2006,9 @@ define <3 x double> @constrained_vector_log_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -2630,42 +2169,42 @@ entry:
 define <3 x float> @constrained_vector_log10_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_log10_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log10f@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log10f@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log10f@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_log10_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log10f@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log10f@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log10f@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -2681,10 +2220,10 @@ define <3 x double> @constrained_vector_log10_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq log10@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq log10@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -2692,9 +2231,9 @@ define <3 x double> @constrained_vector_log10_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -2855,42 +2394,42 @@ entry:
 define <3 x float> @constrained_vector_log2_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_log2_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log2f@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log2f@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq log2f@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_log2_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log2f@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log2f@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq log2f@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -2906,10 +2445,10 @@ define <3 x double> @constrained_vector_log2_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq log2@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq log2@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -2917,9 +2456,9 @@ define <3 x double> @constrained_vector_log2_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -3099,37 +2638,31 @@ entry:
 define <3 x float> @constrained_vector_rint_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_rint_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq rintf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq rintf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq rintf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq rintf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_rint_v3f32_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $4, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $4, %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vroundss $4, %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vroundps $4, (%rdi), %xmm0
 ; AVX-NEXT:    retq
  entry:
   %b = load <3 x float>, ptr %a
@@ -3145,10 +2678,10 @@ define <3 x double> @constrained_vector_rint_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq rint@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq rint@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -3156,9 +2689,9 @@ define <3 x double> @constrained_vector_rint_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -3166,10 +2699,7 @@ define <3 x double> @constrained_vector_rint_v3f64() #0 {
 ;
 ; AVX-LABEL: constrained_vector_rint_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    vroundsd $4, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $4, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $4, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %rint = call <3 x double> @llvm.experimental.constrained.rint.v3f64(
@@ -3188,10 +2718,10 @@ define <3 x double> @constrained_vector_rint_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq rint@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq rint@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -3200,9 +2730,9 @@ define <3 x double> @constrained_vector_rint_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -3210,10 +2740,7 @@ define <3 x double> @constrained_vector_rint_v3f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_rint_v3f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vroundsd $4, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $4, (%rdi), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $4, (%rdi), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
@@ -3395,37 +2922,31 @@ entry:
 define <3 x float> @constrained_vector_nearbyint_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_nearbyint_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq nearbyintf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq nearbyintf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq nearbyintf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq nearbyintf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_nearbyint_v3f32_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $12, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $12, %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vroundss $12, %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vroundps $12, (%rdi), %xmm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x float>, ptr %a
@@ -3441,10 +2962,10 @@ define <3 x double> @constrained_vector_nearby_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq nearbyint@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq nearbyint@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -3452,9 +2973,9 @@ define <3 x double> @constrained_vector_nearby_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -3462,10 +2983,7 @@ define <3 x double> @constrained_vector_nearby_v3f64() #0 {
 ;
 ; AVX-LABEL: constrained_vector_nearby_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    vroundsd $12, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $12, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $12, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %nearby = call <3 x double> @llvm.experimental.constrained.nearbyint.v3f64(
@@ -3484,10 +3002,10 @@ define <3 x double> @constrained_vector_nearbyint_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq nearbyint@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq nearbyint@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -3496,9 +3014,9 @@ define <3 x double> @constrained_vector_nearbyint_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -3506,10 +3024,7 @@ define <3 x double> @constrained_vector_nearbyint_v3f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_nearbyint_v3f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vroundsd $12, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $12, (%rdi), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $12, (%rdi), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
@@ -3606,24 +3121,12 @@ entry:
 define <1 x float> @constrained_vector_maxnum_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_maxnum_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmaxf@PLT
-; CHECK-NEXT:    popq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_maxnum_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmaxf@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
   %max = call <1 x float> @llvm.experimental.constrained.maxnum.v1f32(
@@ -3635,36 +3138,12 @@ entry:
 define <2 x double> @constrained_vector_maxnum_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_maxnum_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.3E+1,4.2E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_maxnum_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 32
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    addq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.3E+1,4.2E+1]
 ; AVX-NEXT:    retq
 entry:
   %max = call <2 x double> @llvm.experimental.constrained.maxnum.v2f64(
@@ -3677,49 +3156,12 @@ entry:
 define <3 x float> @constrained_vector_maxnum_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_maxnum_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmaxf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmaxf@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fmaxf@PLT
-; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.3E+1,4.4E+1,4.5E+1,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_maxnum_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmaxf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmaxf@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fmaxf@PLT
-; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.3E+1,4.4E+1,4.5E+1,u]
 ; AVX-NEXT:    retq
 entry:
   %max = call <3 x float> @llvm.experimental.constrained.maxnum.v3f32(
@@ -3732,52 +3174,15 @@ entry:
 define <3 x double> @constrained_vector_max_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_max_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.4E+1,0.0E+0]
+; CHECK-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
-; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
-; CHECK-NEXT:    # xmm1 = mem[0],zero
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_max_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovups %ymm0, (%rsp) # 32-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vmovups (%rsp), %ymm1 # 32-byte Reload
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.3E+1,4.4E+1,4.5E+1,u]
 ; AVX-NEXT:    retq
 entry:
   %max = call <3 x double> @llvm.experimental.constrained.maxnum.v3f64(
@@ -3790,59 +3195,13 @@ entry:
 define <4 x double> @constrained_vector_maxnum_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_maxnum_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.7E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.6E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    callq fmax@PLT
-; CHECK-NEXT:    movaps %xmm0, %xmm1
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.4E+1,4.5E+1]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [4.6E+1,4.7E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_maxnum_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.7E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.6E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmax@PLT
-; AVX-NEXT:    vunpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vinsertf128 $1, (%rsp), %ymm0, %ymm0 # 16-byte Folded Reload
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.4E+1,4.5E+1,4.6E+1,4.7E+1]
 ; AVX-NEXT:    retq
 entry:
   %max = call <4 x double> @llvm.experimental.constrained.maxnum.v4f64(
@@ -3857,24 +3216,12 @@ entry:
 define <1 x float> @constrained_vector_minnum_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_minnum_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fminf@PLT
-; CHECK-NEXT:    popq %rax
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_minnum_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fminf@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
  entry:
   %min = call <1 x float> @llvm.experimental.constrained.minnum.v1f32(
@@ -3886,36 +3233,12 @@ define <1 x float> @constrained_vector_minnum_v1f32() #0 {
 define <2 x double> @constrained_vector_minnum_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_minnum_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.1E+1,4.0E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_minnum_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 32
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    addq $24, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.1E+1,4.0E+1]
 ; AVX-NEXT:    retq
 entry:
   %min = call <2 x double> @llvm.experimental.constrained.minnum.v2f64(
@@ -3928,49 +3251,12 @@ entry:
 define <3 x float> @constrained_vector_minnum_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_minnum_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fminf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fminf@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    callq fminf@PLT
-; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.1E+1,4.2E+1,4.3E+1,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_minnum_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fminf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.1E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fminf@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    callq fminf@PLT
-; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.1E+1,4.2E+1,4.3E+1,u]
 ; AVX-NEXT:    retq
 entry:
   %min = call <3 x float> @llvm.experimental.constrained.minnum.v3f32(
@@ -3983,52 +3269,15 @@ entry:
 define <3 x double> @constrained_vector_min_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_min_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.0E+1,0.0E+0]
 ; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
-; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
-; CHECK-NEXT:    # xmm1 = mem[0],zero
-; CHECK-NEXT:    addq $24, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_min_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovups %ymm0, (%rsp) # 32-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vmovups (%rsp), %ymm1 # 32-byte Reload
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.0E+1,4.1E+1,4.2E+1,u]
 ; AVX-NEXT:    retq
 entry:
  %min = call <3 x double> @llvm.experimental.constrained.minnum.v3f64(
@@ -4041,59 +3290,13 @@ entry:
 define <4 x double> @constrained_vector_minnum_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_minnum_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    unpcklpd (%rsp), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.7E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.3E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.6E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; CHECK-NEXT:    callq fmin@PLT
-; CHECK-NEXT:    movaps %xmm0, %xmm1
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.0E+1,4.1E+1]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [4.2E+1,4.3E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_minnum_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.7E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.3E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.6E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.2E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.5E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.1E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.4E+1,0.0E+0]
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.0E+1,0.0E+0]
-; AVX-NEXT:    callq fmin@PLT
-; AVX-NEXT:    vunpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vinsertf128 $1, (%rsp), %ymm0, %ymm0 # 16-byte Folded Reload
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.0E+1,4.1E+1,4.2E+1,4.3E+1]
 ; AVX-NEXT:    retq
 entry:
   %min = call <4 x double> @llvm.experimental.constrained.minnum.v4f64(
@@ -4108,12 +3311,12 @@ entry:
 define <1 x i32> @constrained_vector_fptosi_v1i32_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v1i32_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
+; AVX-NEXT:    movl $42, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f32(
@@ -4125,13 +3328,18 @@ entry:
 define <2 x i32> @constrained_vector_fptosi_v2i32_v2f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttps2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [42,43,0,0]
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_fptosi_v2i32_v2f32:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttps2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_fptosi_v2i32_v2f32:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [42,43,0,0]
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_fptosi_v2i32_v2f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,0,0]
+; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(
                                 <2 x float><float 42.0, float 43.0>,
@@ -4142,25 +3350,18 @@ entry:
 define <3 x i32> @constrained_vector_fptosi_v3i32_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; CHECK-NEXT:    movd %edx, %xmm1
-; CHECK-NEXT:    movd %ecx, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43,44,u]
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_fptosi_v3i32_v3f32:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; AVX-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; AVX-NEXT:    vmovd %edx, %xmm0
-; AVX-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_fptosi_v3i32_v3f32:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43,44,u]
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_fptosi_v3i32_v3f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,44,0]
+; AVX512-NEXT:    retq
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32(
                                 <3 x float><float 42.0, float 43.0,
@@ -4172,13 +3373,18 @@ entry:
 define <4 x i32> @constrained_vector_fptosi_v4i32_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttps2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43,44,45]
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_fptosi_v4i32_v4f32:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttps2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_fptosi_v4i32_v4f32:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43,44,45]
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_fptosi_v4i32_v4f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,44,45]
+; AVX512-NEXT:    retq
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(
                                 <4 x float><float 42.0, float 43.0,
@@ -4190,12 +3396,12 @@ entry:
 define <1 x i64> @constrained_vector_fptosi_v1i64_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v1i64_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; AVX-NEXT:    movl $42, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f32(
@@ -4207,37 +3413,18 @@ entry:
 define <2 x i64> @constrained_vector_fptosi_v2i64_v2f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptosi_v2i64_v2f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm0
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptosi_v2i64_v2f32:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptosi_v2i64_v2f32:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vcvttps2qq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0
-; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512DQ-NEXT:    vzeroupper
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptosi_v2i64_v2f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} xmm0 = [42,43]
+; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32(
                                 <2 x float><float 42.0, float 43.0>,
@@ -4248,33 +3435,19 @@ entry:
 define <3 x i64> @constrained_vector_fptosi_v3i64_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; CHECK-NEXT:    movl $42, %eax
+; CHECK-NEXT:    movl $43, %edx
+; CHECK-NEXT:    movl $44, %ecx
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptosi_v3i64_v3f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX1-NEXT:    vmovq %rdx, %xmm0
-; AVX1-NEXT:    vmovq %rcx, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT:    vmovaps {{.*#+}} ymm0 = [42,43,44,u]
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_fptosi_v3i64_v3f32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX512-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX512-NEXT:    vmovq %rdx, %xmm0
-; AVX512-NEXT:    vmovq %rcx, %xmm1
-; AVX512-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512-NEXT:    vmovq %rax, %xmm1
-; AVX512-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} ymm0 = [42,43,44,0]
 ; AVX512-NEXT:    retq
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32(
@@ -4287,54 +3460,19 @@ entry:
 define <4 x i64> @constrained_vector_fptosi_v4i64_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm2
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [44,45]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptosi_v4i64_v4f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm0
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm2
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX1-NEXT:    vmovaps {{.*#+}} ymm0 = [42,43,44,45]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptosi_v4i64_v4f32:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm2
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX512F-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptosi_v4i64_v4f32:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
-; AVX512DQ-NEXT:    vcvttps2qq %ymm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptosi_v4i64_v4f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} ymm0 = [42,43,44,45]
+; AVX512-NEXT:    retq
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(
                                 <4 x float><float 42.0, float 43.0,
@@ -4346,12 +3484,12 @@ entry:
 define <1 x i32> @constrained_vector_fptosi_v1i32_v1f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v1i32_v1f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
+; AVX-NEXT:    movl $42, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64(
@@ -4364,12 +3502,12 @@ entry:
 define <2 x i32> @constrained_vector_fptosi_v2i32_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttpd2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,u,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v2i32_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttpd2dqx {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
 ; AVX-NEXT:    retq
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f64(
@@ -4381,24 +3519,12 @@ entry:
 define <3 x i32> @constrained_vector_fptosi_v3i32_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; CHECK-NEXT:    movd %edx, %xmm1
-; CHECK-NEXT:    movd %ecx, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,42,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v3i32_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; AVX-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; AVX-NEXT:    vmovd %edx, %xmm0
-; AVX-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
 ; AVX-NEXT:    retq
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f64(
@@ -4411,14 +3537,12 @@ entry:
 define <4 x i32> @constrained_vector_fptosi_v4i32_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttpd2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    cvttpd2dq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,42,42]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v4i32_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttpd2dqy {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
 ; AVX-NEXT:    retq
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64(
@@ -4431,12 +3555,12 @@ entry:
 define <1 x i64> @constrained_vector_fptosi_v1i64_v1f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptosi_v1i64_v1f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; AVX-NEXT:    movl $42, %eax
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f64(
@@ -4448,38 +3572,19 @@ entry:
 define <2 x i64> @constrained_vector_fptosi_v2i64_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptosi_v2i64_v2f64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm0
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX1-NEXT:    vmovddup {{.*#+}} xmm0 = [42,42]
+; AVX1-NEXT:    # xmm0 = mem[0,0]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptosi_v2i64_v2f64:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptosi_v2i64_v2f64:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2100000000000001E+1,4.2200000000000003E+1]
-; AVX512DQ-NEXT:    vcvttpd2qq %zmm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512DQ-NEXT:    vzeroupper
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptosi_v2i64_v2f64:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} xmm0 = [42,42]
+; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(
                                 <2 x double><double 42.1, double 42.2>,
@@ -4490,34 +3595,15 @@ entry:
 define <3 x i64> @constrained_vector_fptosi_v3i64_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
+; CHECK-NEXT:    movl $42, %eax
+; CHECK-NEXT:    movl $42, %edx
+; CHECK-NEXT:    movl $42, %ecx
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptosi_v3i64_v3f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX1-NEXT:    vmovq %rdx, %xmm0
-; AVX1-NEXT:    vmovq %rcx, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptosi_v3i64_v3f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX512-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX512-NEXT:    vmovq %rdx, %xmm0
-; AVX512-NEXT:    vmovq %rcx, %xmm1
-; AVX512-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512-NEXT:    vmovq %rax, %xmm1
-; AVX512-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptosi_v3i64_v3f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64(
                                 <3 x double><double 42.1, double 42.2,
@@ -4529,54 +3615,14 @@ entry:
 define <4 x i64> @constrained_vector_fptosi_v4i64_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm2
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movq %rax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42]
+; CHECK-NEXT:    movaps %xmm0, %xmm1
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptosi_v4i64_v4f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm0
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vmovq %rax, %xmm2
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX1-NEXT:    retq
-;
-; AVX512F-LABEL: constrained_vector_fptosi_v4i64_v4f64:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm2
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX512F-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptosi_v4i64_v4f64:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2100000000000001E+1,4.2200000000000003E+1,4.2299999999999997E+1,4.2399999999999999E+1]
-; AVX512DQ-NEXT:    vcvttpd2qq %zmm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
-; AVX512DQ-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptosi_v4i64_v4f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64(
                                 <4 x double><double 42.1, double 42.2,
@@ -4588,20 +3634,13 @@ entry:
 define <1 x i32> @constrained_vector_fptoui_v1i32_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v1i32_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    # kill: def $eax killed $eax killed $rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v1i32_v1f32:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    # kill: def $eax killed $eax killed $rax
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v1i32_v1f32:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v1i32_v1f32:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    movl $42, %eax
+; AVX-NEXT:    retq
 entry:
   %result = call <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f32(
                                <1 x float><float 42.0>,
@@ -4612,27 +3651,17 @@ entry:
 define <2 x i32> @constrained_vector_fptoui_v2i32_v2f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v2i32_v2f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [42,43,0,0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v2i32_v2f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vmovd %ecx, %xmm0
-; AVX1-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
+; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [42,43,0,0]
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_fptoui_v2i32_v2f32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,4.3E+1,0.0E+0,0.0E+0]
-; AVX512-NEXT:    vcvttps2udq %zmm0, %zmm0
-; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512-NEXT:    vzeroupper
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,0,0]
 ; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f32(
@@ -4644,34 +3673,17 @@ entry:
 define <3 x i32> @constrained_vector_fptoui_v3i32_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v3i32_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; CHECK-NEXT:    cvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; CHECK-NEXT:    movd %edx, %xmm1
-; CHECK-NEXT:    movd %ecx, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43,44,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v3i32_v3f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vcvttss2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX1-NEXT:    vmovd %edx, %xmm0
-; AVX1-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX1-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43,44,u]
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_fptoui_v3i32_v3f32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; AVX512-NEXT:    vmovd %edx, %xmm0
-; AVX512-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,44,0]
 ; AVX512-NEXT:    retq
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f32(
@@ -4684,38 +3696,17 @@ entry:
 define <4 x i32> @constrained_vector_fptoui_v4i32_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v4i32_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
-; CHECK-NEXT:    movaps %xmm1, %xmm2
-; CHECK-NEXT:    cmpltps %xmm0, %xmm2
-; CHECK-NEXT:    movaps %xmm2, %xmm3
-; CHECK-NEXT:    andnps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3
-; CHECK-NEXT:    andnps %xmm0, %xmm2
-; CHECK-NEXT:    subps %xmm2, %xmm1
-; CHECK-NEXT:    cvttps2dq %xmm1, %xmm0
-; CHECK-NEXT:    xorps %xmm3, %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43,44,45]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v4i32_v4f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vbroadcastss {{.*#+}} xmm0 = [2.14748365E+9,2.14748365E+9,2.14748365E+9,2.14748365E+9]
-; AVX1-NEXT:    vmovaps {{.*#+}} xmm1 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
-; AVX1-NEXT:    vcmpltps %xmm0, %xmm1, %xmm2
-; AVX1-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    vbroadcastss {{.*#+}} xmm4 = [2147483648,2147483648,2147483648,2147483648]
-; AVX1-NEXT:    vblendvps %xmm2, %xmm3, %xmm4, %xmm4
-; AVX1-NEXT:    vblendvps %xmm2, %xmm3, %xmm0, %xmm0
-; AVX1-NEXT:    vsubps %xmm0, %xmm1, %xmm0
-; AVX1-NEXT:    vcvttps2dq %xmm0, %xmm0
-; AVX1-NEXT:    vxorps %xmm4, %xmm0, %xmm0
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43,44,45]
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_fptoui_v4i32_v4f32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
-; AVX512-NEXT:    vcvttps2udq %zmm0, %zmm0
-; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512-NEXT:    vzeroupper
+; AVX512-NEXT:    vpmovsxbd {{.*#+}} xmm0 = [42,43,44,45]
 ; AVX512-NEXT:    retq
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f32(
@@ -4728,44 +3719,13 @@ entry:
 define <1 x i64> @constrained_vector_fptoui_v1i64_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v1i64_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm0, %xmm2
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    ja .LBB121_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movaps %xmm2, %xmm1
-; CHECK-NEXT:  .LBB121_2: # %entry
-; CHECK-NEXT:    subss %xmm1, %xmm0
-; CHECK-NEXT:    cvttss2si %xmm0, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v1i64_v1f32:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vmovss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm0, %xmm1
-; AVX1-NEXT:    vxorps %xmm2, %xmm2, %xmm2
-; AVX1-NEXT:    ja .LBB121_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovaps %xmm1, %xmm2
-; AVX1-NEXT:  .LBB121_2: # %entry
-; AVX1-NEXT:    vsubss %xmm2, %xmm0, %xmm0
-; AVX1-NEXT:    vcvttss2si %xmm0, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v1i64_v1f32:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v1i64_v1f32:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    movl $42, %eax
+; AVX-NEXT:    retq
 entry:
   %result = call <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f32(
                                <1 x float><float 42.0>,
@@ -4776,87 +3736,18 @@ entry:
 define <2 x i64> @constrained_vector_fptoui_v2i64_v2f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v2i64_v2f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm2, %xmm1
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    xorps %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB122_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movaps %xmm1, %xmm3
-; CHECK-NEXT:  .LBB122_2: # %entry
-; CHECK-NEXT:    subss %xmm3, %xmm2
-; CHECK-NEXT:    cvttss2si %xmm2, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm2
-; CHECK-NEXT:    movss {{.*#+}} xmm3 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm3, %xmm1
-; CHECK-NEXT:    ja .LBB122_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:  .LBB122_4: # %entry
-; CHECK-NEXT:    subss %xmm0, %xmm3
-; CHECK-NEXT:    cvttss2si %xmm3, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v2i64_v2f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vmovss {{.*#+}} xmm0 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm2, %xmm0
-; AVX1-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB122_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm3
-; AVX1-NEXT:  .LBB122_2: # %entry
-; AVX1-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttss2si %xmm2, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm2
-; AVX1-NEXT:    vmovss {{.*#+}} xmm3 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm3, %xmm0
-; AVX1-NEXT:    ja .LBB122_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm1
-; AVX1-NEXT:  .LBB122_4: # %entry
-; AVX1-NEXT:    vsubss %xmm1, %xmm3, %xmm0
-; AVX1-NEXT:    vcvttss2si %xmm0, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm0
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; AVX1-NEXT:    vmovaps {{.*#+}} xmm0 = [42,43]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptoui_v2i64_v2f32:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptoui_v2i64_v2f32:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vcvttps2uqq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0
-; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512DQ-NEXT:    vzeroupper
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptoui_v2i64_v2f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} xmm0 = [42,43]
+; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f32(
                                 <2 x float><float 42.0, float 43.0>,
@@ -4867,107 +3758,19 @@ entry:
 define <3 x i64> @constrained_vector_fptoui_v3i64_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v3i64_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm2, %xmm1
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    xorps %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB123_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movaps %xmm1, %xmm3
-; CHECK-NEXT:  .LBB123_2: # %entry
-; CHECK-NEXT:    subss %xmm3, %xmm2
-; CHECK-NEXT:    cvttss2si %xmm2, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm2, %xmm1
-; CHECK-NEXT:    xorps %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB123_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movaps %xmm1, %xmm3
-; CHECK-NEXT:  .LBB123_4: # %entry
-; CHECK-NEXT:    subss %xmm3, %xmm2
-; CHECK-NEXT:    cvttss2si %xmm2, %rcx
-; CHECK-NEXT:    setbe %dl
-; CHECK-NEXT:    movzbl %dl, %edx
-; CHECK-NEXT:    shlq $63, %rdx
-; CHECK-NEXT:    xorq %rcx, %rdx
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm2, %xmm1
-; CHECK-NEXT:    ja .LBB123_6
-; CHECK-NEXT:  # %bb.5: # %entry
-; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:  .LBB123_6: # %entry
-; CHECK-NEXT:    subss %xmm0, %xmm2
-; CHECK-NEXT:    cvttss2si %xmm2, %rsi
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rsi, %rcx
+; CHECK-NEXT:    movl $42, %eax
+; CHECK-NEXT:    movl $43, %edx
+; CHECK-NEXT:    movl $44, %ecx
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v3i64_v3f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovss {{.*#+}} xmm2 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vmovss {{.*#+}} xmm0 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm2, %xmm0
-; AVX1-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB123_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm3
-; AVX1-NEXT:  .LBB123_2: # %entry
-; AVX1-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttss2si %xmm2, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    vmovss {{.*#+}} xmm2 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm2, %xmm0
-; AVX1-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB123_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm3
-; AVX1-NEXT:  .LBB123_4: # %entry
-; AVX1-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttss2si %xmm2, %rdx
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rdx, %rcx
-; AVX1-NEXT:    vmovss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm2, %xmm0
-; AVX1-NEXT:    ja .LBB123_6
-; AVX1-NEXT:  # %bb.5: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm1
-; AVX1-NEXT:  .LBB123_6: # %entry
-; AVX1-NEXT:    vsubss %xmm1, %xmm2, %xmm0
-; AVX1-NEXT:    vcvttss2si %xmm0, %rdx
-; AVX1-NEXT:    setbe %sil
-; AVX1-NEXT:    movzbl %sil, %esi
-; AVX1-NEXT:    shlq $63, %rsi
-; AVX1-NEXT:    xorq %rdx, %rsi
-; AVX1-NEXT:    vmovq %rsi, %xmm0
-; AVX1-NEXT:    vmovq %rcx, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT:    vmovaps {{.*#+}} ymm0 = [42,43,44,u]
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_fptoui_v3i64_v3f32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX512-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX512-NEXT:    vmovq %rdx, %xmm0
-; AVX512-NEXT:    vmovq %rcx, %xmm1
-; AVX512-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512-NEXT:    vmovq %rax, %xmm1
-; AVX512-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} ymm0 = [42,43,44,0]
 ; AVX512-NEXT:    retq
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f32(
@@ -4980,152 +3783,19 @@ entry:
 define <4 x i64> @constrained_vector_fptoui_v4i64_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v4i64_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm0, %xmm2
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    xorps %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB124_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movaps %xmm2, %xmm3
-; CHECK-NEXT:  .LBB124_2: # %entry
-; CHECK-NEXT:    subss %xmm3, %xmm0
-; CHECK-NEXT:    cvttss2si %xmm0, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm0, %xmm2
-; CHECK-NEXT:    xorps %xmm4, %xmm4
-; CHECK-NEXT:    ja .LBB124_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movaps %xmm2, %xmm4
-; CHECK-NEXT:  .LBB124_4: # %entry
-; CHECK-NEXT:    movq %rax, %xmm3
-; CHECK-NEXT:    subss %xmm4, %xmm0
-; CHECK-NEXT:    cvttss2si %xmm0, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm0
-; CHECK-NEXT:    movss {{.*#+}} xmm4 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm4, %xmm2
-; CHECK-NEXT:    xorps %xmm5, %xmm5
-; CHECK-NEXT:    ja .LBB124_6
-; CHECK-NEXT:  # %bb.5: # %entry
-; CHECK-NEXT:    movaps %xmm2, %xmm5
-; CHECK-NEXT:  .LBB124_6: # %entry
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; CHECK-NEXT:    subss %xmm5, %xmm4
-; CHECK-NEXT:    cvttss2si %xmm4, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm3
-; CHECK-NEXT:    movss {{.*#+}} xmm4 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    comiss %xmm4, %xmm2
-; CHECK-NEXT:    ja .LBB124_8
-; CHECK-NEXT:  # %bb.7: # %entry
-; CHECK-NEXT:    movaps %xmm2, %xmm1
-; CHECK-NEXT:  .LBB124_8: # %entry
-; CHECK-NEXT:    subss %xmm1, %xmm4
-; CHECK-NEXT:    cvttss2si %xmm4, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,43]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [44,45]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v4i64_v4f32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovss {{.*#+}} xmm2 = [4.5E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vmovss {{.*#+}} xmm0 = [9.22337203E+18,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm2, %xmm0
-; AVX1-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorps %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB124_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm3
-; AVX1-NEXT:  .LBB124_2: # %entry
-; AVX1-NEXT:    vsubss %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttss2si %xmm2, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    vmovss {{.*#+}} xmm3 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm3, %xmm0
-; AVX1-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX1-NEXT:    ja .LBB124_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm4
-; AVX1-NEXT:  .LBB124_4: # %entry
-; AVX1-NEXT:    vmovq %rax, %xmm2
-; AVX1-NEXT:    vsubss %xmm4, %xmm3, %xmm3
-; AVX1-NEXT:    vcvttss2si %xmm3, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm3
-; AVX1-NEXT:    vmovss {{.*#+}} xmm4 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm4, %xmm0
-; AVX1-NEXT:    vxorps %xmm5, %xmm5, %xmm5
-; AVX1-NEXT:    ja .LBB124_6
-; AVX1-NEXT:  # %bb.5: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm5
-; AVX1-NEXT:  .LBB124_6: # %entry
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
-; AVX1-NEXT:    vsubss %xmm5, %xmm4, %xmm3
-; AVX1-NEXT:    vcvttss2si %xmm3, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm3
-; AVX1-NEXT:    vmovss {{.*#+}} xmm4 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX1-NEXT:    vcomiss %xmm4, %xmm0
-; AVX1-NEXT:    ja .LBB124_8
-; AVX1-NEXT:  # %bb.7: # %entry
-; AVX1-NEXT:    vmovaps %xmm0, %xmm1
-; AVX1-NEXT:  .LBB124_8: # %entry
-; AVX1-NEXT:    vsubss %xmm1, %xmm4, %xmm0
-; AVX1-NEXT:    vcvttss2si %xmm0, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm0
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
+; AVX1-NEXT:    vmovaps {{.*#+}} ymm0 = [42,43,44,45]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptoui_v4i64_v4f32:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vcvttss2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm2
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX512F-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptoui_v4i64_v4f32:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
-; AVX512DQ-NEXT:    vcvttps2uqq %ymm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptoui_v4i64_v4f32:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} ymm0 = [42,43,44,45]
+; AVX512-NEXT:    retq
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f32(
                                 <4 x float><float 42.0, float 43.0,
@@ -5137,20 +3807,13 @@ entry:
 define <1 x i32> @constrained_vector_fptoui_v1i32_v1f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v1i32_v1f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    # kill: def $eax killed $eax killed $rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v1i32_v1f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    # kill: def $eax killed $eax killed $rax
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v1i32_v1f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v1i32_v1f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    movl $42, %eax
+; AVX-NEXT:    retq
 entry:
   %result = call <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f64(
                                <1 x double><double 42.1>,
@@ -5161,28 +3824,13 @@ entry:
 define <2 x i32> @constrained_vector_fptoui_v2i32_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v2i32_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,u,u]
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v2i32_v2f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vmovd %ecx, %xmm0
-; AVX1-NEXT:    vpinsrd $1, %eax, %xmm0, %xmm0
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v2i32_v2f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2100000000000001E+1,4.2200000000000003E+1,0.0E+0,0.0E+0]
-; AVX512-NEXT:    vcvttpd2udq %zmm0, %ymm0
-; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v2i32_v2f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f64(
                                 <2 x double><double 42.1, double 42.2>,
@@ -5193,35 +3841,13 @@ entry:
 define <3 x i32> @constrained_vector_fptoui_v3i32_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v3i32_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; CHECK-NEXT:    movd %edx, %xmm1
-; CHECK-NEXT:    movd %ecx, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,42,u]
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v3i32_v3f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX1-NEXT:    vcvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX1-NEXT:    vmovd %edx, %xmm0
-; AVX1-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX1-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v3i32_v3f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %eax
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ecx
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %edx
-; AVX512-NEXT:    vmovd %edx, %xmm0
-; AVX512-NEXT:    vpinsrd $1, %ecx, %xmm0, %xmm0
-; AVX512-NEXT:    vpinsrd $2, %eax, %xmm0, %xmm0
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v3i32_v3f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f64(
                                 <3 x double><double 42.1, double 42.2,
@@ -5233,44 +3859,13 @@ entry:
 define <4 x i32> @constrained_vector_fptoui_v4i32_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v4i32_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm1
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm2
-; CHECK-NEXT:    cvttsd2si {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42,42,42]
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v4i32_v4f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [2.147483648E+9,2.147483648E+9,2.147483648E+9,2.147483648E+9]
-; AVX1-NEXT:    vmovapd {{.*#+}} ymm1 = [4.2100000000000001E+1,4.2200000000000003E+1,4.2299999999999997E+1,4.2399999999999999E+1]
-; AVX1-NEXT:    vcmpltpd %ymm0, %ymm1, %ymm2
-; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm3
-; AVX1-NEXT:    vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm3[0,2]
-; AVX1-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX1-NEXT:    vbroadcastss {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
-; AVX1-NEXT:    vblendvps %xmm3, %xmm4, %xmm5, %xmm3
-; AVX1-NEXT:    vxorps %xmm4, %xmm4, %xmm4
-; AVX1-NEXT:    vblendvpd %ymm2, %ymm4, %ymm0, %ymm0
-; AVX1-NEXT:    vsubpd %ymm0, %ymm1, %ymm0
-; AVX1-NEXT:    vcvttpd2dq %ymm0, %xmm0
-; AVX1-NEXT:    vxorpd %xmm3, %xmm0, %xmm0
-; AVX1-NEXT:    vzeroupper
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v4i32_v4f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2100000000000001E+1,4.2200000000000003E+1,4.2299999999999997E+1,4.2399999999999999E+1]
-; AVX512-NEXT:    vcvttpd2udq %zmm0, %ymm0
-; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v4i32_v4f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastss {{.*#+}} xmm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f64(
                                 <4 x double><double 42.1, double 42.2,
@@ -5282,44 +3877,13 @@ entry:
 define <1 x i64> @constrained_vector_fptoui_v1i64_v1f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v1i64_v1f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
-; CHECK-NEXT:    comisd %xmm0, %xmm2
-; CHECK-NEXT:    xorpd %xmm1, %xmm1
-; CHECK-NEXT:    ja .LBB129_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movapd %xmm2, %xmm1
-; CHECK-NEXT:  .LBB129_2: # %entry
-; CHECK-NEXT:    subsd %xmm1, %xmm0
-; CHECK-NEXT:    cvttsd2si %xmm0, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
+; CHECK-NEXT:    movl $42, %eax
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v1i64_v1f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm0, %xmm1
-; AVX1-NEXT:    vxorpd %xmm2, %xmm2, %xmm2
-; AVX1-NEXT:    ja .LBB129_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovapd %xmm1, %xmm2
-; AVX1-NEXT:  .LBB129_2: # %entry
-; AVX1-NEXT:    vsubsd %xmm2, %xmm0, %xmm0
-; AVX1-NEXT:    vcvttsd2si %xmm0, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v1i64_v1f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v1i64_v1f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    movl $42, %eax
+; AVX-NEXT:    retq
 entry:
   %result = call <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f64(
                                <1 x double><double 42.1>,
@@ -5330,88 +3894,19 @@ entry:
 define <2 x i64> @constrained_vector_fptoui_v2i64_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v2i64_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; CHECK-NEXT:    comisd %xmm2, %xmm1
-; CHECK-NEXT:    xorpd %xmm0, %xmm0
-; CHECK-NEXT:    xorpd %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB130_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movapd %xmm1, %xmm3
-; CHECK-NEXT:  .LBB130_2: # %entry
-; CHECK-NEXT:    subsd %xmm3, %xmm2
-; CHECK-NEXT:    cvttsd2si %xmm2, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm2
-; CHECK-NEXT:    movsd {{.*#+}} xmm3 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm3, %xmm1
-; CHECK-NEXT:    ja .LBB130_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movapd %xmm1, %xmm0
-; CHECK-NEXT:  .LBB130_4: # %entry
-; CHECK-NEXT:    subsd %xmm0, %xmm3
-; CHECK-NEXT:    cvttsd2si %xmm3, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm0
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_fptoui_v2i64_v2f64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm2, %xmm0
-; AVX1-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB130_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm3
-; AVX1-NEXT:  .LBB130_2: # %entry
-; AVX1-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttsd2si %xmm2, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm2
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm3 = [4.2100000000000001E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm3, %xmm0
-; AVX1-NEXT:    ja .LBB130_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm1
-; AVX1-NEXT:  .LBB130_4: # %entry
-; AVX1-NEXT:    vsubsd %xmm1, %xmm3, %xmm0
-; AVX1-NEXT:    vcvttsd2si %xmm0, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm0
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; AVX1-NEXT:    vmovddup {{.*#+}} xmm0 = [42,42]
+; AVX1-NEXT:    # xmm0 = mem[0,0]
 ; AVX1-NEXT:    retq
 ;
-; AVX512F-LABEL: constrained_vector_fptoui_v2i64_v2f64:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    retq
-;
-; AVX512DQ-LABEL: constrained_vector_fptoui_v2i64_v2f64:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2100000000000001E+1,4.2200000000000003E+1]
-; AVX512DQ-NEXT:    vcvttpd2uqq %zmm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
-; AVX512DQ-NEXT:    vzeroupper
-; AVX512DQ-NEXT:    retq
+; AVX512-LABEL: constrained_vector_fptoui_v2i64_v2f64:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpmovsxbq {{.*#+}} xmm0 = [42,42]
+; AVX512-NEXT:    retq
 entry:
   %result = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(
                                 <2 x double><double 42.1, double 42.2>,
@@ -5422,108 +3917,15 @@ entry:
 define <3 x i64> @constrained_vector_fptoui_v3i64_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v3i64_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [9.2233720368547758E+18,0.0E+0]
-; CHECK-NEXT:    comisd %xmm2, %xmm1
-; CHECK-NEXT:    xorpd %xmm0, %xmm0
-; CHECK-NEXT:    xorpd %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB131_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movapd %xmm1, %xmm3
-; CHECK-NEXT:  .LBB131_2: # %entry
-; CHECK-NEXT:    subsd %xmm3, %xmm2
-; CHECK-NEXT:    cvttsd2si %xmm2, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm2, %xmm1
-; CHECK-NEXT:    xorpd %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB131_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movapd %xmm1, %xmm3
-; CHECK-NEXT:  .LBB131_4: # %entry
-; CHECK-NEXT:    subsd %xmm3, %xmm2
-; CHECK-NEXT:    cvttsd2si %xmm2, %rcx
-; CHECK-NEXT:    setbe %dl
-; CHECK-NEXT:    movzbl %dl, %edx
-; CHECK-NEXT:    shlq $63, %rdx
-; CHECK-NEXT:    xorq %rcx, %rdx
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [4.2299999999999997E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm2, %xmm1
-; CHECK-NEXT:    ja .LBB131_6
-; CHECK-NEXT:  # %bb.5: # %entry
-; CHECK-NEXT:    movapd %xmm1, %xmm0
-; CHECK-NEXT:  .LBB131_6: # %entry
-; CHECK-NEXT:    subsd %xmm0, %xmm2
-; CHECK-NEXT:    cvttsd2si %xmm2, %rsi
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rsi, %rcx
+; CHECK-NEXT:    movl $42, %eax
+; CHECK-NEXT:    movl $42, %edx
+; CHECK-NEXT:    movl $42, %ecx
 ; CHECK-NEXT:    retq
 ;
-; AVX1-LABEL: constrained_vector_fptoui_v3i64_v3f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2299999999999997E+1,0.0E+0]
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm2, %xmm0
-; AVX1-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB131_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm3
-; AVX1-NEXT:  .LBB131_2: # %entry
-; AVX1-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttsd2si %xmm2, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2100000000000001E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm2, %xmm0
-; AVX1-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB131_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm3
-; AVX1-NEXT:  .LBB131_4: # %entry
-; AVX1-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttsd2si %xmm2, %rdx
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rdx, %rcx
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm2, %xmm0
-; AVX1-NEXT:    ja .LBB131_6
-; AVX1-NEXT:  # %bb.5: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm1
-; AVX1-NEXT:  .LBB131_6: # %entry
-; AVX1-NEXT:    vsubsd %xmm1, %xmm2, %xmm0
-; AVX1-NEXT:    vcvttsd2si %xmm0, %rdx
-; AVX1-NEXT:    setbe %sil
-; AVX1-NEXT:    movzbl %sil, %esi
-; AVX1-NEXT:    shlq $63, %rsi
-; AVX1-NEXT:    xorq %rdx, %rsi
-; AVX1-NEXT:    vmovq %rsi, %xmm0
-; AVX1-NEXT:    vmovq %rcx, %xmm1
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX1-NEXT:    vmovq %rax, %xmm1
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX1-NEXT:    retq
-;
-; AVX512-LABEL: constrained_vector_fptoui_v3i64_v3f64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rcx
-; AVX512-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rdx
-; AVX512-NEXT:    vmovq %rdx, %xmm0
-; AVX512-NEXT:    vmovq %rcx, %xmm1
-; AVX512-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512-NEXT:    vmovq %rax, %xmm1
-; AVX512-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
-; AVX512-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v3i64_v3f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f64(
                                 <3 x double><double 42.1, double 42.2,
@@ -5535,152 +3937,14 @@ entry:
 define <4 x i64> @constrained_vector_fptoui_v4i64_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptoui_v4i64_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [9.2233720368547758E+18,0.0E+0]
-; CHECK-NEXT:    comisd %xmm0, %xmm2
-; CHECK-NEXT:    xorpd %xmm1, %xmm1
-; CHECK-NEXT:    xorpd %xmm3, %xmm3
-; CHECK-NEXT:    ja .LBB132_2
-; CHECK-NEXT:  # %bb.1: # %entry
-; CHECK-NEXT:    movapd %xmm2, %xmm3
-; CHECK-NEXT:  .LBB132_2: # %entry
-; CHECK-NEXT:    subsd %xmm3, %xmm0
-; CHECK-NEXT:    cvttsd2si %xmm0, %rcx
-; CHECK-NEXT:    setbe %al
-; CHECK-NEXT:    movzbl %al, %eax
-; CHECK-NEXT:    shlq $63, %rax
-; CHECK-NEXT:    xorq %rcx, %rax
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm0, %xmm2
-; CHECK-NEXT:    xorpd %xmm4, %xmm4
-; CHECK-NEXT:    ja .LBB132_4
-; CHECK-NEXT:  # %bb.3: # %entry
-; CHECK-NEXT:    movapd %xmm2, %xmm4
-; CHECK-NEXT:  .LBB132_4: # %entry
-; CHECK-NEXT:    movq %rax, %xmm3
-; CHECK-NEXT:    subsd %xmm4, %xmm0
-; CHECK-NEXT:    cvttsd2si %xmm0, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm0
-; CHECK-NEXT:    movsd {{.*#+}} xmm4 = [4.2399999999999999E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm4, %xmm2
-; CHECK-NEXT:    xorpd %xmm5, %xmm5
-; CHECK-NEXT:    ja .LBB132_6
-; CHECK-NEXT:  # %bb.5: # %entry
-; CHECK-NEXT:    movapd %xmm2, %xmm5
-; CHECK-NEXT:  .LBB132_6: # %entry
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; CHECK-NEXT:    subsd %xmm5, %xmm4
-; CHECK-NEXT:    cvttsd2si %xmm4, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm3
-; CHECK-NEXT:    movsd {{.*#+}} xmm4 = [4.2299999999999997E+1,0.0E+0]
-; CHECK-NEXT:    comisd %xmm4, %xmm2
-; CHECK-NEXT:    ja .LBB132_8
-; CHECK-NEXT:  # %bb.7: # %entry
-; CHECK-NEXT:    movapd %xmm2, %xmm1
-; CHECK-NEXT:  .LBB132_8: # %entry
-; CHECK-NEXT:    subsd %xmm1, %xmm4
-; CHECK-NEXT:    cvttsd2si %xmm4, %rax
-; CHECK-NEXT:    setbe %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    shlq $63, %rcx
-; CHECK-NEXT:    xorq %rax, %rcx
-; CHECK-NEXT:    movq %rcx, %xmm1
-; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
-; CHECK-NEXT:    retq
-;
-; AVX1-LABEL: constrained_vector_fptoui_v4i64_v4f64:
-; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2399999999999999E+1,0.0E+0]
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = [9.2233720368547758E+18,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm2, %xmm0
-; AVX1-NEXT:    vxorpd %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:    vxorpd %xmm3, %xmm3, %xmm3
-; AVX1-NEXT:    ja .LBB132_2
-; AVX1-NEXT:  # %bb.1: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm3
-; AVX1-NEXT:  .LBB132_2: # %entry
-; AVX1-NEXT:    vsubsd %xmm3, %xmm2, %xmm2
-; AVX1-NEXT:    vcvttsd2si %xmm2, %rcx
-; AVX1-NEXT:    setbe %al
-; AVX1-NEXT:    movzbl %al, %eax
-; AVX1-NEXT:    shlq $63, %rax
-; AVX1-NEXT:    xorq %rcx, %rax
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm3 = [4.2299999999999997E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm3, %xmm0
-; AVX1-NEXT:    vxorpd %xmm4, %xmm4, %xmm4
-; AVX1-NEXT:    ja .LBB132_4
-; AVX1-NEXT:  # %bb.3: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm4
-; AVX1-NEXT:  .LBB132_4: # %entry
-; AVX1-NEXT:    vmovq %rax, %xmm2
-; AVX1-NEXT:    vsubsd %xmm4, %xmm3, %xmm3
-; AVX1-NEXT:    vcvttsd2si %xmm3, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm3
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm4 = [4.2200000000000003E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm4, %xmm0
-; AVX1-NEXT:    vxorpd %xmm5, %xmm5, %xmm5
-; AVX1-NEXT:    ja .LBB132_6
-; AVX1-NEXT:  # %bb.5: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm5
-; AVX1-NEXT:  .LBB132_6: # %entry
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
-; AVX1-NEXT:    vsubsd %xmm5, %xmm4, %xmm3
-; AVX1-NEXT:    vcvttsd2si %xmm3, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm3
-; AVX1-NEXT:    vmovsd {{.*#+}} xmm4 = [4.2100000000000001E+1,0.0E+0]
-; AVX1-NEXT:    vcomisd %xmm4, %xmm0
-; AVX1-NEXT:    ja .LBB132_8
-; AVX1-NEXT:  # %bb.7: # %entry
-; AVX1-NEXT:    vmovapd %xmm0, %xmm1
-; AVX1-NEXT:  .LBB132_8: # %entry
-; AVX1-NEXT:    vsubsd %xmm1, %xmm4, %xmm0
-; AVX1-NEXT:    vcvttsd2si %xmm0, %rax
-; AVX1-NEXT:    setbe %cl
-; AVX1-NEXT:    movzbl %cl, %ecx
-; AVX1-NEXT:    shlq $63, %rcx
-; AVX1-NEXT:    xorq %rax, %rcx
-; AVX1-NEXT:    vmovq %rcx, %xmm0
-; AVX1-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
-; AVX1-NEXT:    retq
-;
-; AVX512F-LABEL: constrained_vector_fptoui_v4i64_v4f64:
-; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm0
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm1
-; AVX512F-NEXT:    vcvttsd2usi {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %rax
-; AVX512F-NEXT:    vmovq %rax, %xmm2
-; AVX512F-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX512F-NEXT:    vinserti128 $1, %xmm0, %ymm1, %ymm0
-; AVX512F-NEXT:    retq
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [42,42]
+; CHECK-NEXT:    movaps %xmm0, %xmm1
+; CHECK-NEXT:    retq
 ;
-; AVX512DQ-LABEL: constrained_vector_fptoui_v4i64_v4f64:
-; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2100000000000001E+1,4.2200000000000003E+1,4.2299999999999997E+1,4.2399999999999999E+1]
-; AVX512DQ-NEXT:    vcvttpd2uqq %zmm0, %zmm0
-; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
-; AVX512DQ-NEXT:    retq
+; AVX-LABEL: constrained_vector_fptoui_v4i64_v4f64:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vbroadcastsd {{.*#+}} ymm0 = [42,42,42,42]
+; AVX-NEXT:    retq
 entry:
   %result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f64(
                                 <4 x double><double 42.1, double 42.2,
@@ -5693,14 +3957,12 @@ entry:
 define <1 x float> @constrained_vector_fptrunc_v1f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptrunc_v1f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    cvtsd2ss %xmm0, %xmm0
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.20999985E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptrunc_v1f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    vcvtsd2ss %xmm0, %xmm0, %xmm0
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.20999985E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(
@@ -5713,12 +3975,12 @@ entry:
 define <2 x float> @constrained_vector_fptrunc_v2f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptrunc_v2f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtpd2ps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptrunc_v2f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvtpd2psx {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,0.0E+0,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %result = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
@@ -5731,26 +3993,12 @@ entry:
 define <3 x float> @constrained_vector_fptrunc_v3f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptrunc_v3f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2299999999999997E+1,0.0E+0]
-; CHECK-NEXT:    cvtsd2ss %xmm0, %xmm1
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    cvtsd2ss %xmm0, %xmm0
-; CHECK-NEXT:    movsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; CHECK-NEXT:    cvtsd2ss %xmm2, %xmm2
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,4.22999992E+1,u]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptrunc_v3f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2299999999999997E+1,0.0E+0]
-; AVX-NEXT:    vcvtsd2ss %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovsd {{.*#+}} xmm1 = [4.2100000000000001E+1,0.0E+0]
-; AVX-NEXT:    vcvtsd2ss %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovsd {{.*#+}} xmm2 = [4.2200000000000003E+1,0.0E+0]
-; AVX-NEXT:    vcvtsd2ss %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,4.22999992E+1,u]
 ; AVX-NEXT:    retq
 entry:
   %result = call <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(
@@ -5764,14 +4012,12 @@ entry:
 define <4 x float> @constrained_vector_fptrunc_v4f64() #0 {
 ; CHECK-LABEL: constrained_vector_fptrunc_v4f64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtpd2ps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    cvtpd2ps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,4.22999992E+1,4.24000015E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fptrunc_v4f64:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvtpd2psy {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.20999985E+1,4.22000008E+1,4.22999992E+1,4.24000015E+1]
 ; AVX-NEXT:    retq
 entry:
   %result = call <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(
@@ -5785,14 +4031,12 @@ entry:
 define <1 x double> @constrained_vector_fpext_v1f32() #0 {
 ; CHECK-LABEL: constrained_vector_fpext_v1f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    cvtss2sd %xmm0, %xmm0
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fpext_v1f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
+; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; AVX-NEXT:    retq
 entry:
   %result = call <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(
@@ -5804,12 +4048,12 @@ entry:
 define <2 x double> @constrained_vector_fpext_v2f32() #0 {
 ; CHECK-LABEL: constrained_vector_fpext_v2f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtps2pd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fpext_v2f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvtps2pd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; AVX-NEXT:    vmovaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1]
 ; AVX-NEXT:    retq
 entry:
   %result = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(
@@ -5821,27 +4065,15 @@ entry:
 define <3 x double> @constrained_vector_fpext_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_fpext_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    cvtss2sd %xmm0, %xmm1
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    cvtss2sd %xmm0, %xmm0
-; CHECK-NEXT:    movss {{.*#+}} xmm2 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    cvtss2sd %xmm2, %xmm2
-; CHECK-NEXT:    movsd %xmm2, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [4.3E+1,0.0E+0]
+; CHECK-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}(%rip)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fpext_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vcvtss2sd %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vcvtss2sd %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vcvtss2sd %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vmovlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2E+1,4.3E+1,4.4E+1,u]
 ; AVX-NEXT:    retq
 entry:
   %result = call <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(
@@ -5854,13 +4086,13 @@ entry:
 define <4 x double> @constrained_vector_fpext_v4f32() #0 {
 ; CHECK-LABEL: constrained_vector_fpext_v4f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtps2pd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    cvtps2pd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    movaps {{.*#+}} xmm0 = [4.2E+1,4.3E+1]
+; CHECK-NEXT:    movaps {{.*#+}} xmm1 = [4.4E+1,4.5E+1]
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_fpext_v4f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vcvtps2pd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
+; AVX-NEXT:    vmovaps {{.*#+}} ymm0 = [4.2E+1,4.3E+1,4.4E+1,4.5E+1]
 ; AVX-NEXT:    retq
 entry:
   %result = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(
@@ -5928,37 +4160,31 @@ entry:
 define <3 x float> @constrained_vector_ceil_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_ceil_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq ceilf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq ceilf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq ceilf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq ceilf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_ceil_v3f32_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $10, %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vroundss $10, %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vroundps $10, (%rdi), %xmm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x float>, ptr %a
@@ -5977,10 +4203,10 @@ define <3 x double> @constrained_vector_ceil_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq ceil@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq ceil@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -5989,9 +4215,9 @@ define <3 x double> @constrained_vector_ceil_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -5999,10 +4225,7 @@ define <3 x double> @constrained_vector_ceil_v3f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_ceil_v3f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $10, (%rdi), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $10, (%rdi), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
@@ -6071,37 +4294,31 @@ entry:
 define <3 x float> @constrained_vector_floor_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_floor_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq floorf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq floorf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq floorf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq floorf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_floor_v3f32_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $9, %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vroundss $9, %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vroundps $9, (%rdi), %xmm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x float>, ptr %a
@@ -6120,10 +4337,10 @@ define <3 x double> @constrained_vector_floor_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq floor@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq floor@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -6132,9 +4349,9 @@ define <3 x double> @constrained_vector_floor_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -6142,10 +4359,7 @@ define <3 x double> @constrained_vector_floor_v3f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_floor_v3f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $9, (%rdi), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $9, (%rdi), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
@@ -6166,15 +4380,26 @@ define <1 x float> @constrained_vector_round_v1f32_var(ptr %a) #0 {
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_round_v1f32_var:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    callq roundf@PLT
-; AVX-NEXT:    popq %rax
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_round_v1f32_var:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX1-NEXT:    vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; AVX1-NEXT:    vbroadcastss {{.*#+}} xmm2 = [4.9999997E-1,4.9999997E-1,4.9999997E-1,4.9999997E-1]
+; AVX1-NEXT:    vorps %xmm2, %xmm1, %xmm1
+; AVX1-NEXT:    vaddss %xmm1, %xmm0, %xmm0
+; AVX1-NEXT:    vroundss $11, %xmm0, %xmm0, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_round_v1f32_var:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; AVX512-NEXT:    vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; AVX512-NEXT:    vandps %xmm1, %xmm0, %xmm1
+; AVX512-NEXT:    vbroadcastss {{.*#+}} xmm2 = [4.9999997E-1,4.9999997E-1,4.9999997E-1,4.9999997E-1]
+; AVX512-NEXT:    vorps %xmm2, %xmm1, %xmm1
+; AVX512-NEXT:    vaddss %xmm1, %xmm0, %xmm0
+; AVX512-NEXT:    vroundss $11, %xmm0, %xmm0, %xmm0
+; AVX512-NEXT:    retq
 entry:
   %b = load <1 x float>, ptr %a
   %round = call <1 x float> @llvm.experimental.constrained.round.v1f32(
@@ -6204,20 +4429,11 @@ define <2 x double> @constrained_vector_round_v2f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_round_v2f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vmovsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    callq round@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
-; AVX-NEXT:    # xmm0 = mem[0],zero
-; AVX-NEXT:    callq round@PLT
-; AVX-NEXT:    vunpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    addq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 8
+; AVX-NEXT:    vmovapd (%rdi), %xmm0
+; AVX-NEXT:    vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; AVX-NEXT:    vorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX-NEXT:    vaddpd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vroundpd $11, %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <2 x double>, ptr %a
@@ -6230,55 +4446,47 @@ entry:
 define <3 x float> @constrained_vector_round_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_round_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq roundf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq roundf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq roundf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq roundf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_round_v3f32_var:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rbx
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    subq $48, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 64
-; AVX-NEXT:    .cfi_offset %rbx, -16
-; AVX-NEXT:    movq %rdi, %rbx
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    callq roundf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Spill
-; AVX-NEXT:    callq roundf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; AVX-NEXT:    vmovss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
-; AVX-NEXT:    # xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    callq roundf@PLT
-; AVX-NEXT:    vmovaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $48, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    popq %rbx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_round_v3f32_var:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovaps (%rdi), %xmm0
+; AVX1-NEXT:    vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; AVX1-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX1-NEXT:    vaddps %xmm1, %xmm0, %xmm0
+; AVX1-NEXT:    vroundps $11, %xmm0, %xmm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_round_v3f32_var:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vmovaps (%rdi), %xmm0
+; AVX512-NEXT:    vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; AVX512-NEXT:    vandps %xmm1, %xmm0, %xmm1
+; AVX512-NEXT:    vbroadcastss {{.*#+}} xmm2 = [4.9999997E-1,4.9999997E-1,4.9999997E-1,4.9999997E-1]
+; AVX512-NEXT:    vorps %xmm2, %xmm1, %xmm1
+; AVX512-NEXT:    vaddps %xmm1, %xmm0, %xmm0
+; AVX512-NEXT:    vroundps $11, %xmm0, %xmm0
+; AVX512-NEXT:    retq
 entry:
   %b = load <3 x float>, ptr %a
   %round = call <3 x float> @llvm.experimental.constrained.round.v3f32(
@@ -6297,10 +4505,10 @@ define <3 x double> @constrained_vector_round_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq round@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq round@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -6309,43 +4517,33 @@ define <3 x double> @constrained_vector_round_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
-; CHECK-NEXT:    # xmm1 = mem[0],zero
-; CHECK-NEXT:    addq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 8
-; CHECK-NEXT:    retq
-;
-; AVX-LABEL: constrained_vector_round_v3f64_var:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    pushq %rbx
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    subq $48, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 64
-; AVX-NEXT:    .cfi_offset %rbx, -16
-; AVX-NEXT:    movq %rdi, %rbx
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vmovsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    callq round@PLT
-; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
-; AVX-NEXT:    # xmm0 = mem[0],zero
-; AVX-NEXT:    callq round@PLT
-; AVX-NEXT:    vunpcklpd (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0],mem[0]
-; AVX-NEXT:    vmovups %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vzeroupper
-; AVX-NEXT:    callq round@PLT
-; AVX-NEXT:    vmovups {{[-0-9]+}}(%r{{[sb]}}p), %ymm1 # 32-byte Reload
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; AVX-NEXT:    addq $48, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 16
-; AVX-NEXT:    popq %rbx
-; AVX-NEXT:    .cfi_def_cfa_offset 8
-; AVX-NEXT:    retq
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    # xmm1 = mem[0],zero
+; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    retq
+;
+; AVX1-LABEL: constrained_vector_round_v3f64_var:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vmovapd (%rdi), %ymm0
+; AVX1-NEXT:    vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm1
+; AVX1-NEXT:    vorpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX1-NEXT:    vaddpd %ymm1, %ymm0, %ymm0
+; AVX1-NEXT:    vroundpd $11, %ymm0, %ymm0
+; AVX1-NEXT:    retq
+;
+; AVX512-LABEL: constrained_vector_round_v3f64_var:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vmovapd (%rdi), %ymm0
+; AVX512-NEXT:    vbroadcastsd {{.*#+}} ymm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
+; AVX512-NEXT:    vandpd %ymm1, %ymm0, %ymm1
+; AVX512-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [4.9999999999999994E-1,4.9999999999999994E-1,4.9999999999999994E-1,4.9999999999999994E-1]
+; AVX512-NEXT:    vorpd %ymm2, %ymm1, %ymm1
+; AVX512-NEXT:    vaddpd %ymm1, %ymm0, %ymm0
+; AVX512-NEXT:    vroundpd $11, %ymm0, %ymm0
+; AVX512-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
   %round = call <3 x double> @llvm.experimental.constrained.round.v3f64(
@@ -6412,37 +4610,31 @@ entry:
 define <3 x float> @constrained_vector_trunc_v3f32_var(ptr %a) #0 {
 ; CHECK-LABEL: constrained_vector_trunc_v3f32_var:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $56, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    subq $40, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
-; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
-; CHECK-NEXT:    callq truncf@PLT
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    callq truncf@PLT
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
 ; CHECK-NEXT:    callq truncf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movaps %xmm1, (%rsp) # 16-byte Spill
+; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq truncf@PLT
-; CHECK-NEXT:    unpcklps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0]
-; CHECK-NEXT:    addq $56, %rsp
+; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_trunc_v3f32_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $11, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; AVX-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
-; AVX-NEXT:    vroundss $11, %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vroundss $11, %xmm2, %xmm2, %xmm2
-; AVX-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    vroundps $11, (%rdi), %xmm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x float>, ptr %a
@@ -6461,10 +4653,10 @@ define <3 x double> @constrained_vector_trunc_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rdi), %xmm0
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq trunc@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
+; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    callq trunc@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
@@ -6473,9 +4665,9 @@ define <3 x double> @constrained_vector_trunc_v3f64_var(ptr %a) #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $40, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -6483,10 +4675,7 @@ define <3 x double> @constrained_vector_trunc_v3f64_var(ptr %a) #0 {
 ;
 ; AVX-LABEL: constrained_vector_trunc_v3f64_var:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
-; AVX-NEXT:    vroundsd $11, %xmm0, %xmm0, %xmm0
-; AVX-NEXT:    vroundpd $11, (%rdi), %xmm1
-; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm1, %ymm0
+; AVX-NEXT:    vroundpd $11, (%rdi), %ymm0
 ; AVX-NEXT:    retq
 entry:
   %b = load <3 x double>, ptr %a
@@ -6589,13 +4778,11 @@ entry:
 define <2 x float> @constrained_vector_sitofp_v2f32_v2i32(<2 x i32> %x) #0 {
 ; CHECK-LABEL: constrained_vector_sitofp_v2f32_v2i32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq {{.*#+}} xmm0 = xmm0[0],zero
 ; CHECK-NEXT:    cvtdq2ps %xmm0, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sitofp_v2f32_v2i32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
 ; AVX-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
@@ -6639,7 +4826,7 @@ define <2 x double> @constrained_vector_sitofp_v2f64_v2i64(<2 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: constrained_vector_sitofp_v2f64_v2i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -6665,14 +4852,31 @@ define <2 x float> @constrained_vector_sitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
 ; CHECK-NEXT:    retq
 ;
-; AVX-LABEL: constrained_vector_sitofp_v2f32_v2i64:
-; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
-; AVX-NEXT:    vmovq %xmm0, %rax
-; AVX-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX-NEXT:    retq
+; AVX1-LABEL: constrained_vector_sitofp_v2f32_v2i64:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
+; AVX1-NEXT:    vmovq %xmm0, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
+; AVX1-NEXT:    retq
+;
+; AVX512F-LABEL: constrained_vector_sitofp_v2f32_v2i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_sitofp_v2f32_v2i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <2 x float>
            @llvm.experimental.constrained.sitofp.v2f32.v2i64(<2 x i64> %x,
@@ -6684,32 +4888,20 @@ entry:
 define <3 x double> @constrained_vector_sitofp_v3f64_v3i32(<3 x i32> %x) #0 {
 ; CHECK-LABEL: constrained_vector_sitofp_v3f64_v3i32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    cvtsi2sd %eax, %xmm2
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
-; CHECK-NEXT:    movd %xmm1, %eax
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2sd %eax, %xmm1
+; CHECK-NEXT:    cvtdq2pd %xmm0, %xmm2
 ; CHECK-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2sd %eax, %xmm0
-; CHECK-NEXT:    movsd %xmm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    cvtdq2pd %xmm0, %xmm0
+; CHECK-NEXT:    movlps %xmm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movaps %xmm2, %xmm1
+; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm2[1]
 ; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movapd %xmm2, %xmm0
+; CHECK-NEXT:    movaps %xmm2, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sitofp_v3f64_v3i32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vextractps $2, %xmm0, %eax
-; AVX-NEXT:    vcvtsi2sd %eax, %xmm15, %xmm1
-; AVX-NEXT:    vmovd %xmm0, %eax
-; AVX-NEXT:    vcvtsi2sd %eax, %xmm15, %xmm2
-; AVX-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX-NEXT:    vcvtsi2sd %eax, %xmm15, %xmm0
-; AVX-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; AVX-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX-NEXT:    vcvtdq2pd %xmm0, %ymm0
 ; AVX-NEXT:    retq
 entry:
   %result = call <3 x double>
@@ -6722,31 +4914,12 @@ entry:
 define <3 x float> @constrained_vector_sitofp_v3f32_v3i32(<3 x i32> %x) #0 {
 ; CHECK-LABEL: constrained_vector_sitofp_v3f32_v3i32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; CHECK-NEXT:    movd %xmm1, %eax
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2ss %eax, %xmm1
-; CHECK-NEXT:    pshufd {{.*#+}} xmm2 = xmm0[1,1,1,1]
-; CHECK-NEXT:    movd %xmm2, %eax
-; CHECK-NEXT:    xorps %xmm2, %xmm2
-; CHECK-NEXT:    cvtsi2ss %eax, %xmm2
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2ss %eax, %xmm0
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    cvtdq2ps %xmm0, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sitofp_v3f32_v3i32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    vextractps $2, %xmm0, %eax
-; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm1
-; AVX-NEXT:    vmovd %xmm0, %eax
-; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm2
-; AVX-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX-NEXT:    vcvtsi2ss %eax, %xmm15, %xmm0
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX-NEXT:    vcvtdq2ps %xmm0, %xmm0
 ; AVX-NEXT:    retq
 entry:
   %result = call <3 x float>
@@ -6759,8 +4932,8 @@ entry:
 define <3 x double> @constrained_vector_sitofp_v3f64_v3i64(<3 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_sitofp_v3f64_v3i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtsi2sd %rsi, %xmm1
 ; CHECK-NEXT:    cvtsi2sd %rdi, %xmm0
+; CHECK-NEXT:    cvtsi2sd %rsi, %xmm1
 ; CHECK-NEXT:    cvtsi2sd %rdx, %xmm2
 ; CHECK-NEXT:    movsd %xmm2, -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
@@ -6770,28 +4943,41 @@ define <3 x double> @constrained_vector_sitofp_v3f64_v3i64(<3 x i64> %x) #0 {
 ; AVX1-LABEL: constrained_vector_sitofp_v3f64_v3i64:
 ; AVX1:       # %bb.0: # %entry
 ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
+; AVX1-NEXT:    vpextrq $1, %xmm1, %rax
+; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
 ; AVX1-NEXT:    vmovq %xmm1, %rax
 ; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX1-NEXT:    vmovq %xmm0, %rax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
+; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
 ; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
+; AVX1-NEXT:    vmovq %xmm0, %rax
 ; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
+; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
 ; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
 ; AVX1-NEXT:    retq
 ;
-; AVX512-LABEL: constrained_vector_sitofp_v3f64_v3i64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512-NEXT:    vmovq %xmm1, %rax
-; AVX512-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovq %xmm0, %rax
-; AVX512-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX512-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; AVX512-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX512-NEXT:    retq
+; AVX512F-LABEL: constrained_vector_sitofp_v3f64_v3i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vextracti128 $1, %ymm0, %xmm1
+; AVX512F-NEXT:    vpextrq $1, %xmm1, %rax
+; AVX512F-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vmovq %xmm1, %rax
+; AVX512F-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
+; AVX512F-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
+; AVX512F-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
+; AVX512F-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_sitofp_v3f64_v3i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtqq2pd %zmm0, %zmm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <3 x double>
            @llvm.experimental.constrained.sitofp.v3f64.v3i64(<3 x i64> %x,
@@ -6803,40 +4989,55 @@ entry:
 define <3 x float> @constrained_vector_sitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_sitofp_v3f32_v3i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rsi, %xmm1
 ; CHECK-NEXT:    cvtsi2ss %rdi, %xmm0
-; CHECK-NEXT:    cvtsi2ss %rsi, %xmm2
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    xorps %xmm1, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
 ; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_sitofp_v3f32_v3i64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX1-NEXT:    vmovq %xmm1, %rax
+; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
 ; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
 ; AVX1-NEXT:    vmovq %xmm0, %rax
 ; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
+; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm0
+; AVX1-NEXT:    vmovq %xmm0, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]
 ; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
 ; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]
 ; AVX1-NEXT:    vzeroupper
 ; AVX1-NEXT:    retq
 ;
-; AVX512-LABEL: constrained_vector_sitofp_v3f32_v3i64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512-NEXT:    vmovq %xmm1, %rax
-; AVX512-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovq %xmm0, %rax
-; AVX512-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX512F-LABEL: constrained_vector_sitofp_v3f32_v3i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
+; AVX512F-NEXT:    vextracti128 $1, %ymm0, %xmm0
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]
+; AVX512F-NEXT:    vzeroupper
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_sitofp_v3f32_v3i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <3 x float>
            @llvm.experimental.constrained.sitofp.v3f32.v3i64(<3 x i64> %x,
@@ -6939,7 +5140,7 @@ define <4 x double> @constrained_vector_sitofp_v4f64_v4i64(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: constrained_vector_sitofp_v4f64_v4i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    retq
@@ -7009,7 +5210,7 @@ define <4 x float> @constrained_vector_sitofp_v4f32_v4i64(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: constrained_vector_sitofp_v4f32_v4i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtqq2ps %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -7075,34 +5276,21 @@ entry:
 define <1 x double> @constrained_vector_uitofp_v1f64_v1i64(<1 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v1f64_v1i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq %rdi, %rax
-; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdi, %rdi
-; CHECK-NEXT:    cmovnsq %rdi, %rcx
-; CHECK-NEXT:    cvtsi2sd %rcx, %xmm0
-; CHECK-NEXT:    jns .LBB175_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addsd %xmm0, %xmm0
-; CHECK-NEXT:  .LBB175_2: # %entry
+; CHECK-NEXT:    movq %rdi, %xmm1
+; CHECK-NEXT:    punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
+; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; CHECK-NEXT:    movapd %xmm1, %xmm0
+; CHECK-NEXT:    unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
+; CHECK-NEXT:    addsd %xmm1, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v1f64_v1i64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    movq %rdi, %rax
-; AVX1-NEXT:    shrq %rax
-; AVX1-NEXT:    movl %edi, %ecx
-; AVX1-NEXT:    andl $1, %ecx
-; AVX1-NEXT:    orq %rax, %rcx
-; AVX1-NEXT:    testq %rdi, %rdi
-; AVX1-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-NEXT:    vcvtsi2sd %rcx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB175_2
-; AVX1-NEXT:  # %bb.1:
-; AVX1-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB175_2: # %entry
+; AVX1-NEXT:    vmovq %rdi, %xmm0
+; AVX1-NEXT:    vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
+; AVX1-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vshufpd {{.*#+}} xmm1 = xmm0[1,0]
+; AVX1-NEXT:    vaddsd %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v1f64_v1i64:
@@ -7120,34 +5308,34 @@ entry:
 define <1 x float> @constrained_vector_uitofp_v1f32_v1i64(<1 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v1f32_v1i64:
 ; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    testq %rdi, %rdi
+; CHECK-NEXT:    js .LBB176_1
+; CHECK-NEXT:  # %bb.2: # %entry
+; CHECK-NEXT:    cvtsi2ss %rdi, %xmm0
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB176_1:
 ; CHECK-NEXT:    movq %rdi, %rax
 ; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdi, %rdi
-; CHECK-NEXT:    cmovnsq %rdi, %rcx
-; CHECK-NEXT:    cvtsi2ss %rcx, %xmm0
-; CHECK-NEXT:    jns .LBB176_2
-; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    andl $1, %edi
+; CHECK-NEXT:    orq %rax, %rdi
+; CHECK-NEXT:    cvtsi2ss %rdi, %xmm0
 ; CHECK-NEXT:    addss %xmm0, %xmm0
-; CHECK-NEXT:  .LBB176_2: # %entry
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v1f32_v1i64:
 ; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    testq %rdi, %rdi
+; AVX1-NEXT:    js .LBB176_1
+; AVX1-NEXT:  # %bb.2: # %entry
+; AVX1-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
+; AVX1-NEXT:    retq
+; AVX1-NEXT:  .LBB176_1:
 ; AVX1-NEXT:    movq %rdi, %rax
 ; AVX1-NEXT:    shrq %rax
-; AVX1-NEXT:    movl %edi, %ecx
-; AVX1-NEXT:    andl $1, %ecx
-; AVX1-NEXT:    orq %rax, %rcx
-; AVX1-NEXT:    testq %rdi, %rdi
-; AVX1-NEXT:    cmovnsq %rdi, %rcx
-; AVX1-NEXT:    vcvtsi2ss %rcx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB176_2
-; AVX1-NEXT:  # %bb.1:
+; AVX1-NEXT:    andl $1, %edi
+; AVX1-NEXT:    orq %rax, %rdi
+; AVX1-NEXT:    vcvtsi2ss %rdi, %xmm15, %xmm0
 ; AVX1-NEXT:    vaddss %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB176_2: # %entry
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v1f32_v1i64:
@@ -7183,7 +5371,7 @@ define <2 x double> @constrained_vector_uitofp_v2f64_v2i32(<2 x i32> %x) #0 {
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v2f64_v2i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512-NEXT:    vzeroupper
@@ -7219,7 +5407,7 @@ define <2 x float> @constrained_vector_uitofp_v2f32_v2i32(<2 x i32> %x) #0 {
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v2f32_v2i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovq {{.*#+}} xmm0 = xmm0[0],zero
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512-NEXT:    vzeroupper
@@ -7235,82 +5423,40 @@ entry:
 define <2 x double> @constrained_vector_uitofp_v2f64_v2i64(<2 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v2f64_v2i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movdqa %xmm0, %xmm1
-; CHECK-NEXT:    movq %xmm0, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm0
-; CHECK-NEXT:    jns .LBB179_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addsd %xmm0, %xmm0
-; CHECK-NEXT:  .LBB179_2: # %entry
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
-; CHECK-NEXT:    movq %xmm1, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm1
-; CHECK-NEXT:    jns .LBB179_4
-; CHECK-NEXT:  # %bb.3:
-; CHECK-NEXT:    addsd %xmm1, %xmm1
-; CHECK-NEXT:  .LBB179_4: # %entry
-; CHECK-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movdqa {{.*#+}} xmm1 = [4294967295,4294967295]
+; CHECK-NEXT:    pand %xmm0, %xmm1
+; CHECK-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; CHECK-NEXT:    psrlq $32, %xmm0
+; CHECK-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    addpd %xmm1, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v2f64_v2i64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm1
-; AVX1-NEXT:    jns .LBB179_2
-; AVX1-NEXT:  # %bb.1:
-; AVX1-NEXT:    vaddsd %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:  .LBB179_2: # %entry
-; AVX1-NEXT:    vmovq %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB179_4
-; AVX1-NEXT:  # %bb.3:
-; AVX1-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB179_4: # %entry
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX1-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX1-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0,1],xmm1[2,3],xmm0[4,5],xmm1[6,7]
+; AVX1-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX1-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX1-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512F-LABEL: constrained_vector_uitofp_v2f64_v2i64:
 ; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512F-NEXT:    vmovq %xmm0, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512F-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; AVX512F-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512F-NEXT:    vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
+; AVX512F-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; AVX512F-NEXT:    vpsrlq $32, %xmm0, %xmm0
+; AVX512F-NEXT:    vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX512F-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX512F-NEXT:    vaddpd %xmm0, %xmm1, %xmm0
 ; AVX512F-NEXT:    retq
 ;
 ; AVX512DQ-LABEL: constrained_vector_uitofp_v2f64_v2i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -7328,34 +5474,40 @@ define <2 x float> @constrained_vector_uitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movdqa %xmm0, %xmm1
 ; CHECK-NEXT:    movq %xmm0, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
 ; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
+; CHECK-NEXT:    js .LBB180_1
+; CHECK-NEXT:  # %bb.2: # %entry
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm0
-; CHECK-NEXT:    jns .LBB180_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addss %xmm0, %xmm0
-; CHECK-NEXT:  .LBB180_2: # %entry
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm0
 ; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
 ; CHECK-NEXT:    movq %xmm1, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    jns .LBB180_5
+; CHECK-NEXT:  .LBB180_4:
 ; CHECK-NEXT:    movq %rax, %rcx
 ; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
 ; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
-; CHECK-NEXT:    jns .LBB180_4
-; CHECK-NEXT:  # %bb.3:
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm1
 ; CHECK-NEXT:    addss %xmm1, %xmm1
-; CHECK-NEXT:  .LBB180_4: # %entry
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB180_1:
+; CHECK-NEXT:    movq %rax, %rcx
+; CHECK-NEXT:    shrq %rcx
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
+; CHECK-NEXT:    xorps %xmm0, %xmm0
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm0
+; CHECK-NEXT:    addss %xmm0, %xmm0
+; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
+; CHECK-NEXT:    movq %xmm1, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    js .LBB180_4
+; CHECK-NEXT:  .LBB180_5: # %entry
+; CHECK-NEXT:    xorps %xmm1, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm1
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
 ; CHECK-NEXT:    retq
 ;
@@ -7377,14 +5529,22 @@ define <2 x float> @constrained_vector_uitofp_v2f32_v2i64(<2 x i64> %x) #0 {
 ; AVX1-NEXT:    vblendvps %xmm0, %xmm2, %xmm1, %xmm0
 ; AVX1-NEXT:    retq
 ;
-; AVX512-LABEL: constrained_vector_uitofp_v2f32_v2i64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovq %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX512-NEXT:    retq
+; AVX512F-LABEL: constrained_vector_uitofp_v2f32_v2i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_uitofp_v2f32_v2i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtuqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <2 x float>
            @llvm.experimental.constrained.uitofp.v2f32.v2i64(<2 x i64> %x,
@@ -7396,17 +5556,18 @@ entry:
 define <3 x double> @constrained_vector_uitofp_v3f64_v3i32(<3 x i32> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v3f64_v3i32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    cvtsi2sd %rax, %xmm2
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[1,1,1,1]
-; CHECK-NEXT:    movd %xmm1, %eax
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2sd %rax, %xmm1
-; CHECK-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2sd %rax, %xmm0
+; CHECK-NEXT:    xorpd %xmm1, %xmm1
+; CHECK-NEXT:    movapd %xmm0, %xmm2
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
+; CHECK-NEXT:    movapd {{.*#+}} xmm3 = [4.503599627370496E+15,4.503599627370496E+15]
+; CHECK-NEXT:    orpd %xmm3, %xmm2
+; CHECK-NEXT:    subpd %xmm3, %xmm2
+; CHECK-NEXT:    unpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
+; CHECK-NEXT:    orpd %xmm3, %xmm0
+; CHECK-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    movsd %xmm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movapd %xmm2, %xmm1
+; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm2[1]
 ; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    movapd %xmm2, %xmm0
@@ -7414,26 +5575,20 @@ define <3 x double> @constrained_vector_uitofp_v3f64_v3i32(<3 x i32> %x) #0 {
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v3f64_v3i32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vextractps $2, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX1-NEXT:    vmovd %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
-; AVX1-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
+; AVX1-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX1-NEXT:    vpunpckhdq {{.*#+}} xmm1 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
+; AVX1-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
 ; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT:    vbroadcastsd {{.*#+}} ymm1 = [4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15]
+; AVX1-NEXT:    vorpd %ymm1, %ymm0, %ymm0
+; AVX1-NEXT:    vsubpd %ymm1, %ymm0, %ymm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v3f64_v3i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextractps $2, %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2sd %eax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovd %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2sd %eax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2sd %eax, %xmm15, %xmm0
-; AVX512-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; AVX512-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
+; AVX512-NEXT:    vcvtudq2pd %ymm0, %zmm0
+; AVX512-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512-NEXT:    retq
 entry:
   %result = call <3 x double>
@@ -7443,46 +5598,33 @@ entry:
   ret <3 x double> %result
 }
 
-define <3 x float> @constrained_vector_uitofp_v3f32_v3i32(<3 x i32> %x) #0 {
-; CHECK-LABEL: constrained_vector_uitofp_v3f32_v3i32:
-; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
-; CHECK-NEXT:    movd %xmm1, %eax
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2ss %rax, %xmm1
-; CHECK-NEXT:    pshufd {{.*#+}} xmm2 = xmm0[1,1,1,1]
-; CHECK-NEXT:    movd %xmm2, %eax
-; CHECK-NEXT:    xorps %xmm2, %xmm2
-; CHECK-NEXT:    cvtsi2ss %rax, %xmm2
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2ss %rax, %xmm0
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+define <3 x float> @constrained_vector_uitofp_v3f32_v3i32(<3 x i32> %x) #0 {
+; CHECK-LABEL: constrained_vector_uitofp_v3f32_v3i32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movdqa {{.*#+}} xmm1 = [65535,65535,65535,65535]
+; CHECK-NEXT:    pand %xmm0, %xmm1
+; CHECK-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; CHECK-NEXT:    psrld $16, %xmm0
+; CHECK-NEXT:    por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    subps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    addps %xmm1, %xmm0
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v3f32_v3i32:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vextractps $2, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
-; AVX1-NEXT:    vmovd %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm2
-; AVX1-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX1-NEXT:    vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
+; AVX1-NEXT:    vpsrld $16, %xmm0, %xmm0
+; AVX1-NEXT:    vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
+; AVX1-NEXT:    vsubps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX1-NEXT:    vaddps %xmm0, %xmm1, %xmm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v3f32_v3i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextractps $2, %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2ss %eax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovd %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2ss %eax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX512-NEXT:    vcvtusi2ss %eax, %xmm15, %xmm0
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
+; AVX512-NEXT:    vcvtudq2ps %zmm0, %zmm0
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; AVX512-NEXT:    vzeroupper
 ; AVX512-NEXT:    retq
 entry:
   %result = call <3 x float>
@@ -7495,105 +5637,66 @@ entry:
 define <3 x double> @constrained_vector_uitofp_v3f64_v3i64(<3 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v3f64_v3i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq %rdi, %rax
-; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdi, %rdi
-; CHECK-NEXT:    cmovnsq %rdi, %rcx
-; CHECK-NEXT:    cvtsi2sd %rcx, %xmm0
-; CHECK-NEXT:    jns .LBB183_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addsd %xmm0, %xmm0
-; CHECK-NEXT:  .LBB183_2: # %entry
-; CHECK-NEXT:    movq %rsi, %rax
-; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %esi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rsi, %rsi
-; CHECK-NEXT:    cmovnsq %rsi, %rcx
-; CHECK-NEXT:    cvtsi2sd %rcx, %xmm1
-; CHECK-NEXT:    jns .LBB183_4
-; CHECK-NEXT:  # %bb.3:
-; CHECK-NEXT:    addsd %xmm1, %xmm1
-; CHECK-NEXT:  .LBB183_4: # %entry
-; CHECK-NEXT:    movq %rdx, %rax
-; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edx, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdx, %rdx
-; CHECK-NEXT:    cmovnsq %rdx, %rcx
-; CHECK-NEXT:    cvtsi2sd %rcx, %xmm2
-; CHECK-NEXT:    jns .LBB183_6
-; CHECK-NEXT:  # %bb.5:
-; CHECK-NEXT:    addsd %xmm2, %xmm2
-; CHECK-NEXT:  .LBB183_6: # %entry
-; CHECK-NEXT:    movsd %xmm2, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movq %rsi, %xmm1
+; CHECK-NEXT:    movq %rdi, %xmm0
+; CHECK-NEXT:    punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    movq %rdx, %xmm1
+; CHECK-NEXT:    movdqa {{.*#+}} xmm2 = [4294967295,4294967295]
+; CHECK-NEXT:    movdqa %xmm0, %xmm3
+; CHECK-NEXT:    pand %xmm2, %xmm3
+; CHECK-NEXT:    movdqa {{.*#+}} xmm4 = [4841369599423283200,4841369599423283200]
+; CHECK-NEXT:    por %xmm4, %xmm3
+; CHECK-NEXT:    psrlq $32, %xmm0
+; CHECK-NEXT:    movdqa {{.*#+}} xmm5 = [4985484787499139072,4985484787499139072]
+; CHECK-NEXT:    por %xmm5, %xmm0
+; CHECK-NEXT:    subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT:    addpd %xmm3, %xmm0
+; CHECK-NEXT:    movdqa %xmm1, %xmm3
+; CHECK-NEXT:    psrlq $32, %xmm3
+; CHECK-NEXT:    por %xmm5, %xmm3
+; CHECK-NEXT:    pand %xmm2, %xmm1
+; CHECK-NEXT:    por %xmm4, %xmm1
+; CHECK-NEXT:    subsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm3
+; CHECK-NEXT:    addsd %xmm1, %xmm3
+; CHECK-NEXT:    movsd %xmm3, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movapd %xmm0, %xmm1
+; CHECK-NEXT:    unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]
 ; CHECK-NEXT:    fldl -{{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v3f64_v3i64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX1-NEXT:    vmovq %xmm1, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm1
-; AVX1-NEXT:    jns .LBB183_2
-; AVX1-NEXT:  # %bb.1:
-; AVX1-NEXT:    vaddsd %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:  .LBB183_2: # %entry
-; AVX1-NEXT:    vmovq %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm2
-; AVX1-NEXT:    jns .LBB183_4
-; AVX1-NEXT:  # %bb.3:
-; AVX1-NEXT:    vaddsd %xmm2, %xmm2, %xmm2
-; AVX1-NEXT:  .LBB183_4: # %entry
-; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2sd %rdx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB183_6
-; AVX1-NEXT:  # %bb.5:
-; AVX1-NEXT:    vaddsd %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB183_6: # %entry
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT:    vxorps %xmm1, %xmm1, %xmm1
+; AVX1-NEXT:    vblendps {{.*#+}} ymm2 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX1-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
+; AVX1-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
+; AVX1-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[0,2,1,3,4,6,5,7]
+; AVX1-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-NEXT:    vaddpd %ymm0, %ymm2, %ymm0
 ; AVX1-NEXT:    retq
 ;
-; AVX512-LABEL: constrained_vector_uitofp_v3f64_v3i64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512-NEXT:    vmovq %xmm1, %rax
-; AVX512-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovq %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm2[0],xmm0[0]
-; AVX512-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX512-NEXT:    retq
+; AVX512F-LABEL: constrained_vector_uitofp_v3f64_v3i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512F-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX512F-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200]
+; AVX512F-NEXT:    vpor %ymm2, %ymm1, %ymm1
+; AVX512F-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX512F-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4985484787499139072,4985484787499139072,4985484787499139072,4985484787499139072]
+; AVX512F-NEXT:    vpor %ymm2, %ymm0, %ymm0
+; AVX512F-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX512F-NEXT:    vsubpd %ymm2, %ymm0, %ymm0
+; AVX512F-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_uitofp_v3f64_v3i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtuqq2pd %zmm0, %zmm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <3 x double>
            @llvm.experimental.constrained.uitofp.v3f64.v3i64(<3 x i64> %x,
@@ -7605,106 +5708,107 @@ entry:
 define <3 x float> @constrained_vector_uitofp_v3f32_v3i64(<3 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v3f32_v3i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq %rdx, %rax
-; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edx, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdx, %rdx
-; CHECK-NEXT:    cmovnsq %rdx, %rcx
-; CHECK-NEXT:    cvtsi2ss %rcx, %xmm1
-; CHECK-NEXT:    jns .LBB184_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addss %xmm1, %xmm1
-; CHECK-NEXT:  .LBB184_2: # %entry
+; CHECK-NEXT:    testq %rsi, %rsi
+; CHECK-NEXT:    js .LBB184_1
+; CHECK-NEXT:  # %bb.2: # %entry
+; CHECK-NEXT:    cvtsi2ss %rsi, %xmm1
+; CHECK-NEXT:    testq %rdi, %rdi
+; CHECK-NEXT:    jns .LBB184_5
+; CHECK-NEXT:  .LBB184_4:
 ; CHECK-NEXT:    movq %rdi, %rax
 ; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %edi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rdi, %rdi
-; CHECK-NEXT:    cmovnsq %rdi, %rcx
-; CHECK-NEXT:    cvtsi2ss %rcx, %xmm0
-; CHECK-NEXT:    jns .LBB184_4
-; CHECK-NEXT:  # %bb.3:
+; CHECK-NEXT:    andl $1, %edi
+; CHECK-NEXT:    orq %rax, %rdi
+; CHECK-NEXT:    cvtsi2ss %rdi, %xmm0
 ; CHECK-NEXT:    addss %xmm0, %xmm0
-; CHECK-NEXT:  .LBB184_4: # %entry
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    testq %rdx, %rdx
+; CHECK-NEXT:    jns .LBB184_8
+; CHECK-NEXT:  .LBB184_7:
+; CHECK-NEXT:    movq %rdx, %rax
+; CHECK-NEXT:    shrq %rax
+; CHECK-NEXT:    andl $1, %edx
+; CHECK-NEXT:    orq %rax, %rdx
+; CHECK-NEXT:    xorps %xmm1, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
+; CHECK-NEXT:    addss %xmm1, %xmm1
+; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB184_1:
 ; CHECK-NEXT:    movq %rsi, %rax
 ; CHECK-NEXT:    shrq %rax
-; CHECK-NEXT:    movl %esi, %ecx
-; CHECK-NEXT:    andl $1, %ecx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    testq %rsi, %rsi
-; CHECK-NEXT:    cmovnsq %rsi, %rcx
-; CHECK-NEXT:    cvtsi2ss %rcx, %xmm2
-; CHECK-NEXT:    jns .LBB184_6
-; CHECK-NEXT:  # %bb.5:
-; CHECK-NEXT:    addss %xmm2, %xmm2
-; CHECK-NEXT:  .LBB184_6: # %entry
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; CHECK-NEXT:    andl $1, %esi
+; CHECK-NEXT:    orq %rax, %rsi
+; CHECK-NEXT:    cvtsi2ss %rsi, %xmm1
+; CHECK-NEXT:    addss %xmm1, %xmm1
+; CHECK-NEXT:    testq %rdi, %rdi
+; CHECK-NEXT:    js .LBB184_4
+; CHECK-NEXT:  .LBB184_5: # %entry
+; CHECK-NEXT:    cvtsi2ss %rdi, %xmm0
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; CHECK-NEXT:    testq %rdx, %rdx
+; CHECK-NEXT:    js .LBB184_7
+; CHECK-NEXT:  .LBB184_8: # %entry
+; CHECK-NEXT:    xorps %xmm1, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
 ; CHECK-NEXT:    movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v3f32_v3i64:
 ; AVX1:       # %bb.0: # %entry
 ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX1-NEXT:    vmovq %xmm1, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2ss %rdx, %xmm15, %xmm1
-; AVX1-NEXT:    jns .LBB184_2
-; AVX1-NEXT:  # %bb.1:
-; AVX1-NEXT:    vaddss %xmm1, %xmm1, %xmm1
-; AVX1-NEXT:  .LBB184_2: # %entry
-; AVX1-NEXT:    vmovq %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2ss %rdx, %xmm15, %xmm2
-; AVX1-NEXT:    jns .LBB184_4
-; AVX1-NEXT:  # %bb.3:
-; AVX1-NEXT:    vaddss %xmm2, %xmm2, %xmm2
-; AVX1-NEXT:  .LBB184_4: # %entry
+; AVX1-NEXT:    vpxor %xmm2, %xmm2, %xmm2
+; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm2, %xmm3
+; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm2, %xmm2
+; AVX1-NEXT:    vpackssdw %xmm3, %xmm2, %xmm2
+; AVX1-NEXT:    vpsrlq $1, %xmm0, %xmm3
+; AVX1-NEXT:    vpsrlq $1, %xmm1, %xmm4
+; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm3, %ymm3
+; AVX1-NEXT:    vandpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm4
+; AVX1-NEXT:    vorpd %ymm4, %ymm3, %ymm3
+; AVX1-NEXT:    vblendvpd %xmm0, %xmm3, %xmm0, %xmm0
 ; AVX1-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX1-NEXT:    movq %rax, %rcx
-; AVX1-NEXT:    shrq %rcx
-; AVX1-NEXT:    movl %eax, %edx
-; AVX1-NEXT:    andl $1, %edx
-; AVX1-NEXT:    orq %rcx, %rdx
-; AVX1-NEXT:    testq %rax, %rax
-; AVX1-NEXT:    cmovnsq %rax, %rdx
-; AVX1-NEXT:    vcvtsi2ss %rdx, %xmm15, %xmm0
-; AVX1-NEXT:    jns .LBB184_6
-; AVX1-NEXT:  # %bb.5:
-; AVX1-NEXT:    vaddss %xmm0, %xmm0, %xmm0
-; AVX1-NEXT:  .LBB184_6: # %entry
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm4
+; AVX1-NEXT:    vmovq %xmm0, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm0
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[2,3]
+; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm3
+; AVX1-NEXT:    vblendvpd %xmm1, %xmm3, %xmm1, %xmm1
+; AVX1-NEXT:    vmovq %xmm1, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm3
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm3[0],xmm0[3]
+; AVX1-NEXT:    vpextrq $1, %xmm1, %rax
+; AVX1-NEXT:    vcvtsi2ss %rax, %xmm15, %xmm1
+; AVX1-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
+; AVX1-NEXT:    vaddps %xmm0, %xmm0, %xmm1
+; AVX1-NEXT:    vblendvps %xmm2, %xmm1, %xmm0, %xmm0
 ; AVX1-NEXT:    vzeroupper
 ; AVX1-NEXT:    retq
 ;
-; AVX512-LABEL: constrained_vector_uitofp_v3f32_v3i64:
-; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512-NEXT:    vmovq %xmm1, %rax
-; AVX512-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
-; AVX512-NEXT:    vmovq %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm2
-; AVX512-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm2[0],xmm0[0],xmm2[2,3]
-; AVX512-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0],xmm0[3]
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX512F-LABEL: constrained_vector_uitofp_v3f32_v3i64:
+; AVX512F:       # %bb.0: # %entry
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm1
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[2,3]
+; AVX512F-NEXT:    vextracti128 $1, %ymm0, %xmm0
+; AVX512F-NEXT:    vmovq %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm2
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[0],xmm1[3]
+; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
+; AVX512F-NEXT:    vcvtusi2ss %rax, %xmm15, %xmm0
+; AVX512F-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],xmm0[0]
+; AVX512F-NEXT:    vzeroupper
+; AVX512F-NEXT:    retq
+;
+; AVX512DQ-LABEL: constrained_vector_uitofp_v3f32_v3i64:
+; AVX512DQ:       # %bb.0: # %entry
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512DQ-NEXT:    vcvtuqq2ps %zmm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
+; AVX512DQ-NEXT:    vzeroupper
+; AVX512DQ-NEXT:    retq
 entry:
   %result = call <3 x float>
            @llvm.experimental.constrained.uitofp.v3f32.v3i64(<3 x i64> %x,
@@ -7716,15 +5820,15 @@ entry:
 define <4 x double> @constrained_vector_uitofp_v4f64_v4i32(<4 x i32> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v4f64_v4i32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xorpd %xmm2, %xmm2
 ; CHECK-NEXT:    movapd %xmm0, %xmm1
-; CHECK-NEXT:    unpckhps {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
-; CHECK-NEXT:    movapd {{.*#+}} xmm3 = [4.503599627370496E+15,4.503599627370496E+15]
-; CHECK-NEXT:    orpd %xmm3, %xmm1
-; CHECK-NEXT:    subpd %xmm3, %xmm1
+; CHECK-NEXT:    xorpd %xmm2, %xmm2
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
+; CHECK-NEXT:    movapd {{.*#+}} xmm3 = [4.503599627370496E+15,4.503599627370496E+15]
 ; CHECK-NEXT:    orpd %xmm3, %xmm0
 ; CHECK-NEXT:    subpd %xmm3, %xmm0
+; CHECK-NEXT:    unpckhps {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
+; CHECK-NEXT:    orpd %xmm3, %xmm1
+; CHECK-NEXT:    subpd %xmm3, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v4f64_v4i32:
@@ -7740,7 +5844,7 @@ define <4 x double> @constrained_vector_uitofp_v4f64_v4i32(<4 x i32> %x) #0 {
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v4f64_v4i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $ymm0
 ; AVX512-NEXT:    vcvtudq2pd %ymm0, %zmm0
 ; AVX512-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512-NEXT:    retq
@@ -7775,7 +5879,7 @@ define <4 x float> @constrained_vector_uitofp_v4f32_v4i32(<4 x i32> %x) #0 {
 ;
 ; AVX512-LABEL: constrained_vector_uitofp_v4f32_v4i32:
 ; AVX512:       # %bb.0: # %entry
-; AVX512-NEXT:    vmovaps %xmm0, %xmm0
+; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 def $zmm0
 ; AVX512-NEXT:    vcvtudq2ps %zmm0, %zmm0
 ; AVX512-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
 ; AVX512-NEXT:    vzeroupper
@@ -7791,118 +5895,54 @@ entry:
 define <4 x double> @constrained_vector_uitofp_v4f64_v4i64(<4 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movdqa %xmm0, %xmm2
-; CHECK-NEXT:    movq %xmm0, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm0
-; CHECK-NEXT:    jns .LBB187_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addsd %xmm0, %xmm0
-; CHECK-NEXT:  .LBB187_2: # %entry
-; CHECK-NEXT:    pshufd {{.*#+}} xmm2 = xmm2[2,3,2,3]
-; CHECK-NEXT:    movq %xmm2, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm3
-; CHECK-NEXT:    jns .LBB187_4
-; CHECK-NEXT:  # %bb.3:
-; CHECK-NEXT:    addsd %xmm3, %xmm3
-; CHECK-NEXT:  .LBB187_4: # %entry
-; CHECK-NEXT:    movq %xmm1, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    xorps %xmm2, %xmm2
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm2
-; CHECK-NEXT:    jns .LBB187_6
-; CHECK-NEXT:  # %bb.5:
-; CHECK-NEXT:    addsd %xmm2, %xmm2
-; CHECK-NEXT:  .LBB187_6: # %entry
-; CHECK-NEXT:    unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
-; CHECK-NEXT:    movq %xmm1, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2sd %rdx, %xmm1
-; CHECK-NEXT:    jns .LBB187_8
-; CHECK-NEXT:  # %bb.7:
-; CHECK-NEXT:    addsd %xmm1, %xmm1
-; CHECK-NEXT:  .LBB187_8: # %entry
-; CHECK-NEXT:    unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
-; CHECK-NEXT:    movapd %xmm2, %xmm1
+; CHECK-NEXT:    movdqa {{.*#+}} xmm2 = [4294967295,4294967295]
+; CHECK-NEXT:    movdqa %xmm0, %xmm3
+; CHECK-NEXT:    pand %xmm2, %xmm3
+; CHECK-NEXT:    movdqa {{.*#+}} xmm4 = [4841369599423283200,4841369599423283200]
+; CHECK-NEXT:    por %xmm4, %xmm3
+; CHECK-NEXT:    psrlq $32, %xmm0
+; CHECK-NEXT:    movdqa {{.*#+}} xmm5 = [4985484787499139072,4985484787499139072]
+; CHECK-NEXT:    por %xmm5, %xmm0
+; CHECK-NEXT:    movapd {{.*#+}} xmm6 = [1.9342813118337666E+25,1.9342813118337666E+25]
+; CHECK-NEXT:    subpd %xmm6, %xmm0
+; CHECK-NEXT:    addpd %xmm3, %xmm0
+; CHECK-NEXT:    pand %xmm1, %xmm2
+; CHECK-NEXT:    por %xmm4, %xmm2
+; CHECK-NEXT:    psrlq $32, %xmm1
+; CHECK-NEXT:    por %xmm5, %xmm1
+; CHECK-NEXT:    subpd %xmm6, %xmm1
+; CHECK-NEXT:    addpd %xmm2, %xmm1
 ; CHECK-NEXT:    retq
 ;
 ; AVX1-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; AVX1:       # %bb.0: # %entry
-; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; AVX1-NEXT:    vpextrd $2, %xmm1, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm2
-; AVX1-NEXT:    vmovd %xmm1, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm2 = xmm3[0],xmm2[0]
-; AVX1-NEXT:    vextractps $2, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-NEXT:    vmovq %xmm0, %rax
-; AVX1-NEXT:    movl %eax, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm4
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm3 = xmm4[0],xmm3[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
-; AVX1-NEXT:    vpextrd $3, %xmm1, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-NEXT:    vpextrd $1, %xmm1, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm1
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm3[0]
-; AVX1-NEXT:    vpextrd $3, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm3
-; AVX1-NEXT:    vpextrd $1, %xmm0, %eax
-; AVX1-NEXT:    vcvtsi2sd %rax, %xmm15, %xmm0
-; AVX1-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm3[0]
-; AVX1-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
-; AVX1-NEXT:    vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
-; AVX1-NEXT:    vaddpd %ymm2, %ymm0, %ymm0
+; AVX1-NEXT:    vxorps %xmm1, %xmm1, %xmm1
+; AVX1-NEXT:    vblendps {{.*#+}} ymm2 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX1-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
+; AVX1-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
+; AVX1-NEXT:    vshufps {{.*#+}} ymm0 = ymm0[0,2,1,3,4,6,5,7]
+; AVX1-NEXT:    vorps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-NEXT:    vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX1-NEXT:    vaddpd %ymm0, %ymm2, %ymm0
 ; AVX1-NEXT:    retq
 ;
 ; AVX512F-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; AVX512F:       # %bb.0: # %entry
-; AVX512F-NEXT:    vextracti128 $1, %ymm0, %xmm1
-; AVX512F-NEXT:    vpextrq $1, %xmm1, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512F-NEXT:    vmovq %xmm1, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm1
-; AVX512F-NEXT:    vunpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
-; AVX512F-NEXT:    vpextrq $1, %xmm0, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm2
-; AVX512F-NEXT:    vmovq %xmm0, %rax
-; AVX512F-NEXT:    vcvtusi2sd %rax, %xmm15, %xmm0
-; AVX512F-NEXT:    vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX512F-NEXT:    vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX512F-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; AVX512F-NEXT:    vpblendd {{.*#+}} ymm1 = ymm0[0],ymm1[1],ymm0[2],ymm1[3],ymm0[4],ymm1[5],ymm0[6],ymm1[7]
+; AVX512F-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4841369599423283200,4841369599423283200,4841369599423283200,4841369599423283200]
+; AVX512F-NEXT:    vpor %ymm2, %ymm1, %ymm1
+; AVX512F-NEXT:    vpsrlq $32, %ymm0, %ymm0
+; AVX512F-NEXT:    vpbroadcastq {{.*#+}} ymm2 = [4985484787499139072,4985484787499139072,4985484787499139072,4985484787499139072]
+; AVX512F-NEXT:    vpor %ymm2, %ymm0, %ymm0
+; AVX512F-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25,1.9342813118337666E+25]
+; AVX512F-NEXT:    vsubpd %ymm2, %ymm0, %ymm0
+; AVX512F-NEXT:    vaddpd %ymm0, %ymm1, %ymm0
 ; AVX512F-NEXT:    retq
 ;
 ; AVX512DQ-LABEL: constrained_vector_uitofp_v4f64_v4i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2pd %zmm0, %zmm0
 ; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 killed $zmm0
 ; AVX512DQ-NEXT:    retq
@@ -7918,62 +5958,73 @@ define <4 x float> @constrained_vector_uitofp_v4f32_v4i64(<4 x i64> %x) #0 {
 ; CHECK-LABEL: constrained_vector_uitofp_v4f32_v4i64:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movq %xmm1, %rax
-; CHECK-NEXT:    movq %rax, %rcx
-; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
 ; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm2
-; CHECK-NEXT:    jns .LBB188_2
-; CHECK-NEXT:  # %bb.1:
-; CHECK-NEXT:    addss %xmm2, %xmm2
-; CHECK-NEXT:  .LBB188_2: # %entry
+; CHECK-NEXT:    js .LBB188_1
+; CHECK-NEXT:  # %bb.2: # %entry
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm2
 ; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
 ; CHECK-NEXT:    movq %xmm1, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    jns .LBB188_5
+; CHECK-NEXT:  .LBB188_4:
 ; CHECK-NEXT:    movq %rax, %rcx
 ; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm3
-; CHECK-NEXT:    jns .LBB188_4
-; CHECK-NEXT:  # %bb.3:
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm3
 ; CHECK-NEXT:    addss %xmm3, %xmm3
-; CHECK-NEXT:  .LBB188_4: # %entry
 ; CHECK-NEXT:    movq %xmm0, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    jns .LBB188_8
+; CHECK-NEXT:  .LBB188_7:
 ; CHECK-NEXT:    movq %rax, %rcx
 ; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
 ; CHECK-NEXT:    xorps %xmm1, %xmm1
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm1
-; CHECK-NEXT:    jns .LBB188_6
-; CHECK-NEXT:  # %bb.5:
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm1
 ; CHECK-NEXT:    addss %xmm1, %xmm1
-; CHECK-NEXT:  .LBB188_6: # %entry
+; CHECK-NEXT:    jmp .LBB188_9
+; CHECK-NEXT:  .LBB188_1:
+; CHECK-NEXT:    movq %rax, %rcx
+; CHECK-NEXT:    shrq %rcx
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm2
+; CHECK-NEXT:    addss %xmm2, %xmm2
+; CHECK-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
+; CHECK-NEXT:    movq %xmm1, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    js .LBB188_4
+; CHECK-NEXT:  .LBB188_5: # %entry
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm3
+; CHECK-NEXT:    movq %xmm0, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    js .LBB188_7
+; CHECK-NEXT:  .LBB188_8: # %entry
+; CHECK-NEXT:    xorps %xmm1, %xmm1
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm1
+; CHECK-NEXT:  .LBB188_9: # %entry
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1]
 ; CHECK-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
 ; CHECK-NEXT:    movq %xmm0, %rax
+; CHECK-NEXT:    testq %rax, %rax
+; CHECK-NEXT:    js .LBB188_10
+; CHECK-NEXT:  # %bb.11: # %entry
+; CHECK-NEXT:    xorps %xmm0, %xmm0
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm0
+; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; CHECK-NEXT:    movaps %xmm1, %xmm0
+; CHECK-NEXT:    retq
+; CHECK-NEXT:  .LBB188_10:
 ; CHECK-NEXT:    movq %rax, %rcx
 ; CHECK-NEXT:    shrq %rcx
-; CHECK-NEXT:    movl %eax, %edx
-; CHECK-NEXT:    andl $1, %edx
-; CHECK-NEXT:    orq %rcx, %rdx
-; CHECK-NEXT:    testq %rax, %rax
-; CHECK-NEXT:    cmovnsq %rax, %rdx
+; CHECK-NEXT:    andl $1, %eax
+; CHECK-NEXT:    orq %rcx, %rax
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    cvtsi2ss %rdx, %xmm0
-; CHECK-NEXT:    jns .LBB188_8
-; CHECK-NEXT:  # %bb.7:
+; CHECK-NEXT:    cvtsi2ss %rax, %xmm0
 ; CHECK-NEXT:    addss %xmm0, %xmm0
-; CHECK-NEXT:  .LBB188_8: # %entry
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
 ; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
@@ -8029,7 +6080,7 @@ define <4 x float> @constrained_vector_uitofp_v4f32_v4i64(<4 x i64> %x) #0 {
 ;
 ; AVX512DQ-LABEL: constrained_vector_uitofp_v4f32_v4i64:
 ; AVX512DQ:       # %bb.0: # %entry
-; AVX512DQ-NEXT:    vmovaps %ymm0, %ymm0
+; AVX512DQ-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
 ; AVX512DQ-NEXT:    vcvtuqq2ps %zmm0, %ymm0
 ; AVX512DQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; AVX512DQ-NEXT:    vzeroupper
@@ -8092,8 +6143,7 @@ define <16 x float> @vpaddd_mask_test(<16 x float> %i, <16 x float> %j, <16 x i3
 ; AVX512-LABEL: vpaddd_mask_test:
 ; AVX512:       # %bb.0:
 ; AVX512-NEXT:    vptestmd %zmm2, %zmm2, %k1
-; AVX512-NEXT:    vaddps %zmm1, %zmm0, %zmm1
-; AVX512-NEXT:    vmovaps %zmm1, %zmm0 {%k1}
+; AVX512-NEXT:    vaddps %zmm1, %zmm0, %zmm0 {%k1}
 ; AVX512-NEXT:    retq
   %mask = icmp ne <16 x i32> %mask1, zeroinitializer
   %x = call <16 x float> @llvm.experimental.constrained.fadd.v16f32(<16 x float> %i, <16 x float> %j, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
@@ -8170,42 +6220,42 @@ entry:
 define <3 x float> @constrained_vector_tan_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_tan_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_tan_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -8221,10 +6271,10 @@ define <3 x double> @constrained_vector_tan_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq tan@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq tan@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -8232,9 +6282,9 @@ define <3 x double> @constrained_vector_tan_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -8393,44 +6443,44 @@ entry:
 }
 
 define <3 x float> @constrained_vector_acos_v3f32() #0 {
-; CHECK-LABEL: constrained_vector_acos_v3f32:
-; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-LABEL: constrained_vector_acos_v3f32:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq acosf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq acosf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq acosf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_acos_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq acosf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq acosf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq acosf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -8446,10 +6496,10 @@ define <3 x double> @constrained_vector_acos_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq acos@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq acos@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -8457,9 +6507,9 @@ define <3 x double> @constrained_vector_acos_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -8620,42 +6670,42 @@ entry:
 define <3 x float> @constrained_vector_asin_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_asin_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq asinf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq asinf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq asinf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_asin_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq asinf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq asinf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq asinf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -8671,10 +6721,10 @@ define <3 x double> @constrained_vector_asin_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq asin@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq asin@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -8682,9 +6732,9 @@ define <3 x double> @constrained_vector_asin_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -8845,42 +6895,42 @@ entry:
 define <3 x float> @constrained_vector_atan_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_atan_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atanf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atanf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atanf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_atan_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atanf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atanf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atanf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -8896,10 +6946,10 @@ define <3 x double> @constrained_vector_atan_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq atan@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq atan@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -8907,9 +6957,9 @@ define <3 x double> @constrained_vector_atan_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -9078,48 +7128,48 @@ entry:
 define <3 x float> @constrained_vector_atan2_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_atan2_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [2.5E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm1 = [2.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atan2f@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    movss {{.*#+}} xmm1 = [2.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atan2f@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; CHECK-NEXT:    movss {{.*#+}} xmm1 = [2.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm1 = [2.5E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq atan2f@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_atan2_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [2.5E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [2.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atan2f@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [2.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atan2f@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
-; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [2.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm1 = [2.5E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq atan2f@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -9136,13 +7186,13 @@ define <3 x double> @constrained_vector_atan2_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.3100000000000001E+1,0.0E+0]
-; CHECK-NEXT:    callq atan2@PLT
-; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.3E+1,0.0E+0]
 ; CHECK-NEXT:    callq atan2@PLT
+; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.3100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    callq atan2@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
 ; CHECK-NEXT:    movsd {{.*#+}} xmm1 = [2.3199999999999999E+1,0.0E+0]
@@ -9150,9 +7200,9 @@ define <3 x double> @constrained_vector_atan2_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -9327,42 +7377,42 @@ entry:
 define <3 x float> @constrained_vector_cosh_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_cosh_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq coshf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq coshf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq coshf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_cosh_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq coshf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq coshf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq coshf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -9378,10 +7428,10 @@ define <3 x double> @constrained_vector_cosh_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq cosh@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq cosh@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -9389,9 +7439,9 @@ define <3 x double> @constrained_vector_cosh_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -9552,42 +7602,42 @@ entry:
 define <3 x float> @constrained_vector_sinh_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_sinh_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinhf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinhf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq sinhf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_sinh_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinhf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinhf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq sinhf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -9603,10 +7653,10 @@ define <3 x double> @constrained_vector_sinh_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq sinh@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq sinh@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -9614,9 +7664,9 @@ define <3 x double> @constrained_vector_sinh_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
@@ -9777,42 +7827,42 @@ entry:
 define <3 x float> @constrained_vector_tanh_v3f32() #0 {
 ; CHECK-LABEL: constrained_vector_tanh_v3f32:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    subq $40, %rsp
-; CHECK-NEXT:    .cfi_def_cfa_offset 48
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    subq $24, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanhf@PLT
-; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
 ; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanhf@PLT
+; CHECK-NEXT:    unpcklps (%rsp), %xmm0 # 16-byte Folded Reload
+; CHECK-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movaps %xmm0, (%rsp) # 16-byte Spill
-; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; CHECK-NEXT:    callq tanhf@PLT
 ; CHECK-NEXT:    movaps (%rsp), %xmm1 # 16-byte Reload
-; CHECK-NEXT:    unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; CHECK-NEXT:    unpcklpd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Folded Reload
-; CHECK-NEXT:    # xmm1 = xmm1[0],mem[0]
+; CHECK-NEXT:    movlhps {{.*#+}} xmm1 = xmm1[0],xmm0[0]
 ; CHECK-NEXT:    movaps %xmm1, %xmm0
-; CHECK-NEXT:    addq $40, %rsp
+; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
 ; CHECK-NEXT:    retq
 ;
 ; AVX-LABEL: constrained_vector_tanh_v3f32:
 ; AVX:       # %bb.0: # %entry
-; AVX-NEXT:    subq $40, %rsp
-; AVX-NEXT:    .cfi_def_cfa_offset 48
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    subq $24, %rsp
+; AVX-NEXT:    .cfi_def_cfa_offset 32
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanhf@PLT
-; AVX-NEXT:    vmovaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
 ; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.2E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanhf@PLT
+; AVX-NEXT:    vinsertps $16, (%rsp), %xmm0, %xmm0 # 16-byte Folded Reload
+; AVX-NEXT:    # xmm0 = xmm0[0],mem[0],xmm0[2,3]
 ; AVX-NEXT:    vmovaps %xmm0, (%rsp) # 16-byte Spill
-; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.3E+1,0.0E+0,0.0E+0,0.0E+0]
+; AVX-NEXT:    vmovss {{.*#+}} xmm0 = [4.4E+1,0.0E+0,0.0E+0,0.0E+0]
 ; AVX-NEXT:    callq tanhf@PLT
 ; AVX-NEXT:    vmovaps (%rsp), %xmm1 # 16-byte Reload
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0],xmm0[0],xmm1[2,3]
-; AVX-NEXT:    vinsertps $32, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0, %xmm0 # 16-byte Folded Reload
-; AVX-NEXT:    # xmm0 = xmm0[0,1],mem[0],xmm0[3]
-; AVX-NEXT:    addq $40, %rsp
+; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0],xmm1[3]
+; AVX-NEXT:    addq $24, %rsp
 ; AVX-NEXT:    .cfi_def_cfa_offset 8
 ; AVX-NEXT:    retq
 entry:
@@ -9828,10 +7878,10 @@ define <3 x double> @constrained_vector_tanh_v3f64() #0 {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 32
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
 ; CHECK-NEXT:    callq tanh@PLT
 ; CHECK-NEXT:    movsd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2E+1,0.0E+0]
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2100000000000001E+1,0.0E+0]
 ; CHECK-NEXT:    callq tanh@PLT
 ; CHECK-NEXT:    movsd %xmm0, (%rsp) # 8-byte Spill
 ; CHECK-NEXT:    movsd {{.*#+}} xmm0 = [4.2200000000000003E+1,0.0E+0]
@@ -9839,9 +7889,9 @@ define <3 x double> @constrained_vector_tanh_v3f64() #0 {
 ; CHECK-NEXT:    movsd %xmm0, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    fldl {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    wait
-; CHECK-NEXT:    movsd (%rsp), %xmm0 # 8-byte Reload
+; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 8-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero
-; CHECK-NEXT:    movsd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 8-byte Reload
+; CHECK-NEXT:    movsd (%rsp), %xmm1 # 8-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero
 ; CHECK-NEXT:    addq $24, %rsp
 ; CHECK-NEXT:    .cfi_def_cfa_offset 8
diff --git a/llvm/test/CodeGen/X86/vector-half-conversions.ll b/llvm/test/CodeGen/X86/vector-half-conversions.ll
index 8048985ccb1ff..582dcb7cb251e 100644
--- a/llvm/test/CodeGen/X86/vector-half-conversions.ll
+++ b/llvm/test/CodeGen/X86/vector-half-conversions.ll
@@ -439,13 +439,11 @@ define <2 x float> @cvt_2i16_to_2f32_constrained(<2 x i16> %a0) nounwind strictf
 ;
 ; F16C-LABEL: cvt_2i16_to_2f32_constrained:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
 ; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: cvt_2i16_to_2f32_constrained:
 ; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX512-NEXT:    retq
   %1 = bitcast <2 x i16> %a0 to <2 x half>
@@ -778,9 +776,10 @@ define <16 x float> @cvt_16i16_to_16f32_constrained(<16 x i16> %a0) nounwind str
 ;
 ; F16C-LABEL: cvt_16i16_to_16f32_constrained:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vextractf128 $1, %ymm0, %xmm1
-; F16C-NEXT:    vcvtph2ps %xmm1, %ymm1
-; F16C-NEXT:    vcvtph2ps %xmm0, %ymm0
+; F16C-NEXT:    vcvtph2ps %xmm0, %ymm2
+; F16C-NEXT:    vextractf128 $1, %ymm0, %xmm0
+; F16C-NEXT:    vcvtph2ps %xmm0, %ymm1
+; F16C-NEXT:    vmovaps %ymm2, %ymm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: cvt_16i16_to_16f32_constrained:
@@ -1644,14 +1643,12 @@ define <2 x double> @cvt_2i16_to_2f64_constrained(<2 x i16> %a0) nounwind strict
 ;
 ; F16C-LABEL: cvt_2i16_to_2f64_constrained:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
 ; F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT:    vcvtps2pd %xmm0, %xmm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: cvt_2i16_to_2f64_constrained:
 ; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
 ; AVX512-NEXT:    vcvtph2ps %xmm0, %xmm0
 ; AVX512-NEXT:    vcvtps2pd %xmm0, %xmm0
 ; AVX512-NEXT:    retq
@@ -1775,10 +1772,10 @@ define <8 x double> @cvt_8i16_to_8f64_constrained(<8 x i16> %a0) nounwind strict
 ;
 ; F16C-LABEL: cvt_8i16_to_8f64_constrained:
 ; F16C:       # %bb.0:
-; F16C-NEXT:    vcvtph2ps %xmm0, %ymm0
-; F16C-NEXT:    vextractf128 $1, %ymm0, %xmm1
+; F16C-NEXT:    vcvtph2ps %xmm0, %ymm1
+; F16C-NEXT:    vcvtps2pd %xmm1, %ymm0
+; F16C-NEXT:    vextractf128 $1, %ymm1, %xmm1
 ; F16C-NEXT:    vcvtps2pd %xmm1, %ymm1
-; F16C-NEXT:    vcvtps2pd %xmm0, %ymm0
 ; F16C-NEXT:    retq
 ;
 ; AVX512-LABEL: cvt_8i16_to_8f64_constrained:
diff --git a/llvm/test/CodeGen/X86/vector-shuffle-combining.ll b/llvm/test/CodeGen/X86/vector-shuffle-combining.ll
index a913963b7a9d1..bbfb33e58b82c 100644
--- a/llvm/test/CodeGen/X86/vector-shuffle-combining.ll
+++ b/llvm/test/CodeGen/X86/vector-shuffle-combining.ll
@@ -3235,14 +3235,10 @@ define void @PR43024_strictfp() strictfp {
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    movsd {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
 ; SSE2-NEXT:    movaps %xmm0, (%rax)
+; SSE2-NEXT:    addss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; SSE2-NEXT:    xorps %xmm1, %xmm1
-; SSE2-NEXT:    mulps %xmm1, %xmm0
-; SSE2-NEXT:    movaps %xmm0, %xmm2
-; SSE2-NEXT:    shufps {{.*#+}} xmm2 = xmm2[1,1],xmm0[1,1]
-; SSE2-NEXT:    addps %xmm0, %xmm2
-; SSE2-NEXT:    addps %xmm1, %xmm2
-; SSE2-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; SSE2-NEXT:    addps %xmm2, %xmm0
+; SSE2-NEXT:    addss %xmm1, %xmm0
+; SSE2-NEXT:    addss %xmm1, %xmm0
 ; SSE2-NEXT:    movss %xmm0, (%rax)
 ; SSE2-NEXT:    retq
 ;
@@ -3250,13 +3246,10 @@ define void @PR43024_strictfp() strictfp {
 ; SSSE3:       # %bb.0:
 ; SSSE3-NEXT:    movsd {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
 ; SSSE3-NEXT:    movaps %xmm0, (%rax)
+; SSSE3-NEXT:    addss %xmm0, %xmm0
 ; SSSE3-NEXT:    xorps %xmm1, %xmm1
-; SSSE3-NEXT:    mulps %xmm1, %xmm0
-; SSSE3-NEXT:    movshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; SSSE3-NEXT:    addps %xmm0, %xmm2
-; SSSE3-NEXT:    addps %xmm1, %xmm2
-; SSSE3-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; SSSE3-NEXT:    addps %xmm2, %xmm0
+; SSSE3-NEXT:    addss %xmm1, %xmm0
+; SSSE3-NEXT:    addss %xmm1, %xmm0
 ; SSSE3-NEXT:    movss %xmm0, (%rax)
 ; SSSE3-NEXT:    retq
 ;
@@ -3264,13 +3257,10 @@ define void @PR43024_strictfp() strictfp {
 ; SSE41:       # %bb.0:
 ; SSE41-NEXT:    movsd {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
 ; SSE41-NEXT:    movaps %xmm0, (%rax)
+; SSE41-NEXT:    addss %xmm0, %xmm0
 ; SSE41-NEXT:    xorps %xmm1, %xmm1
-; SSE41-NEXT:    mulps %xmm1, %xmm0
-; SSE41-NEXT:    movshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; SSE41-NEXT:    addps %xmm0, %xmm2
-; SSE41-NEXT:    addps %xmm1, %xmm2
-; SSE41-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; SSE41-NEXT:    addps %xmm2, %xmm0
+; SSE41-NEXT:    addss %xmm1, %xmm0
+; SSE41-NEXT:    addss %xmm1, %xmm0
 ; SSE41-NEXT:    movss %xmm0, (%rax)
 ; SSE41-NEXT:    retq
 ;
@@ -3278,13 +3268,10 @@ define void @PR43024_strictfp() strictfp {
 ; AVX:       # %bb.0:
 ; AVX-NEXT:    vmovsd {{.*#+}} xmm0 = [NaN,NaN,0.0E+0,0.0E+0]
 ; AVX-NEXT:    vmovaps %xmm0, (%rax)
+; AVX-NEXT:    vaddss {{\.?LCPI[0-9]+_[0-9]+}}+4(%rip), %xmm0, %xmm0
 ; AVX-NEXT:    vxorps %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vmulps %xmm1, %xmm0, %xmm0
-; AVX-NEXT:    vmovshdup {{.*#+}} xmm2 = xmm0[1,1,3,3]
-; AVX-NEXT:    vaddps %xmm2, %xmm0, %xmm2
-; AVX-NEXT:    vaddps %xmm2, %xmm1, %xmm1
-; AVX-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; AVX-NEXT:    vaddps %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vaddss %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT:    vmovss %xmm0, (%rax)
 ; AVX-NEXT:    retq
   store <4 x float> <float 0x7FF8000000000000, float 0x7FF8000000000000, float 0x0, float 0x0>, ptr undef, align 16
diff --git a/llvm/test/Feature/fp-intrinsics.ll b/llvm/test/Feature/fp-intrinsics.ll
index ada22c39abc9e..f750d7154c98c 100644
--- a/llvm/test/Feature/fp-intrinsics.ll
+++ b/llvm/test/Feature/fp-intrinsics.ll
@@ -1,15 +1,19 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt -O3 -S < %s | FileCheck %s
 
 ; Test to verify that constants aren't folded when the rounding mode is unknown.
-; CHECK-LABEL: @f1
-; CHECK: call double @llvm.experimental.constrained.fdiv.f64
 define double @f1() #0 {
+; CHECK-LABEL: define noundef double @f1(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret double 1.000000e-01
+;
 entry:
   %div = call double @llvm.experimental.constrained.fdiv.f64(
-                                               double 1.000000e+00,
-                                               double 1.000000e+01,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 1.000000e+00,
+  double 1.000000e+01,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %div
 }
 
@@ -21,14 +25,17 @@ entry:
 ;   return a - 0.0;
 ; }
 ;
-; CHECK-LABEL: @f2
-; CHECK: call double @llvm.experimental.constrained.fsub.f64
 define double @f2(double %a) #0 {
+; CHECK-LABEL: define double @f2(
+; CHECK-SAME: double returned [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret double [[A]]
+;
 entry:
   %div = call double @llvm.experimental.constrained.fsub.f64(
-                                               double %a, double 0.000000e+00,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double %a, double 0.000000e+00,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %div
 }
 
@@ -41,25 +48,27 @@ entry:
 ;   return -((-a)*b);
 ; }
 ;
-; CHECK-LABEL: @f3
-; CHECK: call double @llvm.experimental.constrained.fsub.f64
-; CHECK: call double @llvm.experimental.constrained.fmul.f64
-; CHECK: call double @llvm.experimental.constrained.fsub.f64
 define double @f3(double %a, double %b) #0 {
+; CHECK-LABEL: define double @f3(
+; CHECK-SAME: double [[A:%.*]], double [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RET3:%.*]] = fmul double [[A]], [[B]]
+; CHECK-NEXT:    ret double [[RET3]]
+;
 entry:
   %sub = call double @llvm.experimental.constrained.fsub.f64(
-                                               double -0.000000e+00, double %a,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double -0.000000e+00, double %a,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   %mul = call double @llvm.experimental.constrained.fmul.f64(
-                                               double %sub, double %b,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double %sub, double %b,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   %ret = call double @llvm.experimental.constrained.fsub.f64(
-                                               double -0.000000e+00,
-                                               double %mul,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double -0.000000e+00,
+  double %mul,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %ret
 }
 
@@ -74,19 +83,24 @@ entry:
 ; }
 ;
 ;
-; CHECK-LABEL: @f4
-; CHECK-NOT: select
-; CHECK: br i1 %cmp
 define double @f4(i32 %n, double %a) #0 {
+; CHECK-LABEL: define double @f4(
+; CHECK-SAME: i32 [[N:%.*]], double [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[N]], 0
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd double [[A]], 1.000000e+00
+; CHECK-NEXT:    [[SPEC_SELECT:%.*]] = select i1 [[CMP]], double [[ADD1]], double [[A]]
+; CHECK-NEXT:    ret double [[SPEC_SELECT]]
+;
 entry:
   %cmp = icmp sgt i32 %n, 0
   br i1 %cmp, label %if.then, label %if.end
 
 if.then:
   %add = call double @llvm.experimental.constrained.fadd.f64(
-                                               double 1.000000e+00, double %a,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 1.000000e+00, double %a,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   br label %if.end
 
 if.end:
@@ -95,393 +109,527 @@ if.end:
 }
 
 ; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f5
-; CHECK: call double @llvm.experimental.constrained.sqrt
 define double @f5() #0 {
+; CHECK-LABEL: define noundef double @f5(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = tail call float @llvm.sqrt.f32(float 4.200000e+01)
+; CHECK-NEXT:    ret double 0x4019EC474A261264
+;
 entry:
   %result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that pow(42.1, 3.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f6
-; CHECK: call double @llvm.experimental.constrained.pow
 define double @f6() #0 {
+; CHECK-LABEL: define noundef double @f6(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.pow.f64(double 4.210000e+01, double 3.000000e+00)
+; CHECK-NEXT:    ret double 0x40F237A760418938
+;
 entry:
   %result = call double @llvm.experimental.constrained.pow.f64(double 42.1,
-                                               double 3.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 3.0,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that powi(42.1, 3) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f7
-; CHECK: call double @llvm.experimental.constrained.powi
 define double @f7() #0 {
+; CHECK-LABEL: define noundef double @f7(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.powi.f64.i32(double 4.210000e+01, i32 3)
+; CHECK-NEXT:    ret double 0x40F237A760418938
+;
 entry:
   %result = call double @llvm.experimental.constrained.powi.f64(double 42.1,
-                                               i32 3,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  i32 3,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that sin(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f8
-; CHECK: call double @llvm.experimental.constrained.sin
 define double @f8() #0 {
+; CHECK-LABEL: define noundef double @f8(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.sin.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0xBFED5424FF4C0FED
+;
 entry:
   %result = call double @llvm.experimental.constrained.sin.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that cos(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f9
-; CHECK: call double @llvm.experimental.constrained.cos
 define double @f9() #0 {
+; CHECK-LABEL: define noundef double @f9(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.cos.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0xBFD9995C01B055A5
+;
 entry:
   %result = call double @llvm.experimental.constrained.cos.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that tan(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: ftan
-; CHECK: call double @llvm.experimental.constrained.tan
 define double @ftan() #0 {
+; CHECK-LABEL: define double @ftan(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.tan.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double [[RESULT1]]
+;
 entry:
   %result = call double @llvm.experimental.constrained.tan.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that acos(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: facos
-; CHECK: call double @llvm.experimental.constrained.acos
 define double @facos() #0 {
+; CHECK-LABEL: define double @facos(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.acos.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double [[RESULT1]]
+;
 entry:
   %result = call double @llvm.experimental.constrained.acos.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that asin(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: fasin
-; CHECK: call double @llvm.experimental.constrained.asin
 define double @fasin() #0 {
+; CHECK-LABEL: define double @fasin(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.asin.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double [[RESULT1]]
+;
 entry:
   %result = call double @llvm.experimental.constrained.asin.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that atan(42.0, 23.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: fatan
-; CHECK: call double @llvm.experimental.constrained.atan
 define double @fatan() #0 {
+; CHECK-LABEL: define noundef double @fatan(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.atan.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x3FF8C079F3350D26
+;
 entry:
   %result = call double @llvm.experimental.constrained.atan.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that atan2(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: fatan2
-; CHECK: call double @llvm.experimental.constrained.atan2
 define double @fatan2() #0 {
+; CHECK-LABEL: define double @fatan2(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.atan2.f64(double 4.200000e+01, double 2.300000e+01)
+; CHECK-NEXT:    ret double [[RESULT1]]
+;
 entry:
   %result = call double @llvm.experimental.constrained.atan2.f64(
-                                              double 42.0,
-                                              double 23.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 42.0,
+  double 23.0,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that cosh(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: fcosh
-; CHECK: call double @llvm.experimental.constrained.cosh
 define double @fcosh() #0 {
+; CHECK-LABEL: define noundef double @fcosh(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.cosh.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x43A8232558201159
+;
 entry:
   %result = call double @llvm.experimental.constrained.cosh.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that sinh(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: fsinh
-; CHECK: call double @llvm.experimental.constrained.sinh
 define double @fsinh() #0 {
+; CHECK-LABEL: define noundef double @fsinh(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.sinh.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x43A8232558201159
+;
 entry:
   %result = call double @llvm.experimental.constrained.sinh.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that tanh(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: ftanh
-; CHECK: call double @llvm.experimental.constrained.tanh
 define double @ftanh() #0 {
+; CHECK-LABEL: define double @ftanh(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.tanh.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double [[RESULT1]]
+;
 entry:
   %result = call double @llvm.experimental.constrained.tanh.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f10
-; CHECK: call double @llvm.experimental.constrained.exp
 define double @f10() #0 {
+; CHECK-LABEL: define noundef double @f10(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.exp.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x43B8232558201159
+;
 entry:
   %result = call double @llvm.experimental.constrained.exp.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that exp2(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f11
-; CHECK: call double @llvm.experimental.constrained.exp2
 define double @f11() #0 {
+; CHECK-LABEL: define noundef double @f11(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.exp2.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret double 0x429125FBEE250669
+;
 entry:
   %result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that log(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f12
-; CHECK: call double @llvm.experimental.constrained.log
 define double @f12() #0 {
+; CHECK-LABEL: define noundef double @f12(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.log.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x400DE6BF542E3D2D
+;
 entry:
   %result = call double @llvm.experimental.constrained.log.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that log10(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f13
-; CHECK: call double @llvm.experimental.constrained.log10
 define double @f13() #0 {
+; CHECK-LABEL: define noundef double @f13(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.log10.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x3FF9F8D43F783A1F
+;
 entry:
   %result = call double @llvm.experimental.constrained.log10.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that log2(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f14
-; CHECK: call double @llvm.experimental.constrained.log2
 define double @f14() #0 {
+; CHECK-LABEL: define noundef double @f14(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.log2.f64(double 4.200000e+01)
+; CHECK-NEXT:    ret double 0x401591BBA891F171
+;
 entry:
   %result = call double @llvm.experimental.constrained.log2.f64(double 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that rint(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f15
-; CHECK: call double @llvm.experimental.constrained.rint
 define double @f15() #0 {
+; CHECK-LABEL: define noundef double @f15(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.rint.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret double 4.200000e+01
+;
 entry:
   %result = call double @llvm.experimental.constrained.rint.f64(double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that nearbyint(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f16
-; CHECK: call double @llvm.experimental.constrained.nearbyint
 define double @f16() #0 {
+; CHECK-LABEL: define noundef double @f16(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.nearbyint.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret double 4.200000e+01
+;
 entry:
   %result = call double @llvm.experimental.constrained.nearbyint.f64(
-                                               double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 42.1,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that fma(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f17
-; CHECK: call double @llvm.experimental.constrained.fma
 define double @f17() #0 {
+; CHECK-LABEL: define noundef double @f17(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call double @llvm.fma.f64(double 4.210000e+01, double 4.210000e+01, double 4.210000e+01)
+; CHECK-NEXT:    ret double 0x409C5A0A3D70A3D8
+;
 entry:
   %result = call double @llvm.experimental.constrained.fma.f64(double 42.1, double 42.1, double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that fptoui(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f18
-; CHECK: call zeroext i32 @llvm.experimental.constrained.fptoui
 define zeroext i32 @f18() #0 {
+; CHECK-LABEL: define noundef zeroext i32 @f18(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret i32 42
+;
 entry:
   %result = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64(
-                                               double 42.1,
-                                               metadata !"fpexcept.strict") #0
+  double 42.1,
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that fptosi(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f19
-; CHECK: call i32 @llvm.experimental.constrained.fptosi
 define i32 @f19() #0 {
+; CHECK-LABEL: define noundef i32 @f19(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret i32 42
+;
 entry:
   %result = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double 42.1,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that fptrunc(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f20
-; CHECK: call float @llvm.experimental.constrained.fptrunc
 define float @f20() #0 {
+; CHECK-LABEL: define noundef float @f20(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret float 0x40450CCCC0000000
+;
 entry:
   %result = call float @llvm.experimental.constrained.fptrunc.f32.f64(
-                                               double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  double 42.1,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret float %result
 }
 
 ; Verify that fpext(42.1) isn't simplified when the rounding mode is
 ; unknown.
-; CHECK-LABEL: f21
-; CHECK: call double @llvm.experimental.constrained.fpext
 define double @f21() #0 {
+; CHECK-LABEL: define noundef double @f21(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret double 4.200000e+01
+;
 entry:
   %result = call double @llvm.experimental.constrained.fpext.f64.f32(float 42.0,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that lrint(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f22
-; CHECK: call i32 @llvm.experimental.constrained.lrint
 define i32 @f22() #0 {
+; CHECK-LABEL: define noundef i32 @f22(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i32 @llvm.lrint.i32.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret i32 [[RESULT1]]
+;
 entry:
   %result = call i32 @llvm.experimental.constrained.lrint.i32.f64(double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that lrintf(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f23
-; CHECK: call i32 @llvm.experimental.constrained.lrint
 define i32 @f23() #0 {
+; CHECK-LABEL: define noundef i32 @f23(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i32 @llvm.lrint.i32.f32(float 4.200000e+01)
+; CHECK-NEXT:    ret i32 [[RESULT1]]
+;
 entry:
   %result = call i32 @llvm.experimental.constrained.lrint.i32.f32(float 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that llrint(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f24
-; CHECK: call i64 @llvm.experimental.constrained.llrint
 define i64 @f24() #0 {
+; CHECK-LABEL: define noundef i64 @f24(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i64 @llvm.llrint.i64.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret i64 [[RESULT1]]
+;
 entry:
   %result = call i64 @llvm.experimental.constrained.llrint.i64.f64(double 42.1,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret i64 %result
 }
 
 ; Verify that llrint(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f25
-; CHECK: call i64 @llvm.experimental.constrained.llrint
 define i64 @f25() #0 {
+; CHECK-LABEL: define noundef i64 @f25(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i64 @llvm.llrint.i64.f32(float 4.200000e+01)
+; CHECK-NEXT:    ret i64 [[RESULT1]]
+;
 entry:
   %result = call i64 @llvm.experimental.constrained.llrint.i64.f32(float 42.0,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret i64 %result
 }
 
 ; Verify that lround(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f26
-; CHECK: call i32 @llvm.experimental.constrained.lround
 define i32 @f26() #0 {
+; CHECK-LABEL: define noundef i32 @f26(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i32 @llvm.lround.i32.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret i32 [[RESULT1]]
+;
 entry:
   %result = call i32 @llvm.experimental.constrained.lround.i32.f64(double 42.1,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that lround(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f27
-; CHECK: call i32 @llvm.experimental.constrained.lround
 define i32 @f27() #0 {
+; CHECK-LABEL: define noundef i32 @f27(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i32 @llvm.lround.i32.f32(float 4.200000e+01)
+; CHECK-NEXT:    ret i32 [[RESULT1]]
+;
 entry:
   %result = call i32 @llvm.experimental.constrained.lround.i32.f32(float 42.0,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret i32 %result
 }
 
 ; Verify that llround(42.1) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f28
-; CHECK: call i64 @llvm.experimental.constrained.llround
 define i64 @f28() #0 {
+; CHECK-LABEL: define noundef i64 @f28(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i64 @llvm.llround.i64.f64(double 4.210000e+01)
+; CHECK-NEXT:    ret i64 [[RESULT1]]
+;
 entry:
   %result = call i64 @llvm.experimental.constrained.llround.i64.f64(double 42.1,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret i64 %result
 }
 
 ; Verify that llround(42.0) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: f29
-; CHECK: call i64 @llvm.experimental.constrained.llround
 define i64 @f29() #0 {
+; CHECK-LABEL: define noundef i64 @f29(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = tail call i64 @llvm.llround.i64.f32(float 4.200000e+01)
+; CHECK-NEXT:    ret i64 [[RESULT1]]
+;
 entry:
   %result = call i64 @llvm.experimental.constrained.llround.i64.f32(float 42.0,
-                                               metadata !"fpexcept.strict") #0
+  metadata !"fpexcept.strict") #0
   ret i64 %result
 }
 
 ; Verify that sitofp(42) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: @f30
-; CHECK: call double @llvm.experimental.constrained.sitofp
 define double @f30() #0 {
+; CHECK-LABEL: define noundef double @f30(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret double 4.200000e+01
+;
 entry:
   %result = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 42,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
 ; Verify that uitofp(42) isn't simplified when the rounding mode is unknown.
-; CHECK-LABEL: @f31
-; CHECK: call double @llvm.experimental.constrained.uitofp
 define double @f31() #0 {
+; CHECK-LABEL: define noundef double @f31(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    ret double 4.200000e+01
+;
 entry:
   %result = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 42,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict") #0
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict") #0
   ret double %result
 }
 
diff --git a/llvm/test/Instrumentation/MemorySanitizer/AArch64/arm64-vmul.ll b/llvm/test/Instrumentation/MemorySanitizer/AArch64/arm64-vmul.ll
index e9bb743b189fe..a3aa5cd3f7a52 100644
--- a/llvm/test/Instrumentation/MemorySanitizer/AArch64/arm64-vmul.ll
+++ b/llvm/test/Instrumentation/MemorySanitizer/AArch64/arm64-vmul.ll
@@ -18,7 +18,7 @@ define <8 x i16> @smull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1:![0-9]+]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8:[0-9]+]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7:[0-9]+]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i8>, ptr [[A]], align 8
@@ -29,7 +29,7 @@ define <8 x i16> @smull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i8>, ptr [[B]], align 8
@@ -59,7 +59,7 @@ define <4 x i32> @smull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -70,7 +70,7 @@ define <4 x i32> @smull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -100,7 +100,7 @@ define <2 x i64> @smull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -111,7 +111,7 @@ define <2 x i64> @smull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -145,7 +145,7 @@ define <8 x i16> @umull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i8>, ptr [[A]], align 8
@@ -156,7 +156,7 @@ define <8 x i16> @umull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i8>, ptr [[B]], align 8
@@ -186,7 +186,7 @@ define <4 x i32> @umull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -197,7 +197,7 @@ define <4 x i32> @umull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -227,7 +227,7 @@ define <2 x i64> @umull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -238,7 +238,7 @@ define <2 x i64> @umull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -272,7 +272,7 @@ define <4 x i32> @sqdmull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -283,7 +283,7 @@ define <4 x i32> @sqdmull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -298,7 +298,7 @@ define <4 x i32> @sqdmull4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP3]], [[_MSCMP4]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB15:.*]], label %[[BB16:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB15]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB16]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -320,7 +320,7 @@ define <2 x i64> @sqdmull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -331,7 +331,7 @@ define <2 x i64> @sqdmull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -346,7 +346,7 @@ define <2 x i64> @sqdmull2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP3]], [[_MSCMP4]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB15:.*]], label %[[BB16:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB15]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB16]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -368,7 +368,7 @@ define <4 x i32> @sqdmull2_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <8 x i16>, ptr [[A]], align 16
@@ -379,7 +379,7 @@ define <4 x i32> @sqdmull2_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <8 x i16>, ptr [[B]], align 16
@@ -398,7 +398,7 @@ define <4 x i32> @sqdmull2_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP4]], [[_MSCMP5]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB15:.*]], label %[[BB16:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB15]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB16]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -422,7 +422,7 @@ define <2 x i64> @sqdmull2_2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <4 x i32>, ptr [[A]], align 16
@@ -433,7 +433,7 @@ define <2 x i64> @sqdmull2_2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <4 x i32>, ptr [[B]], align 16
@@ -452,7 +452,7 @@ define <2 x i64> @sqdmull2_2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP4]], [[_MSCMP5]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB15:.*]], label %[[BB16:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB15]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB16]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -480,7 +480,7 @@ define <8 x i16> @pmull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i8>, ptr [[A]], align 8
@@ -491,7 +491,7 @@ define <8 x i16> @pmull8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i8>, ptr [[B]], align 8
@@ -523,7 +523,7 @@ define <4 x i16> @sqdmulh_4h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -534,7 +534,7 @@ define <4 x i16> @sqdmulh_4h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -562,7 +562,7 @@ define <8 x i16> @sqdmulh_8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i16>, ptr [[A]], align 16
@@ -573,7 +573,7 @@ define <8 x i16> @sqdmulh_8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i16>, ptr [[B]], align 16
@@ -601,7 +601,7 @@ define <2 x i32> @sqdmulh_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -612,7 +612,7 @@ define <2 x i32> @sqdmulh_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -640,7 +640,7 @@ define <4 x i32> @sqdmulh_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, ptr [[A]], align 16
@@ -651,7 +651,7 @@ define <4 x i32> @sqdmulh_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, ptr [[B]], align 16
@@ -679,7 +679,7 @@ define i32 @sqdmulh_1s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[A]], align 4
@@ -690,7 +690,7 @@ define i32 @sqdmulh_1s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[B]], align 4
@@ -724,7 +724,7 @@ define <4 x i16> @sqrdmulh_4h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -735,7 +735,7 @@ define <4 x i16> @sqrdmulh_4h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -763,7 +763,7 @@ define <8 x i16> @sqrdmulh_8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i16>, ptr [[A]], align 16
@@ -774,7 +774,7 @@ define <8 x i16> @sqrdmulh_8h(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <8 x i16>, ptr [[B]], align 16
@@ -802,7 +802,7 @@ define <2 x i32> @sqrdmulh_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -813,7 +813,7 @@ define <2 x i32> @sqrdmulh_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -841,7 +841,7 @@ define <4 x i32> @sqrdmulh_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i32>, ptr [[A]], align 16
@@ -852,7 +852,7 @@ define <4 x i32> @sqrdmulh_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, ptr [[B]], align 16
@@ -880,7 +880,7 @@ define i32 @sqrdmulh_1s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[A]], align 4
@@ -891,7 +891,7 @@ define i32 @sqrdmulh_1s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP2:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP2]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[B]], align 4
@@ -925,7 +925,7 @@ define <2 x float> @fmulx_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[A]], align 8
@@ -936,7 +936,7 @@ define <2 x float> @fmulx_2s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, ptr [[B]], align 8
@@ -965,7 +965,7 @@ define <4 x float> @fmulx_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr [[A]], align 16
@@ -976,7 +976,7 @@ define <4 x float> @fmulx_4s(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, ptr [[B]], align 16
@@ -1005,7 +1005,7 @@ define <2 x double> @fmulx_2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP4]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB3:.*]], label %[[BB4:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB3]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB4]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, ptr [[A]], align 16
@@ -1016,7 +1016,7 @@ define <2 x double> @fmulx_2d(ptr %A, ptr %B) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP8]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB8:.*]], label %[[BB9:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB8]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB9]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[B]], align 16
@@ -1050,7 +1050,7 @@ define <4 x i32> @smlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -1061,7 +1061,7 @@ define <4 x i32> @smlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -1072,7 +1072,7 @@ define <4 x i32> @smlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1107,7 +1107,7 @@ define <2 x i64> @smlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -1118,7 +1118,7 @@ define <2 x i64> @smlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -1129,7 +1129,7 @@ define <2 x i64> @smlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1214,7 +1214,7 @@ define <4 x i32> @smlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -1225,7 +1225,7 @@ define <4 x i32> @smlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -1236,7 +1236,7 @@ define <4 x i32> @smlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1271,7 +1271,7 @@ define <2 x i64> @smlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -1282,7 +1282,7 @@ define <2 x i64> @smlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -1293,7 +1293,7 @@ define <2 x i64> @smlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1383,7 +1383,7 @@ define <4 x i32> @sqdmlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -1394,7 +1394,7 @@ define <4 x i32> @sqdmlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -1405,7 +1405,7 @@ define <4 x i32> @sqdmlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1420,7 +1420,7 @@ define <4 x i32> @sqdmlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP5]], [[_MSCMP6]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -1447,7 +1447,7 @@ define <2 x i64> @sqdmlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -1458,7 +1458,7 @@ define <2 x i64> @sqdmlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -1469,7 +1469,7 @@ define <2 x i64> @sqdmlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1484,7 +1484,7 @@ define <2 x i64> @sqdmlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP5]], [[_MSCMP6]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -1511,7 +1511,7 @@ define <4 x i32> @sqdmlal2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <8 x i16>, ptr [[A]], align 16
@@ -1522,7 +1522,7 @@ define <4 x i32> @sqdmlal2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <8 x i16>, ptr [[B]], align 16
@@ -1533,7 +1533,7 @@ define <4 x i32> @sqdmlal2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1552,7 +1552,7 @@ define <4 x i32> @sqdmlal2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -1581,7 +1581,7 @@ define <2 x i64> @sqdmlal2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <4 x i32>, ptr [[A]], align 16
@@ -1592,7 +1592,7 @@ define <2 x i64> @sqdmlal2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <4 x i32>, ptr [[B]], align 16
@@ -1603,7 +1603,7 @@ define <2 x i64> @sqdmlal2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1622,7 +1622,7 @@ define <2 x i64> @sqdmlal2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -1651,7 +1651,7 @@ define <4 x i32> @sqdmlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -1662,7 +1662,7 @@ define <4 x i32> @sqdmlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -1673,7 +1673,7 @@ define <4 x i32> @sqdmlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1688,7 +1688,7 @@ define <4 x i32> @sqdmlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP5]], [[_MSCMP6]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -1715,7 +1715,7 @@ define <2 x i64> @sqdmlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -1726,7 +1726,7 @@ define <2 x i64> @sqdmlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP3:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP3]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -1737,7 +1737,7 @@ define <2 x i64> @sqdmlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1752,7 +1752,7 @@ define <2 x i64> @sqdmlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP5]], [[_MSCMP6]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -1779,7 +1779,7 @@ define <4 x i32> @sqdmlsl2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <8 x i16>, ptr [[A]], align 16
@@ -1790,7 +1790,7 @@ define <4 x i32> @sqdmlsl2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <8 x i16>, ptr [[B]], align 16
@@ -1801,7 +1801,7 @@ define <4 x i32> @sqdmlsl2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1820,7 +1820,7 @@ define <4 x i32> @sqdmlsl2_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -1849,7 +1849,7 @@ define <2 x i64> @sqdmlsl2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[LOAD1:%.*]] = load <4 x i32>, ptr [[A]], align 16
@@ -1860,7 +1860,7 @@ define <2 x i64> @sqdmlsl2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[LOAD2:%.*]] = load <4 x i32>, ptr [[B]], align 16
@@ -1871,7 +1871,7 @@ define <2 x i64> @sqdmlsl2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -1890,7 +1890,7 @@ define <2 x i64> @sqdmlsl2_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP7]], [[_MSCMP8]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB21:.*]], label %[[BB22:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB21]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB22]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -1919,7 +1919,7 @@ define <4 x i32> @umlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -1930,7 +1930,7 @@ define <4 x i32> @umlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -1941,7 +1941,7 @@ define <4 x i32> @umlal4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -1976,7 +1976,7 @@ define <2 x i64> @umlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -1987,7 +1987,7 @@ define <2 x i64> @umlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -1998,7 +1998,7 @@ define <2 x i64> @umlal2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -2083,7 +2083,7 @@ define <4 x i32> @umlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, ptr [[A]], align 8
@@ -2094,7 +2094,7 @@ define <4 x i32> @umlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i16>, ptr [[B]], align 8
@@ -2105,7 +2105,7 @@ define <4 x i32> @umlsl4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i32>, ptr [[C]], align 16
@@ -2140,7 +2140,7 @@ define <2 x i64> @umlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, ptr [[A]], align 8
@@ -2151,7 +2151,7 @@ define <2 x i64> @umlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, ptr [[B]], align 8
@@ -2162,7 +2162,7 @@ define <2 x i64> @umlsl2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[C]], align 16
@@ -2247,7 +2247,7 @@ define <2 x float> @fmla_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP5]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[A]], align 8
@@ -2258,7 +2258,7 @@ define <2 x float> @fmla_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, ptr [[B]], align 8
@@ -2269,7 +2269,7 @@ define <2 x float> @fmla_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, ptr [[C]], align 8
@@ -2300,7 +2300,7 @@ define <4 x float> @fmla_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP5]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr [[A]], align 16
@@ -2311,7 +2311,7 @@ define <4 x float> @fmla_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, ptr [[B]], align 16
@@ -2322,7 +2322,7 @@ define <4 x float> @fmla_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x float>, ptr [[C]], align 16
@@ -2353,7 +2353,7 @@ define <2 x double> @fmla_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP5]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, ptr [[A]], align 16
@@ -2364,7 +2364,7 @@ define <2 x double> @fmla_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP4:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP4]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[B]], align 16
@@ -2375,7 +2375,7 @@ define <2 x double> @fmla_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x double>, ptr [[C]], align 16
@@ -2410,7 +2410,7 @@ define <2 x float> @fmls_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[A]], align 8
@@ -2421,7 +2421,7 @@ define <2 x float> @fmls_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, ptr [[B]], align 8
@@ -2432,7 +2432,7 @@ define <2 x float> @fmls_2s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, ptr [[C]], align 8
@@ -2466,7 +2466,7 @@ define <4 x float> @fmls_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr [[A]], align 16
@@ -2477,7 +2477,7 @@ define <4 x float> @fmls_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, ptr [[B]], align 16
@@ -2488,7 +2488,7 @@ define <4 x float> @fmls_4s(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x float>, ptr [[C]], align 16
@@ -2522,7 +2522,7 @@ define <2 x double> @fmls_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, ptr [[A]], align 16
@@ -2533,7 +2533,7 @@ define <2 x double> @fmls_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[B]], align 16
@@ -2544,7 +2544,7 @@ define <2 x double> @fmls_2d(ptr %A, ptr %B, ptr %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x double>, ptr [[C]], align 16
@@ -2578,7 +2578,7 @@ define <2 x float> @fmls_commuted_neg_2s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[A]], align 8
@@ -2589,7 +2589,7 @@ define <2 x float> @fmls_commuted_neg_2s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, ptr [[B]], align 8
@@ -2600,7 +2600,7 @@ define <2 x float> @fmls_commuted_neg_2s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, ptr [[C]], align 8
@@ -2634,7 +2634,7 @@ define <4 x float> @fmls_commuted_neg_4s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr [[A]], align 16
@@ -2645,7 +2645,7 @@ define <4 x float> @fmls_commuted_neg_4s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, ptr [[B]], align 16
@@ -2656,7 +2656,7 @@ define <4 x float> @fmls_commuted_neg_4s(ptr %A, ptr %B, ptr %C) nounwind saniti
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x float>, ptr [[C]], align 16
@@ -2690,7 +2690,7 @@ define <2 x double> @fmls_commuted_neg_2d(ptr %A, ptr %B, ptr %C) nounwind sanit
 ; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i64 [[TMP9]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, ptr [[A]], align 16
@@ -2701,7 +2701,7 @@ define <2 x double> @fmls_commuted_neg_2d(ptr %A, ptr %B, ptr %C) nounwind sanit
 ; CHECK-NEXT:    [[_MSCMP5:%.*]] = icmp ne i64 [[TMP10]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP5]], label %[[BB9:.*]], label %[[BB10:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB9]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB10]]:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[B]], align 16
@@ -2712,7 +2712,7 @@ define <2 x double> @fmls_commuted_neg_2d(ptr %A, ptr %B, ptr %C) nounwind sanit
 ; CHECK-NEXT:    [[_MSCMP6:%.*]] = icmp ne i64 [[TMP14]], 0
 ; CHECK-NEXT:    br i1 [[_MSCMP6]], label %[[BB14:.*]], label %[[BB15:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB14]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB15]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x double>, ptr [[C]], align 16
@@ -2797,7 +2797,7 @@ define <2 x float> @fmla_indexed_scalar_2s(<2 x float> %a, <2 x float> %b, float
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[V1:%.*]] = insertelement <2 x float> undef, float [[C]], i32 0
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <2 x float> [[V1]], float [[C]], i32 1
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x float> @llvm.fma.v2f32(<2 x float> [[V1]], <2 x float> [[B]], <2 x float> [[A]]) #[[ATTR7:[0-9]+]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x float> @llvm.fma.v2f32(<2 x float> [[V1]], <2 x float> [[B]], <2 x float> [[A]]) #[[ATTR6:[0-9]+]]
 ; CHECK-NEXT:    store <2 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x float> [[FMLA1]]
 ;
@@ -2817,7 +2817,7 @@ define <4 x float> @fmla_indexed_scalar_4s(<4 x float> %a, <4 x float> %b, float
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[C]], i32 1
 ; CHECK-NEXT:    [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[C]], i32 2
 ; CHECK-NEXT:    [[V4:%.*]] = insertelement <4 x float> [[V3]], float [[C]], i32 3
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <4 x float> @llvm.fma.v4f32(<4 x float> [[V4]], <4 x float> [[B]], <4 x float> [[A]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <4 x float> @llvm.fma.v4f32(<4 x float> [[V4]], <4 x float> [[B]], <4 x float> [[A]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x float> [[FMLA1]]
 ;
@@ -2837,7 +2837,7 @@ define <2 x double> @fmla_indexed_scalar_2d(<2 x double> %a, <2 x double> %b, do
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[V1:%.*]] = insertelement <2 x double> undef, double [[C]], i32 0
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <2 x double> [[V1]], double [[C]], i32 1
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x double> @llvm.fma.v2f64(<2 x double> [[V2]], <2 x double> [[B]], <2 x double> [[A]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x double> @llvm.fma.v2f64(<2 x double> [[V2]], <2 x double> [[B]], <2 x double> [[A]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x double> [[FMLA1]]
 ;
@@ -2855,7 +2855,7 @@ define <2 x float> @fmls_indexed_2s_strict(<2 x float> %a, <2 x float> %b, <2 x
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = fneg <2 x float> [[C]]
 ; CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x float> [[B]], <2 x float> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[FMLS1:%.*]] = tail call <2 x float> @llvm.experimental.constrained.fma.v2f32(<2 x float> [[TMP0]], <2 x float> [[LANE]], <2 x float> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9:[0-9]+]]
+; CHECK-NEXT:    [[FMLS1:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> [[TMP0]], <2 x float> [[LANE]], <2 x float> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <2 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x float> [[FMLS1]]
 ;
@@ -2873,7 +2873,7 @@ define <4 x float> @fmls_indexed_4s_strict(<4 x float> %a, <4 x float> %b, <4 x
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = fneg <4 x float> [[C]]
 ; CHECK-NEXT:    [[LANE:%.*]] = shufflevector <4 x float> [[B]], <4 x float> undef, <4 x i32> zeroinitializer
-; CHECK-NEXT:    [[FMLS1:%.*]] = tail call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> [[TMP0]], <4 x float> [[LANE]], <4 x float> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9]]
+; CHECK-NEXT:    [[FMLS1:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP0]], <4 x float> [[LANE]], <4 x float> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x float> [[FMLS1]]
 ;
@@ -2891,7 +2891,7 @@ define <2 x double> @fmls_indexed_2d_strict(<2 x double> %a, <2 x double> %b, <2
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = fneg <2 x double> [[C]]
 ; CHECK-NEXT:    [[LANE:%.*]] = shufflevector <2 x double> [[B]], <2 x double> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[FMLS1:%.*]] = tail call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> [[TMP0]], <2 x double> [[LANE]], <2 x double> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9]]
+; CHECK-NEXT:    [[FMLS1:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[TMP0]], <2 x double> [[LANE]], <2 x double> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x double> [[FMLS1]]
 ;
@@ -2909,7 +2909,7 @@ define <2 x float> @fmla_indexed_scalar_2s_strict(<2 x float> %a, <2 x float> %b
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[V1:%.*]] = insertelement <2 x float> undef, float [[C]], i32 0
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <2 x float> [[V1]], float [[C]], i32 1
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x float> @llvm.experimental.constrained.fma.v2f32(<2 x float> [[V2]], <2 x float> [[B]], <2 x float> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> [[V2]], <2 x float> [[B]], <2 x float> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <2 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x float> [[FMLA1]]
 ;
@@ -2929,7 +2929,7 @@ define <4 x float> @fmla_indexed_scalar_4s_strict(<4 x float> %a, <4 x float> %b
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <4 x float> [[V1]], float [[C]], i32 1
 ; CHECK-NEXT:    [[V3:%.*]] = insertelement <4 x float> [[V2]], float [[C]], i32 2
 ; CHECK-NEXT:    [[V4:%.*]] = insertelement <4 x float> [[V3]], float [[C]], i32 3
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> [[V4]], <4 x float> [[B]], <4 x float> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> [[V4]], <4 x float> [[B]], <4 x float> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x float> [[FMLA1]]
 ;
@@ -2949,7 +2949,7 @@ define <2 x double> @fmla_indexed_scalar_2d_strict(<2 x double> %a, <2 x double>
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[V1:%.*]] = insertelement <2 x double> undef, double [[C]], i32 0
 ; CHECK-NEXT:    [[V2:%.*]] = insertelement <2 x double> [[V1]], double [[C]], i32 1
-; CHECK-NEXT:    [[FMLA1:%.*]] = tail call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> [[V2]], <2 x double> [[B]], <2 x double> [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR9]]
+; CHECK-NEXT:    [[FMLA1:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[V2]], <2 x double> [[B]], <2 x double> [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x double> [[FMLA1]]
 ;
@@ -3397,7 +3397,7 @@ define <4 x i32> @sqdmull_lane_4s(<4 x i16> %A, <4 x i16> %B) nounwind sanitize_
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB5:.*]], label %[[BB6:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB5]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB6]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[A]], <4 x i16> [[TMP3]])
@@ -3424,7 +3424,7 @@ define <2 x i64> @sqdmull_lane_2d(<2 x i32> %A, <2 x i32> %B) nounwind sanitize_
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB5:.*]], label %[[BB6:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB5]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB6]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[A]], <2 x i32> [[TMP3]])
@@ -3453,7 +3453,7 @@ define <4 x i32> @sqdmull2_lane_4s(<8 x i16> %A, <8 x i16> %B) nounwind sanitize
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB5:.*]], label %[[BB6:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB5]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB6]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -3483,7 +3483,7 @@ define <2 x i64> @sqdmull2_lane_2d(<4 x i32> %A, <4 x i32> %B) nounwind sanitize
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB5:.*]], label %[[BB6:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB5]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB6]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -3640,7 +3640,7 @@ define <4 x i32> @sqdmlal_lane_4s(<4 x i16> %A, <4 x i16> %B, <4 x i32> %C) noun
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[A]], <4 x i16> [[TMP4]])
@@ -3671,7 +3671,7 @@ define <2 x i64> @sqdmlal_lane_2d(<2 x i32> %A, <2 x i32> %B, <2 x i64> %C) noun
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[A]], <2 x i32> [[TMP4]])
@@ -3704,7 +3704,7 @@ define <4 x i32> @sqdmlal2_lane_4s(<8 x i16> %A, <8 x i16> %B, <4 x i32> %C) nou
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -3738,7 +3738,7 @@ define <2 x i64> @sqdmlal2_lane_2d(<4 x i32> %A, <4 x i32> %B, <2 x i64> %C) nou
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -3772,7 +3772,7 @@ define i32 @sqdmlal_lane_1s(i32 %A, i16 %B, <4 x i16> %C) nounwind sanitize_memo
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[PROD_VEC:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[LHS]], <4 x i16> [[RHS]])
@@ -3809,7 +3809,7 @@ define i32 @sqdmlsl_lane_1s(i32 %A, i16 %B, <4 x i16> %C) nounwind sanitize_memo
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[PROD_VEC:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[LHS]], <4 x i16> [[RHS]])
@@ -3842,7 +3842,7 @@ define i32 @sqadd_lane1_sqdmull4s(i32 %A, <4 x i16> %B, <4 x i16> %C) nounwind s
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[PROD_VEC:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[B]], <4 x i16> [[C]])
@@ -3872,7 +3872,7 @@ define i32 @sqsub_lane1_sqdmull4s(i32 %A, <4 x i16> %B, <4 x i16> %C) nounwind s
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[PROD_VEC:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[B]], <4 x i16> [[C]])
@@ -3902,7 +3902,7 @@ define i64 @sqdmlal_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind sanitize_memo
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[PROD:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 [[B]], i32 [[RHS]])
@@ -3933,7 +3933,7 @@ define i64 @sqdmlsl_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind sanitize_memo
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[PROD:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 [[B]], i32 [[RHS]])
@@ -4063,7 +4063,7 @@ define <4 x i32> @sqdmlsl_lane_4s(<4 x i16> %A, <4 x i16> %B, <4 x i32> %C) noun
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[A]], <4 x i16> [[TMP4]])
@@ -4094,7 +4094,7 @@ define <2 x i64> @sqdmlsl_lane_2d(<2 x i32> %A, <2 x i32> %B, <2 x i64> %C) noun
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP2]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[A]], <2 x i32> [[TMP4]])
@@ -4127,7 +4127,7 @@ define <4 x i32> @sqdmlsl2_lane_4s(<8 x i16> %A, <8 x i16> %B, <4 x i32> %C) nou
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -4161,7 +4161,7 @@ define <2 x i64> @sqdmlsl2_lane_2d(<4 x i32> %A, <4 x i32> %B, <2 x i64> %C) nou
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP5:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP2]])
@@ -4234,7 +4234,7 @@ define float @fmulxs(float %a, float %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[_MSPROP:%.*]] = or i32 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    [[_MSPROP1:%.*]] = or i32 [[_MSPROP]], 0
-; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call float @llvm.aarch64.neon.fmulx.f32(float [[A]], float [[B]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call float @llvm.aarch64.neon.fmulx.f32(float [[A]], float [[B]]) #[[ATTR6]]
 ; CHECK-NEXT:    store i32 [[_MSPROP1]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret float [[FMULX_I]]
 ;
@@ -4250,7 +4250,7 @@ define double @fmulxd(double %a, double %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[_MSPROP:%.*]] = or i64 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    [[_MSPROP1:%.*]] = or i64 [[_MSPROP]], 0
-; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call double @llvm.aarch64.neon.fmulx.f64(double [[A]], double [[B]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call double @llvm.aarch64.neon.fmulx.f64(double [[A]], double [[B]]) #[[ATTR6]]
 ; CHECK-NEXT:    store i64 [[_MSPROP1]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret double [[FMULX_I]]
 ;
@@ -4268,7 +4268,7 @@ define float @fmulxs_lane(float %a, <4 x float> %vec) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[B:%.*]] = extractelement <4 x float> [[VEC]], i32 3
 ; CHECK-NEXT:    [[_MSPROP1:%.*]] = or i32 [[TMP2]], [[_MSPROP]]
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or i32 [[_MSPROP1]], 0
-; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call float @llvm.aarch64.neon.fmulx.f32(float [[A]], float [[B]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call float @llvm.aarch64.neon.fmulx.f32(float [[A]], float [[B]]) #[[ATTR6]]
 ; CHECK-NEXT:    store i32 [[_MSPROP2]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret float [[FMULX_I]]
 ;
@@ -4287,7 +4287,7 @@ define double @fmulxd_lane(double %a, <2 x double> %vec) nounwind sanitize_memor
 ; CHECK-NEXT:    [[B:%.*]] = extractelement <2 x double> [[VEC]], i32 1
 ; CHECK-NEXT:    [[_MSPROP1:%.*]] = or i64 [[TMP2]], [[_MSPROP]]
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or i64 [[_MSPROP1]], 0
-; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call double @llvm.aarch64.neon.fmulx.f64(double [[A]], double [[B]]) #[[ATTR7]]
+; CHECK-NEXT:    [[FMULX_I:%.*]] = tail call double @llvm.aarch64.neon.fmulx.f64(double [[A]], double [[B]]) #[[ATTR6]]
 ; CHECK-NEXT:    store i64 [[_MSPROP2]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret double [[FMULX_I]]
 ;
@@ -4344,7 +4344,7 @@ define <8 x i16> @foo0(<16 x i8> %a, <16 x i8> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <8 x i8> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <8 x i8> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <8 x i8> [[_MSPROP3]] to <8 x i16>
-; CHECK-NEXT:    [[VMULL_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.smull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.smull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[VMULL_I_I]]
 ;
@@ -4379,7 +4379,7 @@ define <4 x i32> @foo1(<8 x i16> %a, <8 x i16> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <4 x i16> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <4 x i16> [[_MSPROP3]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I_I]]
 ;
@@ -4414,7 +4414,7 @@ define <2 x i64> @foo2(<4 x i32> %a, <4 x i32> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <2 x i32> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <2 x i32> [[_MSPROP3]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I_I]]
 ;
@@ -4449,7 +4449,7 @@ define <8 x i16> @foo3(<16 x i8> %a, <16 x i8> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <8 x i8> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <8 x i8> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <8 x i8> [[_MSPROP3]] to <8 x i16>
-; CHECK-NEXT:    [[VMULL_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[VMULL_I_I]]
 ;
@@ -4484,7 +4484,7 @@ define <4 x i32> @foo4(<8 x i16> %a, <8 x i16> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <4 x i16> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <4 x i16> [[_MSPROP3]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I_I]]
 ;
@@ -4519,7 +4519,7 @@ define <2 x i64> @foo5(<4 x i32> %a, <4 x i32> %b) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <2 x i32> [[TMP4]], [[TMP6]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = zext <2 x i32> [[_MSPROP3]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> [[TMP7]], ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I_I]]
 ;
@@ -4535,14 +4535,14 @@ define <2 x i64> @foo5(<4 x i32> %a, <4 x i32> %b) nounwind sanitize_memory {
 
 define <4 x i32> @foo6(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @foo6(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR6:[0-9]+]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR5:[0-9]+]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <8 x i16> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> <i32 1>
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <4 x i16>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[C]], <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I]]
 ;
@@ -4557,14 +4557,14 @@ entry:
 
 define <4 x i32> @foo6a(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @foo6a(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <8 x i16> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <4 x i16>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[C]], <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I]]
 ;
@@ -4579,14 +4579,14 @@ entry:
 
 define <2 x i64> @foo7(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @foo7(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> <i32 1>
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <2 x i32>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[C]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I]]
 ;
@@ -4601,14 +4601,14 @@ entry:
 
 define <2 x i64> @foo7a(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @foo7a(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <2 x i32>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[C]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I]]
 ;
@@ -4624,14 +4624,14 @@ entry:
 
 define <4 x i32> @foo8(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @foo8(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <8 x i16> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> <i32 1>
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <4 x i16>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[C]], <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I]]
 ;
@@ -4646,14 +4646,14 @@ entry:
 
 define <4 x i32> @foo8a(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @foo8a(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <8 x i16> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <4 x i16>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[C]], <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I]]
 ;
@@ -4668,14 +4668,14 @@ entry:
 
 define <2 x i64> @foo9(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @foo9(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> <i32 1>
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <2 x i32>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[C]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I]]
 ;
@@ -4690,14 +4690,14 @@ entry:
 
 define <2 x i64> @foo9a(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @foo9a(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
 ; CHECK-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> undef, <1 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I]] to <2 x i32>
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[C]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
-; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I]]
 ;
@@ -4732,7 +4732,7 @@ define <8 x i16> @bar0(<8 x i16> %a, <16 x i8> %b, <16 x i8> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <8 x i8> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <8 x i8> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <8 x i8> [[_MSPROP3]] to <8 x i16>
-; CHECK-NEXT:    [[VMULL_I_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.smull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL_I_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.smull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <8 x i16> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <8 x i16> [[VMULL_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <8 x i16> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4771,7 +4771,7 @@ define <4 x i32> @bar1(<4 x i32> %a, <8 x i16> %b, <8 x i16> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <4 x i16> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <4 x i16> [[_MSPROP3]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <4 x i32> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <4 x i32> [[VMULL2_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <4 x i32> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4810,7 +4810,7 @@ define <2 x i64> @bar2(<2 x i64> %a, <4 x i32> %b, <4 x i32> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <2 x i32> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <2 x i32> [[_MSPROP3]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <2 x i64> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <2 x i64> [[VMULL2_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <2 x i64> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4849,7 +4849,7 @@ define <8 x i16> @bar3(<8 x i16> %a, <16 x i8> %b, <16 x i8> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <8 x i8> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <8 x i8> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <8 x i8> [[_MSPROP3]] to <8 x i16>
-; CHECK-NEXT:    [[VMULL_I_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL_I_I_I:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.umull.v8i16(<8 x i8> [[TMP1]], <8 x i8> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <8 x i16> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <8 x i16> [[VMULL_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <8 x i16> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4888,7 +4888,7 @@ define <4 x i32> @bar4(<4 x i32> %a, <8 x i16> %b, <8 x i16> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <4 x i16> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <4 x i16> [[_MSPROP3]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <4 x i32> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <4 x i32> [[VMULL2_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <4 x i32> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4927,7 +4927,7 @@ define <2 x i64> @bar5(<2 x i64> %a, <4 x i32> %b, <4 x i32> %c) nounwind saniti
 ; CHECK-NEXT:    [[_MSPROP2:%.*]] = or <2 x i32> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[_MSPROP2]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <2 x i32> [[_MSPROP3]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <2 x i64> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD_I:%.*]] = add <2 x i64> [[VMULL2_I_I_I]], [[A]]
 ; CHECK-NEXT:    store <2 x i64> [[_MSPROP4]], ptr @__msan_retval_tls, align 8
@@ -4968,7 +4968,7 @@ define <4 x i32> @mlal2_1(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind san
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <4 x i16> [[_MSPROP3]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <4 x i16> [[_MSPROP4]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP5:%.*]] = or <4 x i32> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add <4 x i32> [[VMULL2_I_I]], [[A]]
 ; CHECK-NEXT:    store <4 x i32> [[_MSPROP5]], ptr @__msan_retval_tls, align 8
@@ -5010,7 +5010,7 @@ define <2 x i64> @mlal2_2(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind san
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <2 x i32> [[_MSPROP3]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <2 x i32> [[_MSPROP4]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP5:%.*]] = or <2 x i64> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add <2 x i64> [[VMULL2_I_I]], [[A]]
 ; CHECK-NEXT:    store <2 x i64> [[_MSPROP5]], ptr @__msan_retval_tls, align 8
@@ -5052,7 +5052,7 @@ define <4 x i32> @mlal2_4(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c) nounwind san
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <4 x i16> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <4 x i16> [[_MSPROP3]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <4 x i16> [[_MSPROP4]] to <4 x i32>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP5:%.*]] = or <4 x i32> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add <4 x i32> [[VMULL2_I_I]], [[A]]
 ; CHECK-NEXT:    store <4 x i32> [[_MSPROP5]], ptr @__msan_retval_tls, align 8
@@ -5094,7 +5094,7 @@ define <2 x i64> @mlal2_5(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c) nounwind san
 ; CHECK-NEXT:    [[_MSPROP3:%.*]] = or <2 x i32> [[TMP5]], [[TMP7]]
 ; CHECK-NEXT:    [[_MSPROP4:%.*]] = or <2 x i32> [[_MSPROP3]], zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = zext <2 x i32> [[_MSPROP4]] to <2 x i64>
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[TMP3]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[_MSPROP5:%.*]] = or <2 x i64> [[TMP8]], [[TMP11]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add <2 x i64> [[VMULL2_I_I]], [[A]]
 ; CHECK-NEXT:    store <2 x i64> [[_MSPROP5]], ptr @__msan_retval_tls, align 8
@@ -5312,7 +5312,7 @@ entry:
 
 define <4 x i32> @vmull_low_n_s16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c, i32 %d) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @vmull_low_n_s16_test(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[D]] to i16
@@ -5323,7 +5323,7 @@ define <4 x i32> @vmull_low_n_s16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c,
 ; CHECK-NEXT:    [[VECINIT1_I:%.*]] = insertelement <4 x i16> [[VECINIT_I]], i16 [[CONV]], i32 1
 ; CHECK-NEXT:    [[VECINIT2_I:%.*]] = insertelement <4 x i16> [[VECINIT1_I]], i16 [[CONV]], i32 2
 ; CHECK-NEXT:    [[VECINIT3_I:%.*]] = insertelement <4 x i16> [[VECINIT2_I]], i16 [[CONV]], i32 3
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I_I]]
 ;
@@ -5342,7 +5342,7 @@ entry:
 
 define <4 x i32> @vmull_high_n_s16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c, i32 %d) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @vmull_high_n_s16_test(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[D]] to i16
@@ -5353,7 +5353,7 @@ define <4 x i32> @vmull_high_n_s16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c
 ; CHECK-NEXT:    [[VECINIT1_I:%.*]] = insertelement <4 x i16> [[VECINIT_I]], i16 [[CONV]], i32 1
 ; CHECK-NEXT:    [[VECINIT2_I:%.*]] = insertelement <4 x i16> [[VECINIT1_I]], i16 [[CONV]], i32 2
 ; CHECK-NEXT:    [[VECINIT3_I:%.*]] = insertelement <4 x i16> [[VECINIT2_I]], i16 [[CONV]], i32 3
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.smull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I_I]]
 ;
@@ -5372,7 +5372,7 @@ entry:
 
 define <2 x i64> @vmull_high_n_s32_test(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c, i32 %d) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @vmull_high_n_s32_test(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
@@ -5380,7 +5380,7 @@ define <2 x i64> @vmull_high_n_s32_test(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I_I]] to <2 x i32>
 ; CHECK-NEXT:    [[VECINIT_I:%.*]] = insertelement <2 x i32> undef, i32 [[D]], i32 0
 ; CHECK-NEXT:    [[VECINIT1_I:%.*]] = insertelement <2 x i32> [[VECINIT_I]], i32 [[D]], i32 1
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[VECINIT1_I]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.smull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[VECINIT1_I]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I_I]]
 ;
@@ -5396,7 +5396,7 @@ entry:
 
 define <4 x i32> @vmull_high_n_u16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c, i32 %d) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <4 x i32> @vmull_high_n_u16_test(
-; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <4 x i32> [[A:%.*]], <8 x i16> [[B:%.*]], <4 x i16> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[D]] to i16
@@ -5407,7 +5407,7 @@ define <4 x i32> @vmull_high_n_u16_test(<4 x i32> %a, <8 x i16> %b, <4 x i16> %c
 ; CHECK-NEXT:    [[VECINIT1_I:%.*]] = insertelement <4 x i16> [[VECINIT_I]], i16 [[CONV]], i32 1
 ; CHECK-NEXT:    [[VECINIT2_I:%.*]] = insertelement <4 x i16> [[VECINIT1_I]], i16 [[CONV]], i32 2
 ; CHECK-NEXT:    [[VECINIT3_I:%.*]] = insertelement <4 x i16> [[VECINIT2_I]], i16 [[CONV]], i32 3
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.umull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[VECINIT3_I]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <4 x i32> [[VMULL2_I_I]]
 ;
@@ -5426,7 +5426,7 @@ entry:
 
 define <2 x i64> @vmull_high_n_u32_test(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c, i32 %d) nounwind readnone optsize ssp {
 ; CHECK-LABEL: define <2 x i64> @vmull_high_n_u32_test(
-; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: <2 x i64> [[A:%.*]], <4 x i32> [[B:%.*]], <2 x i32> [[C:%.*]], i32 [[D:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <4 x i32> [[B]] to <2 x i64>
@@ -5434,7 +5434,7 @@ define <2 x i64> @vmull_high_n_u32_test(<2 x i64> %a, <4 x i32> %b, <2 x i32> %c
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[SHUFFLE_I_I]] to <2 x i32>
 ; CHECK-NEXT:    [[VECINIT_I:%.*]] = insertelement <2 x i32> undef, i32 [[D]], i32 0
 ; CHECK-NEXT:    [[VECINIT1_I:%.*]] = insertelement <2 x i32> [[VECINIT_I]], i32 [[D]], i32 1
-; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[VECINIT1_I]]) #[[ATTR7]]
+; CHECK-NEXT:    [[VMULL2_I_I:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[TMP1]], <2 x i32> [[VECINIT1_I]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[VMULL2_I_I]]
 ;
@@ -5528,7 +5528,7 @@ define <2 x i64> @mull_from_two_extracts(<4 x i32> %lhs, <4 x i32> %rhs) {
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[RES]]
 ;
@@ -5545,7 +5545,7 @@ define <2 x i64> @mlal_from_two_extracts(<2 x i64> %accum, <4 x i32> %lhs, <4 x
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[SUM:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> [[ACCUM]], <2 x i64> [[RES]])
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[SUM]]
@@ -5565,7 +5565,7 @@ define <2 x i64> @mull_from_extract_dup_low(<4 x i32> %lhs, i32 %rhs) {
 ; CHECK-NEXT:    [[RHSVEC_TMP:%.*]] = insertelement <2 x i32> undef, i32 [[RHS]], i32 0
 ; CHECK-NEXT:    [[RHSVEC:%.*]] = insertelement <2 x i32> [[RHSVEC_TMP]], i32 [[RHS]], i32 1
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHSVEC]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHSVEC]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[RES]]
 ;
@@ -5585,7 +5585,7 @@ define <2 x i64> @mull_from_extract_dup_high(<4 x i32> %lhs, i32 %rhs) {
 ; CHECK-NEXT:    [[RHSVEC_TMP:%.*]] = insertelement <2 x i32> undef, i32 [[RHS]], i32 0
 ; CHECK-NEXT:    [[RHSVEC:%.*]] = insertelement <2 x i32> [[RHSVEC_TMP]], i32 [[RHS]], i32 1
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHSVEC]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHSVEC]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[RES]]
 ;
@@ -5605,7 +5605,7 @@ define <8 x i16> @pmull_from_extract_dup_low(<16 x i8> %lhs, i8 %rhs) {
 ; CHECK-NEXT:    [[RHSVEC_0:%.*]] = insertelement <8 x i8> undef, i8 [[RHS]], i32 0
 ; CHECK-NEXT:    [[RHSVEC:%.*]] = shufflevector <8 x i8> [[RHSVEC_0]], <8 x i8> undef, <8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <16 x i8> [[LHS]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHSVEC]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHSVEC]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[RES]]
 ;
@@ -5625,7 +5625,7 @@ define <8 x i16> @pmull_from_extract_dup_high(<16 x i8> %lhs, i8 %rhs) {
 ; CHECK-NEXT:    [[RHSVEC_0:%.*]] = insertelement <8 x i8> undef, i8 [[RHS]], i32 0
 ; CHECK-NEXT:    [[RHSVEC:%.*]] = shufflevector <8 x i8> [[RHSVEC_0]], <8 x i8> undef, <8 x i32> zeroinitializer
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <16 x i8> [[LHS]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHSVEC]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHSVEC]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[RES]]
 ;
@@ -5644,7 +5644,7 @@ define <8 x i16> @pmull_from_extract_duplane_low(<16 x i8> %lhs, <8 x i8> %rhs)
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <16 x i8> [[LHS]], <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <8 x i8> [[RHS]], <8 x i8> undef, <8 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[RES]]
 ;
@@ -5661,7 +5661,7 @@ define <8 x i16> @pmull_from_extract_duplane_high(<16 x i8> %lhs, <8 x i8> %rhs)
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <16 x i8> [[LHS]], <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <8 x i8> [[RHS]], <8 x i8> undef, <8 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> [[LHS_HIGH]], <8 x i8> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <8 x i16> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <8 x i16> [[RES]]
 ;
@@ -5678,7 +5678,7 @@ define <2 x i64> @sqdmull_from_extract_duplane_low(<4 x i32> %lhs, <4 x i32> %rh
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[RES]]
 ;
@@ -5695,7 +5695,7 @@ define <2 x i64> @sqdmull_from_extract_duplane_high(<4 x i32> %lhs, <4 x i32> %r
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[RES]]
 ;
@@ -5712,7 +5712,7 @@ define <2 x i64> @sqdmlal_from_extract_duplane_low(<2 x i64> %accum, <4 x i32> %
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[SUM:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> [[ACCUM]], <2 x i64> [[RES]])
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[SUM]]
@@ -5731,7 +5731,7 @@ define <2 x i64> @sqdmlal_from_extract_duplane_high(<2 x i64> %accum, <4 x i32>
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[SUM:%.*]] = call <2 x i64> @llvm.aarch64.neon.sqadd.v2i64(<2 x i64> [[ACCUM]], <2 x i64> [[RES]])
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[SUM]]
@@ -5750,7 +5750,7 @@ define <2 x i64> @umlal_from_extract_duplane_low(<2 x i64> %accum, <4 x i32> %lh
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 0, i32 1>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[SUM:%.*]] = add <2 x i64> [[ACCUM]], [[RES]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[SUM]]
@@ -5769,7 +5769,7 @@ define <2 x i64> @umlal_from_extract_duplane_high(<2 x i64> %accum, <4 x i32> %l
 ; CHECK-NEXT:    call void @llvm.donothing()
 ; CHECK-NEXT:    [[LHS_HIGH:%.*]] = shufflevector <4 x i32> [[LHS]], <4 x i32> undef, <2 x i32> <i32 2, i32 3>
 ; CHECK-NEXT:    [[RHS_HIGH:%.*]] = shufflevector <4 x i32> [[RHS]], <4 x i32> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR7]]
+; CHECK-NEXT:    [[RES:%.*]] = tail call <2 x i64> @llvm.aarch64.neon.umull.v2i64(<2 x i32> [[LHS_HIGH]], <2 x i32> [[RHS_HIGH]]) #[[ATTR6]]
 ; CHECK-NEXT:    [[SUM:%.*]] = add <2 x i64> [[ACCUM]], [[RES]]
 ; CHECK-NEXT:    store <2 x i64> zeroinitializer, ptr @__msan_retval_tls, align 8
 ; CHECK-NEXT:    ret <2 x i64> [[SUM]]
@@ -6004,7 +6004,7 @@ define i32 @sqdmlal_s(i16 %A, i16 %B, i32 %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -6034,7 +6034,7 @@ define i64 @sqdmlal_d(i32 %A, i32 %B, i64 %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 [[A]], i32 [[B]])
@@ -6066,7 +6066,7 @@ define i32 @sqdmlsl_s(i16 %A, i16 %B, i32 %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP3]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB6:.*]], label %[[BB7:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB6]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB7]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = tail call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> [[TMP1]], <4 x i16> [[TMP2]])
@@ -6096,7 +6096,7 @@ define i64 @sqdmlsl_d(i32 %A, i32 %B, i64 %C) nounwind sanitize_memory {
 ; CHECK-NEXT:    [[_MSOR:%.*]] = or i1 [[_MSCMP]], [[_MSCMP1]]
 ; CHECK-NEXT:    br i1 [[_MSOR]], label %[[BB4:.*]], label %[[BB5:.*]], !prof [[PROF1]]
 ; CHECK:       [[BB4]]:
-; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR8]]
+; CHECK-NEXT:    call void @__msan_warning_noreturn() #[[ATTR7]]
 ; CHECK-NEXT:    unreachable
 ; CHECK:       [[BB5]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 [[A]], i32 [[B]])
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
index 223ba631b354d..486df75703ced 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=ALL,CI %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=ALL,GFX9 %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=ALL,GFX908 %s
@@ -7,6 +7,7 @@
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=ALL,GFX11 %s
 
 define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -23,6 +24,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr,
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -39,6 +41,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr,
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -55,6 +58,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr,
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; GFX90A-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX90A-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -71,10 +75,12 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr,
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -96,6 +102,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe(ptr addrspace(1) %ptr,
 }
 
 define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(7) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -112,6 +119,7 @@ define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -128,6 +136,7 @@ define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -144,6 +153,7 @@ define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; GFX90A-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX90A-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -160,10 +170,12 @@ define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(7) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -185,6 +197,7 @@ define void @test_atomicrmw_fadd_f32_buffer_fat_ptr_no_use_unsafe(ptr addrspace(
 }
 
 define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -201,6 +214,7 @@ define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -217,6 +231,7 @@ define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -233,6 +248,7 @@ define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; GFX90A-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX90A-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -249,10 +265,12 @@ define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(999) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_as999_no_use_unsafe(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -274,6 +292,7 @@ define void @test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
 }
 
 define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -290,6 +309,7 @@ define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, float
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret float [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -306,6 +326,7 @@ define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, float
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret float [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -322,14 +343,17 @@ define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, float
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret float [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0:![0-9]+]]
 ; GFX90A-NEXT:    ret float [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0:![0-9]+]]
 ; GFX942-NEXT:    ret float [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0:![0-9]+]]
 ; GFX11-NEXT:    ret float [[RES]]
@@ -339,6 +363,7 @@ define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, float
 }
 
 define float @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(ptr addrspace(7) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -355,6 +380,7 @@ define float @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(ptr addrspace(7) %pt
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret float [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -371,6 +397,7 @@ define float @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(ptr addrspace(7) %pt
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret float [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(7) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -387,14 +414,17 @@ define float @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(ptr addrspace(7) %pt
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret float [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(7) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret float [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(7) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret float [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(7) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX11-NEXT:    ret float [[RES]]
@@ -404,6 +434,7 @@ define float @test_atomicrmw_fadd_f32_buffer_fat_ptr_unsafe(ptr addrspace(7) %pt
 }
 
 define float @test_atomicrmw_fadd_f32_as999_unsafe(ptr addrspace(999) %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -420,6 +451,7 @@ define float @test_atomicrmw_fadd_f32_as999_unsafe(ptr addrspace(999) %ptr, floa
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret float [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -436,6 +468,7 @@ define float @test_atomicrmw_fadd_f32_as999_unsafe(ptr addrspace(999) %ptr, floa
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret float [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(999) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -452,14 +485,17 @@ define float @test_atomicrmw_fadd_f32_as999_unsafe(ptr addrspace(999) %ptr, floa
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret float [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(999) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret float [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(999) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret float [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_as999_unsafe(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(999) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX11-NEXT:    ret float [[RES]]
@@ -469,6 +505,7 @@ define float @test_atomicrmw_fadd_f32_as999_unsafe(ptr addrspace(999) %ptr, floa
 }
 
 define double @test_atomicrmw_fadd_f64_global_unsafe(ptr addrspace(1) %ptr, double %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -485,6 +522,7 @@ define double @test_atomicrmw_fadd_f64_global_unsafe(ptr addrspace(1) %ptr, doub
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -501,6 +539,7 @@ define double @test_atomicrmw_fadd_f64_global_unsafe(ptr addrspace(1) %ptr, doub
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -517,14 +556,17 @@ define double @test_atomicrmw_fadd_f64_global_unsafe(ptr addrspace(1) %ptr, doub
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret double [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret double [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_global_unsafe(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -546,6 +588,7 @@ define double @test_atomicrmw_fadd_f64_global_unsafe(ptr addrspace(1) %ptr, doub
 }
 
 define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -562,6 +605,7 @@ define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret float [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -578,6 +622,7 @@ define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret float [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -594,6 +639,7 @@ define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret float [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; GFX90A-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX90A-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -620,10 +666,12 @@ define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret float [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !noalias.addrspace [[META1:![0-9]+]], !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret float [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_flat_unsafe(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !noalias.addrspace [[META1:![0-9]+]], !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX11-NEXT:    ret float [[RES]]
@@ -633,6 +681,7 @@ define float @test_atomicrmw_fadd_f32_flat_unsafe(ptr %ptr, float %value) #3 {
 }
 
 define double @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(ptr %ptr, double %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -649,6 +698,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(ptr %ptr, double %
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -665,6 +715,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(ptr %ptr, double %
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -681,14 +732,17 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(ptr %ptr, double %
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !noalias.addrspace [[META1]], !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret double [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !noalias.addrspace [[META1]], !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret double [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -710,6 +764,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe__noprivate(ptr %ptr, double %
 }
 
 define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
+; CI: Function Attrs: denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; CI-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; CI-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -740,6 +795,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[RES]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; GFX9-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX9-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -770,6 +826,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[RES]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; GFX908-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX908-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -800,6 +857,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[RES]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; GFX90A-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX90A-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -826,6 +884,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret double [[RES]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; GFX942-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX942-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -852,6 +911,7 @@ define double @test_atomicrmw_fadd_f64_flat_unsafe(ptr %ptr, double %value) #3 {
 ; GFX942:       atomicrmw.end:
 ; GFX942-NEXT:    ret double [[RES]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_flat_unsafe(
 ; GFX11-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX11-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -992,6 +1052,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_ieee(ptr addrspace(1) %ptr, f
 }
 
 define void @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(ptr addrspace(1) %ptr, float %value) #4 {
+; CI: Function Attrs: denormal_fpenv(float: dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -1008,6 +1069,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(ptr addrspace(1)
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(float: dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -1024,6 +1086,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(ptr addrspace(1)
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(float: dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -1040,6 +1103,7 @@ define void @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(ptr addrspace(1)
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(float: dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; GFX90A-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX90A-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -1056,10 +1120,12 @@ define void @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(ptr addrspace(1)
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(float: dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(float: dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_global_no_use_denorm_flush(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX11-NEXT:    ret void
@@ -1673,12 +1739,13 @@ define float @test_atomicrmw_fadd_f32_global_one_as(ptr addrspace(1) %ptr, float
 }
 
 define void @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(ptr addrspace(1) %ptr, float %value) #1 {
+; CI: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; CI:       atomicrmw.start:
 ; CI-NEXT:    [[LOADED:%.*]] = phi float [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; CI-NEXT:    [[NEW:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[LOADED]], float [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8:[0-9]+]]
+; CI-NEXT:    [[NEW:%.*]] = fadd float [[LOADED]], [[VALUE:%.*]]
 ; CI-NEXT:    [[TMP2:%.*]] = bitcast float [[NEW]] to i32
 ; CI-NEXT:    [[TMP3:%.*]] = bitcast float [[LOADED]] to i32
 ; CI-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i32 [[TMP3]], i32 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1689,12 +1756,13 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(ptr addrspace
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GFX9:       atomicrmw.start:
 ; GFX9-NEXT:    [[LOADED:%.*]] = phi float [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; GFX9-NEXT:    [[NEW:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[LOADED]], float [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8:[0-9]+]]
+; GFX9-NEXT:    [[NEW:%.*]] = fadd float [[LOADED]], [[VALUE:%.*]]
 ; GFX9-NEXT:    [[TMP2:%.*]] = bitcast float [[NEW]] to i32
 ; GFX9-NEXT:    [[TMP3:%.*]] = bitcast float [[LOADED]] to i32
 ; GFX9-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i32 [[TMP3]], i32 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1705,18 +1773,22 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(ptr addrspace
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; GFX908-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX11-NEXT:    ret void
@@ -1726,12 +1798,13 @@ define void @test_atomicrmw_fadd_f32_global_no_use_unsafe_strictfp(ptr addrspace
 }
 
 define double @test_atomicrmw_fadd_f64_global_unsafe_strictfp(ptr addrspace(1) %ptr, double %value) #1 {
+; CI: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; CI:       atomicrmw.start:
 ; CI-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; CI-NEXT:    [[NEW:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; CI-NEXT:    [[NEW:%.*]] = fadd double [[LOADED]], [[VALUE:%.*]]
 ; CI-NEXT:    [[TMP2:%.*]] = bitcast double [[NEW]] to i64
 ; CI-NEXT:    [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
 ; CI-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP3]], i64 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1742,12 +1815,13 @@ define double @test_atomicrmw_fadd_f64_global_unsafe_strictfp(ptr addrspace(1) %
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[TMP5]]
 ;
+; GFX9: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GFX9:       atomicrmw.start:
 ; GFX9-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; GFX9-NEXT:    [[NEW:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; GFX9-NEXT:    [[NEW:%.*]] = fadd double [[LOADED]], [[VALUE:%.*]]
 ; GFX9-NEXT:    [[TMP2:%.*]] = bitcast double [[NEW]] to i64
 ; GFX9-NEXT:    [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
 ; GFX9-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP3]], i64 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1758,12 +1832,13 @@ define double @test_atomicrmw_fadd_f64_global_unsafe_strictfp(ptr addrspace(1) %
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[TMP5]]
 ;
+; GFX908: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GFX908:       atomicrmw.start:
 ; GFX908-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; GFX908-NEXT:    [[NEW:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8:[0-9]+]]
+; GFX908-NEXT:    [[NEW:%.*]] = fadd double [[LOADED]], [[VALUE:%.*]]
 ; GFX908-NEXT:    [[TMP2:%.*]] = bitcast double [[NEW]] to i64
 ; GFX908-NEXT:    [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
 ; GFX908-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP3]], i64 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1774,20 +1849,23 @@ define double @test_atomicrmw_fadd_f64_global_unsafe_strictfp(ptr addrspace(1) %
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[TMP5]]
 ;
+; GFX90A: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX90A-NEXT:    ret double [[RES]]
 ;
+; GFX942: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], double [[VALUE:%.*]] syncscope("wavefront") monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
 ; GFX942-NEXT:    ret double [[RES]]
 ;
+; GFX11: Function Attrs: strictfp denormal_fpenv(float: preservesign)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_global_unsafe_strictfp(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GFX11:       atomicrmw.start:
 ; GFX11-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; GFX11-NEXT:    [[NEW:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8:[0-9]+]]
+; GFX11-NEXT:    [[NEW:%.*]] = fadd double [[LOADED]], [[VALUE:%.*]]
 ; GFX11-NEXT:    [[TMP2:%.*]] = bitcast double [[NEW]] to i64
 ; GFX11-NEXT:    [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
 ; GFX11-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP3]], i64 [[TMP2]] syncscope("wavefront") monotonic monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]]
@@ -1803,12 +1881,13 @@ define double @test_atomicrmw_fadd_f64_global_unsafe_strictfp(ptr addrspace(1) %
 }
 
 define float @test_atomicrmw_fadd_f32_local_strictfp(ptr addrspace(3) %ptr, float %value) #2 {
+; CI: Function Attrs: strictfp
 ; CI-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; CI-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(3) [[PTR:%.*]], align 4
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; CI:       atomicrmw.start:
 ; CI-NEXT:    [[LOADED:%.*]] = phi float [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; CI-NEXT:    [[NEW:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[LOADED]], float [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; CI-NEXT:    [[NEW:%.*]] = fadd float [[LOADED]], [[VALUE:%.*]]
 ; CI-NEXT:    [[TMP2:%.*]] = bitcast float [[NEW]] to i32
 ; CI-NEXT:    [[TMP3:%.*]] = bitcast float [[LOADED]] to i32
 ; CI-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(3) [[PTR]], i32 [[TMP3]], i32 [[TMP2]] seq_cst seq_cst, align 4
@@ -1819,22 +1898,27 @@ define float @test_atomicrmw_fadd_f32_local_strictfp(ptr addrspace(3) %ptr, floa
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret float [[TMP5]]
 ;
+; GFX9: Function Attrs: strictfp
 ; GFX9-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; GFX9-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4
 ; GFX9-NEXT:    ret float [[RES]]
 ;
+; GFX908: Function Attrs: strictfp
 ; GFX908-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; GFX908-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4
 ; GFX908-NEXT:    ret float [[RES]]
 ;
+; GFX90A: Function Attrs: strictfp
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; GFX90A-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4
 ; GFX90A-NEXT:    ret float [[RES]]
 ;
+; GFX942: Function Attrs: strictfp
 ; GFX942-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; GFX942-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4
 ; GFX942-NEXT:    ret float [[RES]]
 ;
+; GFX11: Function Attrs: strictfp
 ; GFX11-LABEL: @test_atomicrmw_fadd_f32_local_strictfp(
 ; GFX11-NEXT:    [[RES:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], float [[VALUE:%.*]] seq_cst, align 4
 ; GFX11-NEXT:    ret float [[RES]]
@@ -2029,8 +2113,9 @@ define bfloat @test_atomicrmw_fadd_bf16_global_system_align4(ptr addrspace(1) %p
 }
 
 define bfloat @test_atomicrmw_fadd_bf16_local_strictfp(ptr addrspace(3) %ptr, bfloat %value) #2 {
+; ALL: Function Attrs: strictfp
 ; ALL-LABEL: @test_atomicrmw_fadd_bf16_local_strictfp(
-; ALL-NEXT:    [[ALIGNEDADDR:%.*]] = call ptr addrspace(3) @llvm.ptrmask.p3.i32(ptr addrspace(3) [[PTR:%.*]], i32 -4) #[[ATTR8:[0-9]+]]
+; ALL-NEXT:    [[ALIGNEDADDR:%.*]] = call ptr addrspace(3) @llvm.ptrmask.p3.i32(ptr addrspace(3) [[PTR:%.*]], i32 -4) #[[ATTR7:[0-9]+]]
 ; ALL-NEXT:    [[TMP1:%.*]] = ptrtoint ptr addrspace(3) [[PTR]] to i32
 ; ALL-NEXT:    [[PTRLSB:%.*]] = and i32 [[TMP1]], 3
 ; ALL-NEXT:    [[TMP2:%.*]] = shl i32 [[PTRLSB]], 3
@@ -2043,7 +2128,7 @@ define bfloat @test_atomicrmw_fadd_bf16_local_strictfp(ptr addrspace(3) %ptr, bf
 ; ALL-NEXT:    [[SHIFTED:%.*]] = lshr i32 [[LOADED]], [[TMP2]]
 ; ALL-NEXT:    [[EXTRACTED:%.*]] = trunc i32 [[SHIFTED]] to i16
 ; ALL-NEXT:    [[TMP4:%.*]] = bitcast i16 [[EXTRACTED]] to bfloat
-; ALL-NEXT:    [[NEW:%.*]] = call bfloat @llvm.experimental.constrained.fadd.bf16(bfloat [[TMP4]], bfloat [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR8]]
+; ALL-NEXT:    [[NEW:%.*]] = fadd bfloat [[TMP4]], [[VALUE:%.*]]
 ; ALL-NEXT:    [[TMP5:%.*]] = bitcast bfloat [[NEW]] to i16
 ; ALL-NEXT:    [[EXTENDED:%.*]] = zext i16 [[TMP5]] to i32
 ; ALL-NEXT:    [[SHIFTED1:%.*]] = shl nuw i32 [[EXTENDED]], [[TMP2]]
@@ -2272,6 +2357,7 @@ define float @test_atomicrmw_fadd_f32_global_system_ret__amdgpu_ignore_denormal_
 }
 
 define void @test_atomicrmw_fadd_f32_daz_global_system_noret(ptr addrspace(1) %ptr, float %value) #3 {
+; ALL: Function Attrs: denormal_fpenv(float: preservesign)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_daz_global_system_noret(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2293,6 +2379,7 @@ define void @test_atomicrmw_fadd_f32_daz_global_system_noret(ptr addrspace(1) %p
 }
 
 define float @test_atomicrmw_fadd_f32_daz_global_system_ret(ptr addrspace(1) %ptr, float %value) #3 {
+; ALL: Function Attrs: denormal_fpenv(float: preservesign)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_daz_global_system_ret(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2314,6 +2401,7 @@ define float @test_atomicrmw_fadd_f32_daz_global_system_ret(ptr addrspace(1) %pt
 }
 
 define void @test_atomicrmw_fadd_f32_daz_global_system_noret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, float %value) #3 {
+; ALL: Function Attrs: denormal_fpenv(float: preservesign)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_daz_global_system_noret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2335,6 +2423,7 @@ define void @test_atomicrmw_fadd_f32_daz_global_system_noret__amdgpu_ignore_deno
 }
 
 define float @test_atomicrmw_fadd_f32_daz_global_system_ret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, float %value) #3 {
+; ALL: Function Attrs: denormal_fpenv(float: preservesign)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_daz_global_system_ret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2356,6 +2445,7 @@ define float @test_atomicrmw_fadd_f32_daz_global_system_ret__amdgpu_ignore_denor
 }
 
 define void @test_atomicrmw_fadd_f32_dyndenorm_global_system_noret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, float %value) #4 {
+; ALL: Function Attrs: denormal_fpenv(float: dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_dyndenorm_global_system_noret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2377,6 +2467,7 @@ define void @test_atomicrmw_fadd_f32_dyndenorm_global_system_noret__amdgpu_ignor
 }
 
 define float @test_atomicrmw_fadd_f32_dyndenorm_global_system_ret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, float %value) #4 {
+; ALL: Function Attrs: denormal_fpenv(float: dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f32_dyndenorm_global_system_ret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2562,6 +2653,7 @@ define float @test_atomicrmw_fadd_f32_local_ret__amdgpu_ignore_denormal_mode(ptr
 }
 
 define void @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret(ptr addrspace(1) %ptr, double %value) #5 {
+; ALL: Function Attrs: denormal_fpenv(dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret(
 ; ALL-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2583,6 +2675,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret(ptr addrspace
 }
 
 define double @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret(ptr addrspace(1) %ptr, double %value) #5 {
+; ALL: Function Attrs: denormal_fpenv(dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret(
 ; ALL-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2604,6 +2697,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret(ptr addrspace
 }
 
 define void @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, double %value) #5 {
+; ALL: Function Attrs: denormal_fpenv(dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2625,6 +2719,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_global_system_noret__amdgpu_ignor
 }
 
 define double @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret__amdgpu_ignore_denormal_mode(ptr addrspace(1) %ptr, double %value) #5 {
+; ALL: Function Attrs: denormal_fpenv(dynamic)
 ; ALL-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret__amdgpu_ignore_denormal_mode(
 ; ALL-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; ALL-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2646,6 +2741,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_global_system_ret__amdgpu_ignor
 }
 
 define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(ptr addrspace(3) %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2662,6 +2758,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(ptr addrspace(
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2678,6 +2775,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(ptr addrspace(
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2694,14 +2792,17 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(ptr addrspace(
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; GFX90A-NEXT:    [[UNUSED:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; GFX942-NEXT:    [[UNUSED:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2723,6 +2824,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret(ptr addrspace(
 }
 
 define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(ptr addrspace(3) %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2739,6 +2841,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(ptr addrspace(
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2755,6 +2858,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(ptr addrspace(
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2771,14 +2875,17 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(ptr addrspace(
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; GFX90A-NEXT:    [[RET:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8
 ; GFX90A-NEXT:    ret double [[RET]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; GFX942-NEXT:    [[RET:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8
 ; GFX942-NEXT:    ret double [[RET]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2800,6 +2907,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret(ptr addrspace(
 }
 
 define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(ptr addrspace(3) %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2816,6 +2924,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2832,6 +2941,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2848,14 +2958,17 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX90A-NEXT:    [[UNUSED:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8, !amdgpu.ignore.denormal.mode [[META0]]
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX942-NEXT:    [[UNUSED:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8, !amdgpu.ignore.denormal.mode [[META0]]
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2877,6 +2990,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_local_system_noret__amdgpu_ignore
 }
 
 define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(ptr addrspace(3) %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; CI-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; CI-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2893,6 +3007,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[TMP5]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX9-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX9-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2909,6 +3024,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[TMP5]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX908-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX908-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2925,14 +3041,17 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[TMP5]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX90A-NEXT:    [[RET:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8, !amdgpu.ignore.denormal.mode [[META0]]
 ; GFX90A-NEXT:    ret double [[RET]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX942-NEXT:    [[RET:%.*]] = atomicrmw fadd ptr addrspace(3) [[PTR:%.*]], double [[VALUE:%.*]] monotonic, align 8, !amdgpu.ignore.denormal.mode [[META0]]
 ; GFX942-NEXT:    ret double [[RET]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_local_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX11-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(3) [[PTR:%.*]], align 8
 ; GFX11-NEXT:    br label [[ATOMICRMW_START:%.*]]
@@ -2996,6 +3115,7 @@ define float @test_atomicrmw_fadd_f32_flat_system_ret__amdgpu_ignore_denormal_mo
 }
 
 define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(ptr %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; CI-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; CI-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3025,6 +3145,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret void
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX9-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX9-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3054,6 +3175,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret void
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX908-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX908-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3083,6 +3205,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret void
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX90A-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX90A-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -3120,6 +3243,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret void
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX942-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX942-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -3157,6 +3281,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 ; GFX942:       atomicrmw.end:
 ; GFX942-NEXT:    ret void
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_denormal_mode(
 ; GFX11-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX11-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3191,6 +3316,7 @@ define void @test_atomicrmw_fadd_f64_dyndenorm_flat_system_noret__amdgpu_ignore_
 }
 
 define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(ptr %ptr, double %value) #5 {
+; CI: Function Attrs: denormal_fpenv(dynamic)
 ; CI-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; CI-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; CI-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3221,6 +3347,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_
 ; CI:       atomicrmw.end:
 ; CI-NEXT:    ret double [[RET]]
 ;
+; GFX9: Function Attrs: denormal_fpenv(dynamic)
 ; GFX9-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX9-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX9-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3251,6 +3378,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_
 ; GFX9:       atomicrmw.end:
 ; GFX9-NEXT:    ret double [[RET]]
 ;
+; GFX908: Function Attrs: denormal_fpenv(dynamic)
 ; GFX908-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX908-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX908-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
@@ -3281,6 +3409,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_
 ; GFX908:       atomicrmw.end:
 ; GFX908-NEXT:    ret double [[RET]]
 ;
+; GFX90A: Function Attrs: denormal_fpenv(dynamic)
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX90A-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX90A-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -3319,6 +3448,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_
 ; GFX90A:       atomicrmw.end:
 ; GFX90A-NEXT:    ret double [[RET]]
 ;
+; GFX942: Function Attrs: denormal_fpenv(dynamic)
 ; GFX942-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX942-NEXT:    [[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr [[PTR:%.*]])
 ; GFX942-NEXT:    br i1 [[IS_SHARED]], label [[ATOMICRMW_SHARED:%.*]], label [[ATOMICRMW_CHECK_PRIVATE:%.*]]
@@ -3357,6 +3487,7 @@ define double @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_
 ; GFX942:       atomicrmw.end:
 ; GFX942-NEXT:    ret double [[RET]]
 ;
+; GFX11: Function Attrs: denormal_fpenv(dynamic)
 ; GFX11-LABEL: @test_atomicrmw_fadd_f64_dyndenorm_flat_system_ret__amdgpu_ignore_denormal_mode(
 ; GFX11-NEXT:    [[IS_PRIVATE:%.*]] = call i1 @llvm.amdgcn.is.private(ptr [[PTR:%.*]])
 ; GFX11-NEXT:    br i1 [[IS_PRIVATE]], label [[ATOMICRMW_PRIVATE:%.*]], label [[ATOMICRMW_GLOBAL:%.*]]
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll
index e0f065506bee2..ca639c06aff37 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmax.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GCN,GFX7 %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GCN,GFX9 %s
 
@@ -273,12 +273,13 @@ define double @test_atomicrmw_fmax_f64_local(ptr addrspace(3) %ptr, double %valu
 }
 
 define double @test_atomicrmw_fmax_f64_global_strictfp(ptr addrspace(1) %ptr, double %value) strictfp {
+; GCN: Function Attrs: strictfp
 ; GCN-LABEL: @test_atomicrmw_fmax_f64_global_strictfp(
 ; GCN-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GCN-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GCN:       atomicrmw.start:
 ; GCN-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP6:%.*]], [[ATOMICRMW_START]] ]
-; GCN-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.maxnum.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"fpexcept.strict") #[[ATTR4:[0-9]+]]
+; GCN-NEXT:    [[TMP2:%.*]] = call double @llvm.maxnum.f64(double [[LOADED]], double [[VALUE:%.*]]) #[[ATTR3:[0-9]+]]
 ; GCN-NEXT:    [[TMP3:%.*]] = bitcast double [[TMP2]] to i64
 ; GCN-NEXT:    [[TMP4:%.*]] = bitcast double [[LOADED]] to i64
 ; GCN-NEXT:    [[TMP5:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP4]], i64 [[TMP3]] seq_cst seq_cst, align 8
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll
index fd4db4a0cf699..393b51f903473 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fmin.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GCN,GFX7 %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefixes=GCN,GFX9 %s
 
@@ -273,12 +273,13 @@ define double @test_atomicrmw_fmin_f64_local(ptr addrspace(3) %ptr, double %valu
 }
 
 define double @test_atomicrmw_fmin_f64_global_strictfp(ptr addrspace(1) %ptr, double %value) strictfp {
+; GCN: Function Attrs: strictfp
 ; GCN-LABEL: @test_atomicrmw_fmin_f64_global_strictfp(
 ; GCN-NEXT:    [[TMP1:%.*]] = load double, ptr addrspace(1) [[PTR:%.*]], align 8
 ; GCN-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GCN:       atomicrmw.start:
 ; GCN-NEXT:    [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP6:%.*]], [[ATOMICRMW_START]] ]
-; GCN-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.minnum.f64(double [[LOADED]], double [[VALUE:%.*]], metadata !"fpexcept.strict") #[[ATTR4:[0-9]+]]
+; GCN-NEXT:    [[TMP2:%.*]] = call double @llvm.minnum.f64(double [[LOADED]], double [[VALUE:%.*]]) #[[ATTR3:[0-9]+]]
 ; GCN-NEXT:    [[TMP3:%.*]] = bitcast double [[TMP2]] to i64
 ; GCN-NEXT:    [[TMP4:%.*]] = bitcast double [[LOADED]] to i64
 ; GCN-NEXT:    [[TMP5:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i64 [[TMP4]], i64 [[TMP3]] seq_cst seq_cst, align 8
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fsub.ll b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fsub.ll
index b5e553754277c..012740514cf2c 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fsub.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fsub.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefix=GCN %s
 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes='require<libcall-lowering-info>,atomic-expand' %s | FileCheck -check-prefix=GCN %s
 
@@ -297,12 +297,13 @@ define double @test_atomicrmw_fsub_f64_local(ptr addrspace(3) %ptr, double %valu
 }
 
 define float @test_atomicrmw_fsub_f32_global_strictfp(ptr addrspace(1) %ptr, float %value) strictfp {
+; GCN: Function Attrs: strictfp
 ; GCN-LABEL: @test_atomicrmw_fsub_f32_global_strictfp(
 ; GCN-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; GCN-NEXT:    br label [[ATOMICRMW_START:%.*]]
 ; GCN:       atomicrmw.start:
 ; GCN-NEXT:    [[LOADED:%.*]] = phi float [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; GCN-NEXT:    [[NEW:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[LOADED]], float [[VALUE:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4:[0-9]+]]
+; GCN-NEXT:    [[NEW:%.*]] = fsub float [[LOADED]], [[VALUE:%.*]]
 ; GCN-NEXT:    [[TMP2:%.*]] = bitcast float [[NEW]] to i32
 ; GCN-NEXT:    [[TMP3:%.*]] = bitcast float [[LOADED]] to i32
 ; GCN-NEXT:    [[TMP4:%.*]] = cmpxchg ptr addrspace(1) [[PTR]], i32 [[TMP3]], i32 [[TMP2]] seq_cst seq_cst, align 4
diff --git a/llvm/test/Transforms/Attributor/nofpclass-log.ll b/llvm/test/Transforms/Attributor/nofpclass-log.ll
index 4bd62d8e903ba..2268201ed7ea6 100644
--- a/llvm/test/Transforms/Attributor/nofpclass-log.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass-log.ll
@@ -12,8 +12,8 @@ declare float @llvm.experimental.constrained.log10.f32(float, metadata, metadata
 
 define float @ret_log(float %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @ret_log
-; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR2:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float [[ARG]]) #[[ATTR10:[0-9]+]]
+; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float [[ARG]]) #[[ATTR9:[0-9]+]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -22,8 +22,8 @@ define float @ret_log(float %arg) #0 {
 
 define float @ret_log_noinf(float nofpclass(inf) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_noinf
-; CHECK-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -32,8 +32,8 @@ define float @ret_log_noinf(float nofpclass(inf) %arg) #0 {
 
 define float @ret_log_noneg(float nofpclass(ninf nsub nnorm) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @ret_log_noneg
-; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(ninf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(ninf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -42,8 +42,8 @@ define float @ret_log_noneg(float nofpclass(ninf nsub nnorm) %arg) #0 {
 
 define float @ret_log_noneg_nonan(float nofpclass(ninf nsub nnorm nan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nan nzero sub) float @ret_log_noneg_nonan
-; CHECK-SAME: (float nofpclass(nan ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan nzero sub) float @llvm.log.f32(float nofpclass(nan ninf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan nzero sub) float @llvm.log.f32(float nofpclass(nan ninf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -52,8 +52,8 @@ define float @ret_log_noneg_nonan(float nofpclass(ninf nsub nnorm nan) %arg) #0
 
 define float @ret_log_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_noinf_noneg
-; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -62,8 +62,8 @@ define float @ret_log_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 
 define float @ret_log_noinf_noneg_nonan(float nofpclass(inf nsub nnorm nan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nan pinf nzero sub) float @ret_log_noinf_noneg_nonan
-; CHECK-SAME: (float nofpclass(nan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log.f32(float nofpclass(nan inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log.f32(float nofpclass(nan inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -72,8 +72,8 @@ define float @ret_log_noinf_noneg_nonan(float nofpclass(inf nsub nnorm nan) %arg
 
 define float @ret_log_nopinf(float nofpclass(pinf) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_nopinf
-; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(pinf) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(pinf) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -82,8 +82,8 @@ define float @ret_log_nopinf(float nofpclass(pinf) %arg) #0 {
 
 define float @ret_log_noninf(float nofpclass(ninf) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @ret_log_noninf
-; CHECK-SAME: (float nofpclass(ninf) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(ninf) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(ninf) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(ninf) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -92,8 +92,8 @@ define float @ret_log_noninf(float nofpclass(ninf) %arg) #0 {
 
 define float @ret_log_nonan(float nofpclass(nan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @ret_log_nonan
-; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(nan) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(nan) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -102,8 +102,8 @@ define float @ret_log_nonan(float nofpclass(nan) %arg) #0 {
 
 define float @ret_log_nonan_noinf(float nofpclass(nan inf) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_nonan_noinf
-; CHECK-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(nan inf) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(nan inf) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -112,8 +112,8 @@ define float @ret_log_nonan_noinf(float nofpclass(nan inf) %arg) #0 {
 
 define float @ret_log_nonan_noinf_nozero(float nofpclass(nan inf zero) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub) float @ret_log_nonan_noinf_nozero
-; CHECK-SAME: (float nofpclass(nan inf zero) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub) float @llvm.log.f32(float nofpclass(nan inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf zero) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub) float @llvm.log.f32(float nofpclass(nan inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -122,8 +122,8 @@ define float @ret_log_nonan_noinf_nozero(float nofpclass(nan inf zero) %arg) #0
 
 define float @ret_log_noinf_nozero(float nofpclass(inf zero) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub) float @ret_log_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -132,8 +132,8 @@ define float @ret_log_noinf_nozero(float nofpclass(inf zero) %arg) #0 {
 
 define float @ret_log_noinf_nonegzero(float nofpclass(inf nzero) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -142,9 +142,9 @@ define float @ret_log_noinf_nonegzero(float nofpclass(inf nzero) %arg) #0 {
 
 define float @ret_log_positive_source(i32 %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nan pinf nzero sub) float @ret_log_positive_source
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    [[UITOFP:%.*]] = uitofp i32 [[ARG]] to float
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log.f32(float [[UITOFP]]) #[[ATTR10]]
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log.f32(float [[UITOFP]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %uitofp = uitofp i32 %arg to float
@@ -155,9 +155,9 @@ define float @ret_log_positive_source(i32 %arg) #0 {
 ; Could produce a nan because we don't know if the multiply is negative.
 define float @ret_log_unknown_sign(float nofpclass(nan) %arg, float nofpclass(nan) %arg1) #0 {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @ret_log_unknown_sign
-; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]], float nofpclass(nan) [[ARG1:%.*]]) #[[ATTR2]] {
+; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]], float nofpclass(nan) [[ARG1:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    [[UNKNOWN_SIGN_NOT_NAN:%.*]] = fmul nnan float [[ARG]], [[ARG1]]
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float [[UNKNOWN_SIGN_NOT_NAN]]) #[[ATTR10]]
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float [[UNKNOWN_SIGN_NOT_NAN]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %unknown.sign.not.nan = fmul nnan float %arg, %arg1
@@ -167,8 +167,8 @@ define float @ret_log_unknown_sign(float nofpclass(nan) %arg, float nofpclass(na
 
 define float @ret_log_daz_noinf_nozero(float nofpclass(inf zero) %arg) #1 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_daz_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR3:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -177,8 +177,8 @@ define float @ret_log_daz_noinf_nozero(float nofpclass(inf zero) %arg) #1 {
 
 define <2 x float> @ret_log_daz_noinf_nozero_v2f32(<2 x float> nofpclass(inf zero) %arg) #1 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) <2 x float> @ret_log_daz_noinf_nozero_v2f32
-; CHECK-SAME: (<2 x float> nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) <2 x float> @llvm.log.v2f32(<2 x float> nofpclass(inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (<2 x float> nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) <2 x float> @llvm.log.v2f32(<2 x float> nofpclass(inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret <2 x float> [[CALL]]
 ;
   %call = call <2 x float> @llvm.log.v2f32(<2 x float> %arg)
@@ -187,8 +187,8 @@ define <2 x float> @ret_log_daz_noinf_nozero_v2f32(<2 x float> nofpclass(inf zer
 
 define float @ret_log_daz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #1 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_daz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -197,8 +197,8 @@ define float @ret_log_daz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #1 {
 
 define float @ret_log_dapz_noinf_nozero(float nofpclass(inf zero) %arg) #2 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_dapz_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR4:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR3:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -207,8 +207,8 @@ define float @ret_log_dapz_noinf_nozero(float nofpclass(inf zero) %arg) #2 {
 
 define float @ret_log_dapz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #2 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_dapz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR4]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -217,8 +217,8 @@ define float @ret_log_dapz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #2 {
 
 define float @ret_log_dynamic_noinf_nozero(float nofpclass(inf zero) %arg) #3 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_dynamic_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR5:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG:%.*]]) #[[ATTR4:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf zero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -227,8 +227,8 @@ define float @ret_log_dynamic_noinf_nozero(float nofpclass(inf zero) %arg) #3 {
 
 define float @ret_log_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg) #3 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_dynamic_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR5]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR4]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -237,8 +237,8 @@ define float @ret_log_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg) #
 
 define float @ret_log_ftz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #4 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_ftz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR6:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR5:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -247,8 +247,8 @@ define float @ret_log_ftz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #4 {
 
 define float @ret_log_ftpz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #5 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_ftpz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR7:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR6:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -257,8 +257,8 @@ define float @ret_log_ftpz_noinf_nonegzero(float nofpclass(inf nzero) %arg) #5 {
 
 define float @ret_log_ftz_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg) #6 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log_ftz_dynamic_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR8:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG:%.*]]) #[[ATTR7:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(inf nzero) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log.f32(float %arg)
@@ -267,9 +267,9 @@ define float @ret_log_ftz_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %ar
 
 define float @constrained_log(float %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @constrained_log
-; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR9:[0-9]+]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(nzero sub) float @llvm.experimental.constrained.log.f32(float [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11:[0-9]+]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR8:[0-9]+]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float [[ARG]]) #[[ATTR10:[0-9]+]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.log.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -277,9 +277,9 @@ define float @constrained_log(float %arg) strictfp {
 
 define float @constrained_log_nonan(float nofpclass(nan) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @constrained_log_nonan
-; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(nzero sub) float @llvm.experimental.constrained.log.f32(float nofpclass(nan) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(nan) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.log.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -287,9 +287,9 @@ define float @constrained_log_nonan(float nofpclass(nan) %arg) strictfp {
 
 define float @constrained_log_nopinf(float nofpclass(pinf) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @constrained_log_nopinf
-; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.experimental.constrained.log.f32(float nofpclass(pinf) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log.f32(float nofpclass(pinf) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.log.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -297,9 +297,9 @@ define float @constrained_log_nopinf(float nofpclass(pinf) %arg) strictfp {
 
 define float @constrained_log_nonegzero(float nofpclass(nzero) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(nzero sub) float @constrained_log_nonegzero
-; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(nzero sub) float @llvm.experimental.constrained.log.f32(float nofpclass(nzero) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(nzero sub) float @llvm.log.f32(float nofpclass(nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.log.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -307,9 +307,9 @@ define float @constrained_log_nonegzero(float nofpclass(nzero) %arg) strictfp {
 
 define float @constrained_log_nozero(float nofpclass(zero) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(ninf nzero sub) float @constrained_log_nozero
-; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(ninf nzero sub) float @llvm.experimental.constrained.log.f32(float nofpclass(zero) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(ninf nzero sub) float @llvm.log.f32(float nofpclass(zero) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.log.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -317,8 +317,8 @@ define float @constrained_log_nozero(float nofpclass(zero) %arg) strictfp {
 
 define float @ret_log2_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log2_noinf_noneg
-; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log2.f32(float %arg)
@@ -327,8 +327,8 @@ define float @ret_log2_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 
 define float @ret_log2_noinf_noneg_nonan(float nofpclass(inf nsub nnorm nan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nan pinf nzero sub) float @ret_log2_noinf_noneg_nonan
-; CHECK-SAME: (float nofpclass(nan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log2.f32(float nofpclass(nan inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan pinf nzero sub) float @llvm.log2.f32(float nofpclass(nan inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log2.f32(float %arg)
@@ -337,8 +337,8 @@ define float @ret_log2_noinf_noneg_nonan(float nofpclass(inf nsub nnorm nan) %ar
 
 define float @ret_log2_noinf_noneg_noqnan(float nofpclass(inf nsub nnorm qnan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log2_noinf_noneg_noqnan
-; CHECK-SAME: (float nofpclass(qnan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(qnan inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(qnan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(qnan inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log2.f32(float %arg)
@@ -347,8 +347,8 @@ define float @ret_log2_noinf_noneg_noqnan(float nofpclass(inf nsub nnorm qnan) %
 
 define float @ret_log2_noinf_noneg_nosnan(float nofpclass(inf nsub nnorm snan) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log2_noinf_noneg_nosnan
-; CHECK-SAME: (float nofpclass(snan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(snan inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(snan inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(snan inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log2.f32(float %arg)
@@ -357,8 +357,8 @@ define float @ret_log2_noinf_noneg_nosnan(float nofpclass(inf nsub nnorm snan) %
 
 define float @ret_log10_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_log10_noinf_noneg
-; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log10.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log10.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.log10.f32(float %arg)
@@ -367,9 +367,9 @@ define float @ret_log10_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) #0 {
 
 define float @ret_constrained_log2_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_constrained_log2_noinf_noneg
-; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.experimental.constrained.log2.f32(float nofpclass(inf nsub nnorm) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[CALL]]
+; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[CALL1:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log2.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[CALL1]]
 ;
   %call = call float @llvm.experimental.constrained.log2.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %call
@@ -377,9 +377,9 @@ define float @ret_constrained_log2_noinf_noneg(float nofpclass(inf nsub nnorm) %
 
 define float @ret_constrained_log10_noinf_noneg(float nofpclass(inf nsub nnorm) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(pinf nzero sub) float @ret_constrained_log10_noinf_noneg
-; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(pinf nzero sub) float @llvm.experimental.constrained.log10.f32(float nofpclass(inf nsub nnorm) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[CALL]]
+; CHECK-SAME: (float nofpclass(inf nsub nnorm) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[CALL1:%.*]] = call nofpclass(pinf nzero sub) float @llvm.log10.f32(float nofpclass(inf nsub nnorm) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[CALL1]]
 ;
   %call = call float @llvm.experimental.constrained.log10.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %call
diff --git a/llvm/test/Transforms/Attributor/nofpclass-sqrt.ll b/llvm/test/Transforms/Attributor/nofpclass-sqrt.ll
index aa62e807cddb9..2c7f0962f6097 100644
--- a/llvm/test/Transforms/Attributor/nofpclass-sqrt.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass-sqrt.ll
@@ -7,8 +7,8 @@ declare float @llvm.experimental.constrained.sqrt.f32(float, metadata, metadata)
 
 define float @ret_sqrt(float %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(ninf sub nnorm) float @ret_sqrt
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.sqrt.f32(float [[ARG0]]) #[[ATTR10:[0-9]+]]
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.sqrt.f32(float [[ARG0]]) #[[ATTR9:[0-9]+]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -17,8 +17,8 @@ define float @ret_sqrt(float %arg0) #0 {
 
 define float @ret_sqrt_noinf(float nofpclass(inf) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_noinf
-; CHECK-SAME: (float nofpclass(inf) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -27,8 +27,8 @@ define float @ret_sqrt_noinf(float nofpclass(inf) %arg0) #0 {
 
 define float @ret_sqrt_nopinf(float nofpclass(pinf) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_nopinf
-; CHECK-SAME: (float nofpclass(pinf) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(pinf) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(pinf) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(pinf) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -37,8 +37,8 @@ define float @ret_sqrt_nopinf(float nofpclass(pinf) %arg0) #0 {
 
 define float @ret_sqrt_noninf(float nofpclass(ninf) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(ninf sub nnorm) float @ret_sqrt_noninf
-; CHECK-SAME: (float nofpclass(ninf) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.sqrt.f32(float nofpclass(ninf) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(ninf) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.sqrt.f32(float nofpclass(ninf) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -47,8 +47,8 @@ define float @ret_sqrt_noninf(float nofpclass(ninf) %arg0) #0 {
 
 define float @ret_sqrt_nonan(float nofpclass(nan) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(snan ninf sub nnorm) float @ret_sqrt_nonan
-; CHECK-SAME: (float nofpclass(nan) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -57,8 +57,8 @@ define float @ret_sqrt_nonan(float nofpclass(nan) %arg0) #0 {
 
 define float @ret_sqrt_nonan_noinf(float nofpclass(nan inf) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(snan inf sub nnorm) float @ret_sqrt_nonan_noinf
-; CHECK-SAME: (float nofpclass(nan inf) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan inf) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan inf) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -67,8 +67,8 @@ define float @ret_sqrt_nonan_noinf(float nofpclass(nan inf) %arg0) #0 {
 
 define float @ret_sqrt_nonan_noinf_nozero(float nofpclass(nan inf zero) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(snan inf nzero sub nnorm) float @ret_sqrt_nonan_noinf_nozero
-; CHECK-SAME: (float nofpclass(nan inf zero) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(nan inf zero) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -77,8 +77,8 @@ define float @ret_sqrt_nonan_noinf_nozero(float nofpclass(nan inf zero) %arg0) #
 
 define float @ret_sqrt_noinf_nozero(float nofpclass(inf zero) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -87,8 +87,8 @@ define float @ret_sqrt_noinf_nozero(float nofpclass(inf zero) %arg0) #0 {
 
 define float @ret_sqrt_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #0 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -97,9 +97,9 @@ define float @ret_sqrt_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #0 {
 
 define float @ret_sqrt_positive_source(i32 %arg) #0 {
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) float @ret_sqrt_positive_source
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    [[UITOFP:%.*]] = uitofp i32 [[ARG]] to float
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan inf nzero sub nnorm) float @llvm.sqrt.f32(float [[UITOFP]]) #[[ATTR10]]
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(nan inf nzero sub nnorm) float @llvm.sqrt.f32(float [[UITOFP]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %uitofp = uitofp i32 %arg to float
@@ -110,9 +110,9 @@ define float @ret_sqrt_positive_source(i32 %arg) #0 {
 ; Could produce a nan because we don't know if the multiply is negative.
 define float @ret_sqrt_unknown_sign(float nofpclass(nan) %arg0, float nofpclass(nan) %arg1) #0 {
 ; CHECK-LABEL: define nofpclass(snan ninf sub nnorm) float @ret_sqrt_unknown_sign
-; CHECK-SAME: (float nofpclass(nan) [[ARG0:%.*]], float nofpclass(nan) [[ARG1:%.*]]) #[[ATTR2]] {
+; CHECK-SAME: (float nofpclass(nan) [[ARG0:%.*]], float nofpclass(nan) [[ARG1:%.*]]) #[[ATTR1]] {
 ; CHECK-NEXT:    [[UNKNOWN_SIGN_NOT_NAN:%.*]] = fmul nnan float [[ARG0]], [[ARG1]]
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.sqrt.f32(float [[UNKNOWN_SIGN_NOT_NAN]]) #[[ATTR10]]
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.sqrt.f32(float [[UNKNOWN_SIGN_NOT_NAN]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %unknown.sign.not.nan = fmul nnan float %arg0, %arg1
@@ -122,8 +122,8 @@ define float @ret_sqrt_unknown_sign(float nofpclass(nan) %arg0, float nofpclass(
 
 define float @ret_sqrt_daz_noinf_nozero(float nofpclass(inf zero) %arg0) #1 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_daz_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR3:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -132,8 +132,8 @@ define float @ret_sqrt_daz_noinf_nozero(float nofpclass(inf zero) %arg0) #1 {
 
 define <2 x float> @ret_sqrt_daz_noinf_nozero_v2f32(<2 x float> nofpclass(inf zero) %arg0) #1 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) <2 x float> @ret_sqrt_daz_noinf_nozero_v2f32
-; CHECK-SAME: (<2 x float> nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) <2 x float> @llvm.sqrt.v2f32(<2 x float> nofpclass(inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (<2 x float> nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) <2 x float> @llvm.sqrt.v2f32(<2 x float> nofpclass(inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret <2 x float> [[CALL]]
 ;
   %call = call <2 x float> @llvm.sqrt.v2f32(<2 x float> %arg0)
@@ -142,8 +142,8 @@ define <2 x float> @ret_sqrt_daz_noinf_nozero_v2f32(<2 x float> nofpclass(inf ze
 
 define float @ret_sqrt_daz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #1 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_daz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR3]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -152,8 +152,8 @@ define float @ret_sqrt_daz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #1
 
 define float @ret_sqrt_dapz_noinf_nozero(float nofpclass(inf zero) %arg0) #2 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_dapz_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR4:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR3:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -162,8 +162,8 @@ define float @ret_sqrt_dapz_noinf_nozero(float nofpclass(inf zero) %arg0) #2 {
 
 define float @ret_sqrt_dapz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #2 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_dapz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR4]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -172,8 +172,8 @@ define float @ret_sqrt_dapz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #2
 
 define float @ret_sqrt_dynamic_noinf_nozero(float nofpclass(inf zero) %arg0) #3 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_dynamic_noinf_nozero
-; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR5:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf zero) [[ARG0:%.*]]) #[[ATTR4:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf zero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -182,8 +182,8 @@ define float @ret_sqrt_dynamic_noinf_nozero(float nofpclass(inf zero) %arg0) #3
 
 define float @ret_sqrt_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #3 {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @ret_sqrt_dynamic_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR5]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR4]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -192,8 +192,8 @@ define float @ret_sqrt_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg0)
 
 define float @ret_sqrt_ftz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #4 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_ftz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR6:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR5:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -202,8 +202,8 @@ define float @ret_sqrt_ftz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #4
 
 define float @ret_sqrt_ftpz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #5 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_ftpz_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR7:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR6:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -212,8 +212,8 @@ define float @ret_sqrt_ftpz_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #5
 
 define float @ret_sqrt_ftz_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %arg0) #6 {
 ; CHECK-LABEL: define nofpclass(inf nzero sub nnorm) float @ret_sqrt_ftz_dynamic_noinf_nonegzero
-; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR8:[0-9]+]] {
-; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR10]]
+; CHECK-SAME: (float nofpclass(inf nzero) [[ARG0:%.*]]) #[[ATTR7:[0-9]+]] {
+; CHECK-NEXT:    [[CALL:%.*]] = call nofpclass(inf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(inf nzero) [[ARG0]]) #[[ATTR9]]
 ; CHECK-NEXT:    ret float [[CALL]]
 ;
   %call = call float @llvm.sqrt.f32(float %arg0)
@@ -222,9 +222,9 @@ define float @ret_sqrt_ftz_dynamic_noinf_nonegzero(float nofpclass(inf nzero) %a
 
 define float @constrained_sqrt(float %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(ninf sub nnorm) float @constrained_sqrt
-; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR9:[0-9]+]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.experimental.constrained.sqrt.f32(float [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11:[0-9]+]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR8:[0-9]+]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(ninf sub nnorm) float @llvm.sqrt.f32(float [[ARG]]) #[[ATTR10:[0-9]+]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -232,9 +232,9 @@ define float @constrained_sqrt(float %arg) strictfp {
 
 define float @constrained_sqrt_nonan(float nofpclass(nan) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(snan ninf sub nnorm) float @constrained_sqrt_nonan
-; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.experimental.constrained.sqrt.f32(float nofpclass(nan) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(snan ninf sub nnorm) float @llvm.sqrt.f32(float nofpclass(nan) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -242,9 +242,9 @@ define float @constrained_sqrt_nonan(float nofpclass(nan) %arg) strictfp {
 
 define float @constrained_sqrt_nopinf(float nofpclass(pinf) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(inf sub nnorm) float @constrained_sqrt_nopinf
-; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(inf sub nnorm) float @llvm.experimental.constrained.sqrt.f32(float nofpclass(pinf) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(pinf) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(inf sub nnorm) float @llvm.sqrt.f32(float nofpclass(pinf) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -252,9 +252,9 @@ define float @constrained_sqrt_nopinf(float nofpclass(pinf) %arg) strictfp {
 
 define float @constrained_sqrt_nonegzero(float nofpclass(nzero) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(ninf nzero sub nnorm) float @constrained_sqrt_nonegzero
-; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(ninf nzero sub nnorm) float @llvm.experimental.constrained.sqrt.f32(float nofpclass(nzero) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(ninf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(nzero) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -262,9 +262,9 @@ define float @constrained_sqrt_nonegzero(float nofpclass(nzero) %arg) strictfp {
 
 define float @constrained_sqrt_nozero(float nofpclass(zero) %arg) strictfp {
 ; CHECK-LABEL: define nofpclass(ninf nzero sub nnorm) float @constrained_sqrt_nozero
-; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR9]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(ninf nzero sub nnorm) float @llvm.experimental.constrained.sqrt.f32(float nofpclass(zero) [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR11]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR8]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = call nofpclass(ninf nzero sub nnorm) float @llvm.sqrt.f32(float nofpclass(zero) [[ARG]]) #[[ATTR10]]
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sqrt.f32(float %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
diff --git a/llvm/test/Transforms/Attributor/nofpclass.ll b/llvm/test/Transforms/Attributor/nofpclass.ll
index df1f4d11479ec..27597e2ac4158 100644
--- a/llvm/test/Transforms/Attributor/nofpclass.ll
+++ b/llvm/test/Transforms/Attributor/nofpclass.ll
@@ -158,7 +158,7 @@ define float @return_nofpclass_nan_decl_return() {
 define float @return_nofpclass_nan_arg(float returned nofpclass(nan) %p) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @return_nofpclass_nan_arg
-; CHECK-SAME: (float returned nofpclass(nan) [[P:%.*]]) #[[ATTR3:[0-9]+]] {
+; CHECK-SAME: (float returned nofpclass(nan) [[P:%.*]]) #[[ATTR2:[0-9]+]] {
 ; CHECK-NEXT:    ret float [[P]]
 ;
   ret float %p
@@ -176,7 +176,7 @@ define [2 x [3 x float]] @return_nofpclass_inf_ret_array() {
 define float @returned_nnan_fadd(float %arg0, float %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_nnan_fadd
-; CHECK-SAME: (float [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FADD:%.*]] = fadd nnan float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[FADD]]
 ;
@@ -237,7 +237,7 @@ define void @ninf_arg_used_by_callsite_array([2 x [3 x float]] nofpclass(inf) %a
 define void @nofpclass_call_use_after_unannotated_use(float %arg) {
 ; CHECK-LABEL: define void @nofpclass_call_use_after_unannotated_use
 ; CHECK-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) {
-; CHECK-NEXT:    call void @extern(float nofpclass(nan inf) [[ARG]]) #[[ATTR21:[0-9]+]]
+; CHECK-NEXT:    call void @extern(float nofpclass(nan inf) [[ARG]]) #[[ATTR20:[0-9]+]]
 ; CHECK-NEXT:    call void @extern(float nofpclass(nan inf) [[ARG]])
 ; CHECK-NEXT:    ret void
 ;
@@ -249,12 +249,12 @@ define void @nofpclass_call_use_after_unannotated_use(float %arg) {
 define float @mutually_recursive0(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(all) float @mutually_recursive0
-; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR4:[0-9]+]] {
+; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR3:[0-9]+]] {
 ; TUNIT-NEXT:    ret float undef
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(all) float @mutually_recursive0
-; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR2]] {
 ; CGSCC-NEXT:    ret float undef
 ;
   %call = call float @mutually_recursive1(float %arg)
@@ -264,12 +264,12 @@ define float @mutually_recursive0(float %arg) {
 define float @mutually_recursive1(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(all) float @mutually_recursive1
-; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR4]] {
+; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
 ; TUNIT-NEXT:    ret float undef
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(all) float @mutually_recursive1
-; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR2]] {
 ; CGSCC-NEXT:    ret float undef
 ;
   %call = call float @mutually_recursive0(float %arg)
@@ -360,7 +360,7 @@ define float @fcmp_ord_assume_callsite_arg_return(float %arg) {
 ; CHECK-SAME: (float returned nofpclass(nan) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[IS_NOT_NAN:%.*]] = fcmp ord float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_NAN]]) #[[ATTR22:[0-9]+]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_NAN]]) #[[ATTR21:[0-9]+]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -392,7 +392,7 @@ define void @returned_dead_caller() {
 define internal float @only_nofpclass_inf_callers(float %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define internal float @only_nofpclass_inf_callers
-; CHECK-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -403,14 +403,14 @@ define internal float @only_nofpclass_inf_callers(float %arg) {
 define float @call_noinf_0(float nofpclass(inf) %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define float @call_noinf_0
-; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR23:[0-9]+]]
+; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR22:[0-9]+]]
 ; TUNIT-NEXT:    ret float [[RESULT]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @call_noinf_0
-; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR4:[0-9]+]] {
-; CGSCC-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR23:[0-9]+]]
+; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3:[0-9]+]] {
+; CGSCC-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR22:[0-9]+]]
 ; CGSCC-NEXT:    ret float [[RESULT]]
 ;
   %result = call float @only_nofpclass_inf_callers(float %arg)
@@ -420,14 +420,14 @@ define float @call_noinf_0(float nofpclass(inf) %arg) {
 define float @call_noinf_1(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define float @call_noinf_1
-; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR23]]
+; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR22]]
 ; TUNIT-NEXT:    ret float [[RESULT]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @call_noinf_1
-; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR4]] {
-; CGSCC-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT:    [[RESULT:%.*]] = call float @only_nofpclass_inf_callers(float nofpclass(inf) [[ARG]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[RESULT]]
 ;
   %result = call float @only_nofpclass_inf_callers(float nofpclass(inf) %arg)
@@ -438,7 +438,7 @@ define float @call_noinf_1(float %arg) {
 define internal float @only_nofpclass_inf_return_users(float %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define internal float @only_nofpclass_inf_return_users
-; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -449,14 +449,14 @@ define internal float @only_nofpclass_inf_return_users(float %arg) {
 define float @call_noinf_return_0(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(inf) float @call_noinf_return_0
-; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR23]]
+; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR22]]
 ; TUNIT-NEXT:    ret float [[RESULT]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(inf) float @call_noinf_return_0
-; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR4]] {
-; CGSCC-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[RESULT]]
 ;
   %result = call nofpclass(inf) float @only_nofpclass_inf_return_users(float %arg)
@@ -466,14 +466,14 @@ define float @call_noinf_return_0(float %arg) {
 define float @call_noinf_return_1(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(inf) float @call_noinf_return_1
-; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR23]]
+; TUNIT-SAME: (float [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR22]]
 ; TUNIT-NEXT:    ret float [[RESULT]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(inf) float @call_noinf_return_1
-; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR4]] {
-; CGSCC-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT:    [[RESULT:%.*]] = call nofpclass(inf) float @only_nofpclass_inf_return_users(float [[ARG]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[RESULT]]
 ;
   %result = call nofpclass(inf) float @only_nofpclass_inf_return_users(float %arg)
@@ -485,7 +485,7 @@ define float @fcmp_olt_assume_one_0_callsite_arg_return(float %arg) {
 ; CHECK-SAME: (float returned nofpclass(nan zero) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[IS_NOT_ZERO_OR_NAN:%.*]] = fcmp one float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan zero) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -501,7 +501,7 @@ define float @fcmp_olt_assume_une_0_callsite_arg_return(float %arg) {
 ; CHECK-SAME: (float returned nofpclass(zero) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[IS_NOT_ZERO_OR_NAN:%.*]] = fcmp une float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(zero) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -516,9 +516,9 @@ define half @fcmp_assume_issubnormal_callsite_arg_return(half %arg) {
 ; CHECK-LABEL: define nofpclass(nan inf norm) half @fcmp_assume_issubnormal_callsite_arg_return
 ; CHECK-SAME: (half returned nofpclass(nan inf norm) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) half @llvm.fabs.f16(half nofpclass(nan inf norm) [[ARG]]) #[[ATTR24:[0-9]+]]
+; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) half @llvm.fabs.f16(half nofpclass(nan inf norm) [[ARG]]) #[[ATTR23:[0-9]+]]
 ; CHECK-NEXT:    [[IS_SUBNORMAL:%.*]] = fcmp olt half [[FABS]], 0xH0400
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_SUBNORMAL]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_SUBNORMAL]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan inf norm) [[ARG]])
 ; CHECK-NEXT:    ret half [[ARG]]
 ;
@@ -552,11 +552,11 @@ define half @fcmp_assume2_callsite_arg_return(half %arg) {
 ; CHECK-LABEL: define nofpclass(nan pinf zero sub) half @fcmp_assume2_callsite_arg_return
 ; CHECK-SAME: (half returned nofpclass(nan pinf zero sub) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) half @llvm.fabs.f16(half nofpclass(nan pinf zero sub) [[ARG]]) #[[ATTR24]]
+; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) half @llvm.fabs.f16(half nofpclass(nan pinf zero sub) [[ARG]]) #[[ATTR23]]
 ; CHECK-NEXT:    [[NOT_SUBNORMAL_OR_ZERO:%.*]] = fcmp oge half [[FABS]], 0xH0400
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[NOT_SUBNORMAL_OR_ZERO]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[NOT_SUBNORMAL_OR_ZERO]]) #[[ATTR21]]
 ; CHECK-NEXT:    [[NOT_INF:%.*]] = fcmp one half [[ARG]], 0xH7C00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[NOT_INF]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[NOT_INF]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan pinf zero sub) [[ARG]])
 ; CHECK-NEXT:    ret half [[ARG]]
 ;
@@ -576,8 +576,8 @@ define float @is_fpclass_assume_arg_return(float %arg) {
 ; CHECK-LABEL: define nofpclass(nan pinf pzero sub nnorm) float @is_fpclass_assume_arg_return
 ; CHECK-SAME: (float returned nofpclass(nan pinf pzero sub nnorm) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[CLASS_TEST:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan pinf pzero sub nnorm) [[ARG]], i32 noundef 292) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS_TEST]]) #[[ATTR22]]
+; CHECK-NEXT:    [[CLASS_TEST:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan pinf pzero sub nnorm) [[ARG]], i32 noundef 292) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS_TEST]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan pinf pzero sub nnorm) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -594,11 +594,11 @@ define half @assume_fcmp_fabs_with_other_fabs_assume(half %arg) {
 ; CHECK-LABEL: define nofpclass(nan inf zero norm) half @assume_fcmp_fabs_with_other_fabs_assume
 ; CHECK-SAME: (half returned nofpclass(nan inf zero norm) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan inf zero nsub norm) half @llvm.fabs.f16(half nofpclass(nan inf zero norm) [[ARG]]) #[[ATTR24]]
+; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan inf zero nsub norm) half @llvm.fabs.f16(half nofpclass(nan inf zero norm) [[ARG]]) #[[ATTR23]]
 ; CHECK-NEXT:    [[UNRELATED_FABS:%.*]] = fcmp one half [[FABS]], 0xH0000
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNRELATED_FABS]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNRELATED_FABS]]) #[[ATTR21]]
 ; CHECK-NEXT:    [[IS_SUBNORMAL:%.*]] = fcmp olt half [[FABS]], 0xH0400
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_SUBNORMAL]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_SUBNORMAL]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan inf zero norm) [[ARG]])
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan inf zero nsub norm) [[FABS]])
 ; CHECK-NEXT:    ret half [[ARG]]
@@ -621,11 +621,11 @@ define half @assume_fcmp_fabs_with_other_fabs_assume_fallback(half %arg) {
 ; CHECK-LABEL: define nofpclass(nan inf sub norm) half @assume_fcmp_fabs_with_other_fabs_assume_fallback
 ; CHECK-SAME: (half returned nofpclass(nan inf sub norm) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan inf nzero sub norm) half @llvm.fabs.f16(half nofpclass(nan inf sub norm) [[ARG]]) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR22]]
+; CHECK-NEXT:    [[FABS:%.*]] = call nofpclass(nan inf nzero sub norm) half @llvm.fabs.f16(half nofpclass(nan inf sub norm) [[ARG]]) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR21]]
 ; CHECK-NEXT:    [[UNRELATED_FABS:%.*]] = fcmp oeq half [[FABS]], 0xH0000
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNRELATED_FABS]]) #[[ATTR22]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNRELATED_FABS]]) #[[ATTR21]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan inf sub norm) [[ARG]])
 ; CHECK-NEXT:    call void @extern.use.f16(half nofpclass(nan inf nzero sub norm) [[FABS]])
 ; CHECK-NEXT:    ret half [[ARG]]
@@ -653,7 +653,7 @@ define float @assume_bundles(i1 %c, float %ret) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C]], label [[A:%.*]], label [[B:%.*]]
 ; CHECK:       A:
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR22]] [ "nofpclass"(float [[RET]], i32 3) ]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef true) #[[ATTR21]] [ "nofpclass"(float [[RET]], i32 3) ]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan) [[RET]])
 ; CHECK-NEXT:    ret float [[RET]]
 ; CHECK:       B:
@@ -678,7 +678,7 @@ B:
 define float @returned_load(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define float @returned_load
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR5:[0-9]+]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR4:[0-9]+]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR]], align 4
 ; CHECK-NEXT:    ret float [[LOAD]]
 ;
@@ -689,18 +689,18 @@ define float @returned_load(ptr %ptr) {
 define float @pass_nofpclass_inf_through_memory(float nofpclass(inf) %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define float @pass_nofpclass_inf_through_memory
-; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; TUNIT-NEXT:    store float [[ARG]], ptr [[ALLOCA]], align 4
-; TUNIT-NEXT:    [[RET:%.*]] = call float @returned_load(ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[ALLOCA]]) #[[ATTR25:[0-9]+]]
+; TUNIT-NEXT:    [[RET:%.*]] = call float @returned_load(ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[ALLOCA]]) #[[ATTR24:[0-9]+]]
 ; TUNIT-NEXT:    ret float [[RET]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @pass_nofpclass_inf_through_memory
-; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR4]] {
+; CGSCC-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
 ; CGSCC-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; CGSCC-NEXT:    store float [[ARG]], ptr [[ALLOCA]], align 4
-; CGSCC-NEXT:    [[RET:%.*]] = call float @returned_load(ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[ALLOCA]]) #[[ATTR25:[0-9]+]]
+; CGSCC-NEXT:    [[RET:%.*]] = call float @returned_load(ptr noalias nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[ALLOCA]]) #[[ATTR24:[0-9]+]]
 ; CGSCC-NEXT:    ret float [[RET]]
 ;
   %alloca = alloca float
@@ -712,14 +712,14 @@ define float @pass_nofpclass_inf_through_memory(float nofpclass(inf) %arg) {
 define float @returned_fabs(float %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs
-; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR26:[0-9]+]]
+; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR25:[0-9]+]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs
-; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -729,14 +729,14 @@ define float @returned_fabs(float %x) {
 define float @returned_fabs_nosnan(float nofpclass(snan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(snan ninf nzero nsub nnorm) float @returned_fabs_nosnan
-; TUNIT-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(snan ninf nzero nsub nnorm) float @returned_fabs_nosnan
-; CGSCC-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -746,14 +746,14 @@ define float @returned_fabs_nosnan(float nofpclass(snan) %x) {
 define float @returned_fabs_noqnan(float nofpclass(qnan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(qnan ninf nzero nsub nnorm) float @returned_fabs_noqnan
-; TUNIT-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(qnan ninf nzero nsub nnorm) float @returned_fabs_noqnan
-; CGSCC-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -763,14 +763,14 @@ define float @returned_fabs_noqnan(float nofpclass(qnan) %x) {
 define float @returned_fabs_nonan(float nofpclass(nan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_nonan
-; TUNIT-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_nonan
-; CGSCC-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -780,14 +780,14 @@ define float @returned_fabs_nonan(float nofpclass(nan) %x) {
 define float @returned_fabs_noinf(float nofpclass(inf) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(inf nzero nsub nnorm) float @returned_fabs_noinf
-; TUNIT-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(inf) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(inf) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(inf nzero nsub nnorm) float @returned_fabs_noinf
-; CGSCC-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(inf) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(inf) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -797,14 +797,14 @@ define float @returned_fabs_noinf(float nofpclass(inf) %x) {
 define float @returned_fabs_nopos(float nofpclass(psub pnorm pinf) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopos
-; TUNIT-SAME: (float nofpclass(pinf psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf psub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(pinf psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf psub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopos
-; CGSCC-SAME: (float nofpclass(pinf psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf psub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(pinf psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf psub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -814,14 +814,14 @@ define float @returned_fabs_nopos(float nofpclass(psub pnorm pinf) %x) {
 define float @returned_fabs_nopos_nopzero(float nofpclass(psub pnorm pinf pzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopos_nopzero
-; TUNIT-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopos_nopzero
-; CGSCC-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -831,14 +831,14 @@ define float @returned_fabs_nopos_nopzero(float nofpclass(psub pnorm pinf pzero)
 define float @returned_fabs_nopos_nozero(float nofpclass(psub pnorm pinf zero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf zero nsub nnorm) float @returned_fabs_nopos_nozero
-; TUNIT-SAME: (float nofpclass(pinf zero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf zero psub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(pinf zero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf zero psub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf zero nsub nnorm) float @returned_fabs_nopos_nozero
-; CGSCC-SAME: (float nofpclass(pinf zero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf zero psub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(pinf zero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf zero psub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -848,14 +848,14 @@ define float @returned_fabs_nopos_nozero(float nofpclass(psub pnorm pinf zero) %
 define float @returned_fabs_nopos_nonan(float nofpclass(psub pnorm pinf nan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_nopos_nonan
-; TUNIT-SAME: (float nofpclass(nan pinf psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan pinf psub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan pinf psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan pinf psub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_nopos_nonan
-; CGSCC-SAME: (float nofpclass(nan pinf psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan pinf psub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan pinf psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan pinf psub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -865,14 +865,14 @@ define float @returned_fabs_nopos_nonan(float nofpclass(psub pnorm pinf nan) %x)
 define float @returned_fabs_noneg(float nofpclass(nsub nnorm ninf) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_noneg
-; TUNIT-SAME: (float nofpclass(ninf nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nsub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(ninf nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nsub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_noneg
-; CGSCC-SAME: (float nofpclass(ninf nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nsub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(ninf nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nsub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -882,14 +882,14 @@ define float @returned_fabs_noneg(float nofpclass(nsub nnorm ninf) %x) {
 define float @returned_fabs_noneg_nonzero(float nofpclass(nsub nnorm ninf nzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_noneg_nonzero
-; TUNIT-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_noneg_nonzero
-; CGSCC-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -899,14 +899,14 @@ define float @returned_fabs_noneg_nonzero(float nofpclass(nsub nnorm ninf nzero)
 define float @returned_fabs_noneg_nozero(float nofpclass(nsub nnorm ninf zero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf zero nsub nnorm) float @returned_fabs_noneg_nozero
-; TUNIT-SAME: (float nofpclass(ninf zero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf zero nsub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(ninf zero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf zero nsub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf zero nsub nnorm) float @returned_fabs_noneg_nozero
-; CGSCC-SAME: (float nofpclass(ninf zero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf zero nsub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(ninf zero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf zero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf zero nsub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -916,14 +916,14 @@ define float @returned_fabs_noneg_nozero(float nofpclass(nsub nnorm ninf zero) %
 define float @returned_fabs_noneg_nonan(float nofpclass(nsub nnorm ninf nan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_noneg_nonan
-; TUNIT-SAME: (float nofpclass(nan ninf nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan ninf nsub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan ninf nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan ninf nsub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(nan ninf nzero nsub nnorm) float @returned_fabs_noneg_nonan
-; CGSCC-SAME: (float nofpclass(nan ninf nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan ninf nsub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan ninf nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan ninf nsub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -933,14 +933,14 @@ define float @returned_fabs_noneg_nonan(float nofpclass(nsub nnorm ninf nan) %x)
 define float @returned_fabs_nonsub_nopnorm_nonzero(float nofpclass(nsub pnorm nzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nonsub_nopnorm_nonzero
-; TUNIT-SAME: (float nofpclass(nzero nsub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nsub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nzero nsub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nsub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nonsub_nopnorm_nonzero
-; CGSCC-SAME: (float nofpclass(nzero nsub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nsub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nzero nsub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nsub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -950,14 +950,14 @@ define float @returned_fabs_nonsub_nopnorm_nonzero(float nofpclass(nsub pnorm nz
 define float @returned_fabs_nopsub_nonnorm_nopzero(float nofpclass(psub nnorm pzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopsub_nonnorm_nopzero
-; TUNIT-SAME: (float nofpclass(pzero psub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pzero psub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(pzero psub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pzero psub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nopsub_nonnorm_nopzero
-; CGSCC-SAME: (float nofpclass(pzero psub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pzero psub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(pzero psub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pzero psub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -967,14 +967,14 @@ define float @returned_fabs_nopsub_nonnorm_nopzero(float nofpclass(psub nnorm pz
 define float @returned_fabs_nonnorm_nozero(float nofpclass(nnorm nzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nonnorm_nozero
-; TUNIT-SAME: (float nofpclass(nzero nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nzero nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fabs_nonnorm_nozero
-; CGSCC-SAME: (float nofpclass(nzero nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nzero nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nzero nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %x)
@@ -984,7 +984,7 @@ define float @returned_fabs_nonnorm_nozero(float nofpclass(nnorm nzero) %x) {
 define float @returned_fneg(float %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @returned_fneg
-; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -995,7 +995,7 @@ define float @returned_fneg(float %x) {
 define float @returned_fneg_nosnan(float nofpclass(snan) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(snan) float @returned_fneg_nosnan
-; CHECK-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1006,7 +1006,7 @@ define float @returned_fneg_nosnan(float nofpclass(snan) %x) {
 define float @returned_fneg_noqnan(float nofpclass(qnan) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(qnan) float @returned_fneg_noqnan
-; CHECK-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1017,7 +1017,7 @@ define float @returned_fneg_noqnan(float nofpclass(qnan) %x) {
 define float @returned_fneg_nosnan_ninf_flag(float nofpclass(snan) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(snan inf) float @returned_fneg_nosnan_ninf_flag
-; CHECK-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg ninf float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1028,7 +1028,7 @@ define float @returned_fneg_nosnan_ninf_flag(float nofpclass(snan) %x) {
 define float @returned_fneg_nonan(float nofpclass(nan) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_fneg_nonan
-; CHECK-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1039,7 +1039,7 @@ define float @returned_fneg_nonan(float nofpclass(nan) %x) {
 define float @returned_fneg_noinf(float nofpclass(inf) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf) float @returned_fneg_noinf
-; CHECK-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(inf) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1050,7 +1050,7 @@ define float @returned_fneg_noinf(float nofpclass(inf) %x) {
 define float @returned_fneg_noneg(float nofpclass(ninf nsub nnorm nzero) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_noneg
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1061,7 +1061,7 @@ define float @returned_fneg_noneg(float nofpclass(ninf nsub nnorm nzero) %x) {
 define float @returned_fneg_noneg_nnan_flag(float nofpclass(ninf nsub nnorm nzero) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan pinf pzero psub pnorm) float @returned_fneg_noneg_nnan_flag
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg nnan float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1072,7 +1072,7 @@ define float @returned_fneg_noneg_nnan_flag(float nofpclass(ninf nsub nnorm nzer
 define float @returned_fneg_nonsubnnorm(float nofpclass(nsub nnorm) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(psub pnorm) float @returned_fneg_nonsubnnorm
-; CHECK-SAME: (float nofpclass(nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1083,7 +1083,7 @@ define float @returned_fneg_nonsubnnorm(float nofpclass(nsub nnorm) %x) {
 define float @returned_fneg_nopos(float nofpclass(pinf psub pnorm pzero) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @returned_fneg_nopos
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1094,7 +1094,7 @@ define float @returned_fneg_nopos(float nofpclass(pinf psub pnorm pzero) %x) {
 define float @returned_fneg_nopnormpsub(float nofpclass(psub pnorm) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nsub nnorm) float @returned_fneg_nopnormpsub
-; CHECK-SAME: (float nofpclass(psub pnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(psub pnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1105,7 +1105,7 @@ define float @returned_fneg_nopnormpsub(float nofpclass(psub pnorm) %x) {
 define float @returned_fneg_mixed(float nofpclass(psub nnorm nzero qnan ninf) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(qnan pinf pzero nsub pnorm) float @returned_fneg_mixed
-; CHECK-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[X]]
 ; CHECK-NEXT:    ret float [[FNEG]]
 ;
@@ -1116,15 +1116,15 @@ define float @returned_fneg_mixed(float nofpclass(psub nnorm nzero qnan ninf) %x
 define float @returned_fneg_fabs(float %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs
-; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs
-; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1136,15 +1136,15 @@ define float @returned_fneg_fabs(float %x) {
 define float @returned_fneg_fabs_nosnan(float nofpclass(snan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(snan pinf pzero psub pnorm) float @returned_fneg_fabs_nosnan
-; TUNIT-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(snan pinf pzero psub pnorm) float @returned_fneg_fabs_nosnan
-; CGSCC-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(snan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(snan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(snan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1156,15 +1156,15 @@ define float @returned_fneg_fabs_nosnan(float nofpclass(snan) %x) {
 define float @returned_fneg_fabs_noqnan(float nofpclass(qnan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(qnan pinf pzero psub pnorm) float @returned_fneg_fabs_noqnan
-; TUNIT-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(qnan pinf pzero psub pnorm) float @returned_fneg_fabs_noqnan
-; CGSCC-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(qnan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1176,15 +1176,15 @@ define float @returned_fneg_fabs_noqnan(float nofpclass(qnan) %x) {
 define float @returned_fneg_fabs_nonan(float nofpclass(nan) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(nan pinf pzero psub pnorm) float @returned_fneg_fabs_nonan
-; TUNIT-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(nan pinf pzero psub pnorm) float @returned_fneg_fabs_nonan
-; CGSCC-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(nan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(nan) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1196,15 +1196,15 @@ define float @returned_fneg_fabs_nonan(float nofpclass(nan) %x) {
 define float @returned_fneg_fabs_noneg(float nofpclass(ninf nsub nnorm nzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs_noneg
-; TUNIT-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs_noneg
-; CGSCC-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(ninf nzero nsub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(ninf nzero nsub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1216,15 +1216,15 @@ define float @returned_fneg_fabs_noneg(float nofpclass(ninf nsub nnorm nzero) %x
 define float @returned_fneg_fabs_nopos(float nofpclass(pinf psub pnorm pzero) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs_nopos
-; TUNIT-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(pinf pzero psub pnorm) float @returned_fneg_fabs_nopos
-; CGSCC-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(pinf pzero psub pnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(pinf pzero psub pnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1236,15 +1236,15 @@ define float @returned_fneg_fabs_nopos(float nofpclass(pinf psub pnorm pzero) %x
 define float @returned_fneg_fabs_mixed(float nofpclass(psub nnorm nzero qnan ninf) %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(qnan pinf pzero psub pnorm) float @returned_fneg_fabs_mixed
-; TUNIT-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan ninf nzero psub nnorm) [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan ninf nzero psub nnorm) [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(qnan pinf pzero psub pnorm) float @returned_fneg_fabs_mixed
-; CGSCC-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan ninf nzero psub nnorm) [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(qnan ninf nzero psub nnorm) [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(qnan ninf nzero nsub nnorm) float @llvm.fabs.f32(float nofpclass(qnan ninf nzero psub nnorm) [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1256,15 +1256,15 @@ define float @returned_fneg_fabs_mixed(float nofpclass(psub nnorm nzero qnan nin
 define float @returned_fneg_fabs_ninf_flag_fabs(float %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(inf pzero psub pnorm) float @returned_fneg_fabs_ninf_flag_fabs
-; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call ninf nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call ninf nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(inf pzero psub pnorm) float @returned_fneg_fabs_ninf_flag_fabs
-; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call ninf nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call ninf nofpclass(inf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1276,15 +1276,15 @@ define float @returned_fneg_fabs_ninf_flag_fabs(float %x) {
 define float @returned_fneg_fabs_ninf_flag_fneg(float %x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(inf pzero psub pnorm) float @returned_fneg_fabs_ninf_flag_fneg
-; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR26]]
+; TUNIT-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    [[FNEG_FABS:%.*]] = fneg ninf float [[FABS]]
 ; TUNIT-NEXT:    ret float [[FNEG_FABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define nofpclass(inf pzero psub pnorm) float @returned_fneg_fabs_ninf_flag_fneg
-; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR23]]
+; CGSCC-SAME: (float [[X:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[FABS:%.*]] = call nofpclass(ninf nzero nsub nnorm) float @llvm.fabs.f32(float [[X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    [[FNEG_FABS:%.*]] = fneg ninf float [[FABS]]
 ; CGSCC-NEXT:    ret float [[FNEG_FABS]]
 ;
@@ -1296,7 +1296,7 @@ define float @returned_fneg_fabs_ninf_flag_fneg(float %x) {
 define float @uitofp_i32_to_f32(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) float @uitofp_i32_to_f32
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = uitofp i32 [[ARG]] to float
 ; CHECK-NEXT:    ret float [[CVT]]
 ;
@@ -1307,7 +1307,7 @@ define float @uitofp_i32_to_f32(i32 %arg) {
 define float @sitofp_i32_to_f32(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub) float @sitofp_i32_to_f32
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = sitofp i32 [[ARG]] to float
 ; CHECK-NEXT:    ret float [[CVT]]
 ;
@@ -1318,7 +1318,7 @@ define float @sitofp_i32_to_f32(i32 %arg) {
 define <2 x float> @uitofp_v2i32_to_v2f32(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) <2 x float> @uitofp_v2i32_to_v2f32
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = uitofp <2 x i32> [[ARG]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CVT]]
 ;
@@ -1329,7 +1329,7 @@ define <2 x float> @uitofp_v2i32_to_v2f32(<2 x i32> %arg) {
 define <2 x float> @sitofp_v2i32_to_v2i32(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub) <2 x float> @sitofp_v2i32_to_v2i32
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = sitofp <2 x i32> [[ARG]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CVT]]
 ;
@@ -1340,7 +1340,7 @@ define <2 x float> @sitofp_v2i32_to_v2i32(<2 x i32> %arg) {
 define half @uitofp_i17_to_f16(i17 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan ninf nzero sub nnorm) half @uitofp_i17_to_f16
-; CHECK-SAME: (i17 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i17 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = uitofp i17 [[ARG]] to half
 ; CHECK-NEXT:    ret half [[CVT]]
 ;
@@ -1351,7 +1351,7 @@ define half @uitofp_i17_to_f16(i17 %arg) {
 define half @sitofp_i17_to_f16(i17 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan nzero sub) half @sitofp_i17_to_f16
-; CHECK-SAME: (i17 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i17 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = sitofp i17 [[ARG]] to half
 ; CHECK-NEXT:    ret half [[CVT]]
 ;
@@ -1362,7 +1362,7 @@ define half @sitofp_i17_to_f16(i17 %arg) {
 define <2 x half> @uitofp_v2i17_to_v2f16(<2 x i17> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan ninf nzero sub nnorm) <2 x half> @uitofp_v2i17_to_v2f16
-; CHECK-SAME: (<2 x i17> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i17> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = uitofp <2 x i17> [[ARG]] to <2 x half>
 ; CHECK-NEXT:    ret <2 x half> [[CVT]]
 ;
@@ -1373,7 +1373,7 @@ define <2 x half> @uitofp_v2i17_to_v2f16(<2 x i17> %arg) {
 define <2 x half> @sitofp_v2i17_to_v2i17(<2 x i17> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan nzero sub) <2 x half> @sitofp_v2i17_to_v2i17
-; CHECK-SAME: (<2 x i17> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i17> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CVT:%.*]] = sitofp <2 x i17> [[ARG]] to <2 x half>
 ; CHECK-NEXT:    ret <2 x half> [[CVT]]
 ;
@@ -1386,9 +1386,9 @@ define float @assume_intersection_not_zero_and_not_nan(float %arg) {
 ; CHECK-SAME: (float returned nofpclass(nan zero) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[IS_NOT_ZERO_OR_NAN:%.*]] = fcmp une float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NOT_ZERO_OR_NAN]]) #[[ATTR21]]
 ; CHECK-NEXT:    [[IS_ORD:%.*]] = fcmp ord float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_ORD]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_ORD]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan zero) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -1405,10 +1405,10 @@ define float @assume_intersection_class(float %arg) {
 ; CHECK-LABEL: define nofpclass(nan inf zero sub nnorm) float @assume_intersection_class
 ; CHECK-SAME: (float returned nofpclass(nan inf zero sub nnorm) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[POS_NORMAL_OR_POS_SUBNORMAL:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan inf zero sub nnorm) [[ARG]], i32 noundef 384) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[POS_NORMAL_OR_POS_SUBNORMAL]]) #[[ATTR22]]
-; CHECK-NEXT:    [[IS_NORMAL:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan inf zero sub nnorm) [[ARG]], i32 noundef 264) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NORMAL]]) #[[ATTR22]]
+; CHECK-NEXT:    [[POS_NORMAL_OR_POS_SUBNORMAL:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan inf zero sub nnorm) [[ARG]], i32 noundef 384) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[POS_NORMAL_OR_POS_SUBNORMAL]]) #[[ATTR21]]
+; CHECK-NEXT:    [[IS_NORMAL:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(nan inf zero sub nnorm) [[ARG]], i32 noundef 264) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[IS_NORMAL]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(nan inf zero sub nnorm) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -1426,10 +1426,10 @@ define float @assume_intersection_none(float %arg) {
 ; CHECK-LABEL: define nofpclass(all) float @assume_intersection_none
 ; CHECK-SAME: (float returned nofpclass(all) [[ARG:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[CLASS1:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(all) [[ARG]], i32 noundef 682) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS1]]) #[[ATTR22]]
-; CHECK-NEXT:    [[CLASS2:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(all) [[ARG]], i32 noundef 341) #[[ATTR24]]
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS2]]) #[[ATTR22]]
+; CHECK-NEXT:    [[CLASS1:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(all) [[ARG]], i32 noundef 682) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS1]]) #[[ATTR21]]
+; CHECK-NEXT:    [[CLASS2:%.*]] = call i1 @llvm.is.fpclass.f32(float nofpclass(all) [[ARG]], i32 noundef 341) #[[ATTR23]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[CLASS2]]) #[[ATTR21]]
 ; CHECK-NEXT:    call void @extern.use(float nofpclass(all) [[ARG]])
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
@@ -1445,7 +1445,7 @@ entry:
 define float @returned_extractelement_dynamic_index(<4 x float> nofpclass(nan) %vec, i32 %idx) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_extractelement_dynamic_index
-; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]], i32 [[IDX:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]], i32 [[IDX:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[VEC]], i32 [[IDX]]
 ; CHECK-NEXT:    ret float [[EXTRACT]]
 ;
@@ -1456,7 +1456,7 @@ define float @returned_extractelement_dynamic_index(<4 x float> nofpclass(nan) %
 define float @returned_extractelement_index0(<4 x float> nofpclass(nan) %vec) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_extractelement_index0
-; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[VEC]], i32 0
 ; CHECK-NEXT:    ret float [[EXTRACT]]
 ;
@@ -1467,7 +1467,7 @@ define float @returned_extractelement_index0(<4 x float> nofpclass(nan) %vec) {
 define float @returned_extractelement_index_oob(<4 x float> nofpclass(nan) %vec) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_extractelement_index_oob
-; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[VEC]], i32 5
 ; CHECK-NEXT:    ret float [[EXTRACT]]
 ;
@@ -1478,7 +1478,7 @@ define float @returned_extractelement_index_oob(<4 x float> nofpclass(nan) %vec)
 define float @returned_extractelement_scalable(<vscale x 4 x float> nofpclass(nan) %vec) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_extractelement_scalable
-; CHECK-SAME: (<vscale x 4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<vscale x 4 x float> nofpclass(nan) [[VEC:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <vscale x 4 x float> [[VEC]], i32 0
 ; CHECK-NEXT:    ret float [[EXTRACT]]
 ;
@@ -1489,7 +1489,7 @@ define float @returned_extractelement_scalable(<vscale x 4 x float> nofpclass(na
 define float @returned_extractvalue([4 x float] nofpclass(nan) %array) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) float @returned_extractvalue
-; CHECK-SAME: ([4 x float] nofpclass(nan) [[ARRAY:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: ([4 x float] nofpclass(nan) [[ARRAY:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractvalue [4 x float] [[ARRAY]], 0
 ; CHECK-NEXT:    ret float [[EXTRACT]]
 ;
@@ -1500,7 +1500,7 @@ define float @returned_extractvalue([4 x float] nofpclass(nan) %array) {
 define float @return_nofpclass_freeze_nan_arg(float nofpclass(nan) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @return_nofpclass_freeze_nan_arg
-; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nan) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FREEZE:%.*]] = freeze float [[ARG]]
 ; CHECK-NEXT:    ret float [[FREEZE]]
 ;
@@ -1511,7 +1511,7 @@ define float @return_nofpclass_freeze_nan_arg(float nofpclass(nan) %arg) {
 define float @return_nofpclass_extractelement_freeze_pinf_arg(<2 x float> nofpclass(pinf) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @return_nofpclass_extractelement_freeze_pinf_arg
-; CHECK-SAME: (<2 x float> nofpclass(pinf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> nofpclass(pinf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FREEZE:%.*]] = freeze <2 x float> [[ARG]]
 ; CHECK-NEXT:    [[ELT:%.*]] = extractelement <2 x float> [[FREEZE]], i32 0
 ; CHECK-NEXT:    ret float [[ELT]]
@@ -1524,7 +1524,7 @@ define float @return_nofpclass_extractelement_freeze_pinf_arg(<2 x float> nofpcl
 define <4 x float> @insertelement_constant_chain() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan ninf nzero sub) <4 x float> @insertelement_constant_chain
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <4 x float> poison, float 1.000000e+00, i32 0
 ; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x float> [[INS_0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <4 x float> [[INS_1]], float -9.000000e+00, i32 2
@@ -1541,7 +1541,7 @@ define <4 x float> @insertelement_constant_chain() {
 define <4 x float> @insertelement_non_constant_chain(i32 %idx) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub) <4 x float> @insertelement_non_constant_chain
-; CHECK-SAME: (i32 [[IDX:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[IDX:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <4 x float> poison, float 1.000000e+00, i32 0
 ; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x float> [[INS_0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <4 x float> [[INS_1]], float -9.000000e+00, i32 2
@@ -1560,7 +1560,7 @@ define <4 x float> @insertelement_non_constant_chain(i32 %idx) {
 define <vscale x 4 x float> @insertelement_scalable_constant_chain() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <vscale x 4 x float> @insertelement_scalable_constant_chain
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    [[INS_0:%.*]] = insertelement <vscale x 4 x float> poison, float 1.000000e+00, i32 0
 ; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <vscale x 4 x float> [[INS_0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <vscale x 4 x float> [[INS_1]], float -9.000000e+00, i32 2
@@ -1577,7 +1577,7 @@ define <vscale x 4 x float> @insertelement_scalable_constant_chain() {
 define <4 x float> @insertelement_unknown_base(<4 x float> %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @insertelement_unknown_base
-; CHECK-SAME: (<4 x float> [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> [[ARG0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    ret <4 x float> [[INSERT]]
 ;
@@ -1588,7 +1588,7 @@ define <4 x float> @insertelement_unknown_base(<4 x float> %arg0) {
 define float @insertelement_extractelement_same(<4 x float> %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub norm) float @insertelement_extractelement_same
-; CHECK-SAME: (<4 x float> [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> [[ARG0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[INSERT]], i32 1
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1601,7 +1601,7 @@ define float @insertelement_extractelement_same(<4 x float> %arg0) {
 define float @insertelement_extractelement_different(<4 x float> nofpclass(zero) %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(zero) float @insertelement_extractelement_different
-; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> [[ARG0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[INSERT]], i32 2
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1614,7 +1614,7 @@ define float @insertelement_extractelement_different(<4 x float> nofpclass(zero)
 define float @insertelement_extractelement_unknown(<4 x float> nofpclass(zero) %arg0, i32 %idx) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @insertelement_extractelement_unknown
-; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]], i32 [[IDX:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]], i32 [[IDX:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> [[ARG0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[INSERT]], i32 [[IDX]]
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1627,7 +1627,7 @@ define float @insertelement_extractelement_unknown(<4 x float> nofpclass(zero) %
 define <4 x float> @insertelement_index_oob_chain() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan ninf nzero sub norm) <4 x float> @insertelement_index_oob_chain
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> zeroinitializer, float 0x7FF0000000000000, i32 4
 ; CHECK-NEXT:    ret <4 x float> [[INSERT]]
 ;
@@ -1638,7 +1638,7 @@ define <4 x float> @insertelement_index_oob_chain() {
 define <2 x float> @multiple_extractelement(<4 x float> nofpclass(zero) %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(zero) <2 x float> @multiple_extractelement
-; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(zero) [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[INSERT:%.*]] = insertelement <4 x float> [[ARG0]], float 0.000000e+00, i32 1
 ; CHECK-NEXT:    [[EXTRACT2:%.*]] = extractelement <4 x float> [[INSERT]], i32 2
 ; CHECK-NEXT:    [[EXTRACT3:%.*]] = extractelement <4 x float> [[INSERT]], i32 3
@@ -1658,7 +1658,7 @@ define <2 x float> @multiple_extractelement(<4 x float> nofpclass(zero) %arg0) {
 define <4 x float> @shufflevector_constexpr() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_constexpr
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret <4 x float> <float 1.000000e+00, float bitcast (i32 ptrtoint (ptr @shufflevector_constexpr to i32) to float), float 4.000000e+00, float 0.000000e+00>
 ;
   ret <4 x float> shufflevector (<2 x float> <float 1.0, float bitcast (i32 ptrtoint (ptr @shufflevector_constexpr to i32) to float)>, <2 x float> <float 4.0, float 0.0>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>)
@@ -1667,7 +1667,7 @@ define <4 x float> @shufflevector_constexpr() {
 define <4 x float> @shufflevector_concat_disjoint(<2 x float> nofpclass(nan) %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_concat_disjoint
-; CHECK-SAME: (<2 x float> nofpclass(nan) [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> nofpclass(nan) [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1678,7 +1678,7 @@ define <4 x float> @shufflevector_concat_disjoint(<2 x float> nofpclass(nan) %ar
 define <4 x float> @shufflevector_concat_overlap(<2 x float> nofpclass(nan norm psub) %arg0, <2 x float> nofpclass(inf nan sub) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan psub) <4 x float> @shufflevector_concat_overlap
-; CHECK-SAME: (<2 x float> nofpclass(nan psub norm) [[ARG0:%.*]], <2 x float> nofpclass(nan inf sub) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> nofpclass(nan psub norm) [[ARG0:%.*]], <2 x float> nofpclass(nan inf sub) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1689,7 +1689,7 @@ define <4 x float> @shufflevector_concat_overlap(<2 x float> nofpclass(nan norm
 define <4 x float> @shufflevector_unknown_lhs(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_unknown_lhs
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1700,7 +1700,7 @@ define <4 x float> @shufflevector_unknown_lhs(<2 x float> %arg0, <2 x float> nof
 define <4 x float> @shufflevector_unknown_rhs(<2 x float> nofpclass(inf) %arg0, <2 x float> %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_unknown_rhs
-; CHECK-SAME: (<2 x float> nofpclass(inf) [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> nofpclass(inf) [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1711,7 +1711,7 @@ define <4 x float> @shufflevector_unknown_rhs(<2 x float> nofpclass(inf) %arg0,
 define <4 x float> @shufflevector_unknown_all(<2 x float> %arg0, <2 x float> %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_unknown_all
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1722,7 +1722,7 @@ define <4 x float> @shufflevector_unknown_all(<2 x float> %arg0, <2 x float> %ar
 define <4 x float> @shufflevector_only_demand_lhs(<2 x float> nofpclass(inf) %arg0, <2 x float> %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf) <4 x float> @shufflevector_only_demand_lhs
-; CHECK-SAME: (<2 x float> nofpclass(inf) [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> nofpclass(inf) [[ARG0:%.*]], <2 x float> [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 0, i32 1, i32 1, i32 0>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1733,7 +1733,7 @@ define <4 x float> @shufflevector_only_demand_lhs(<2 x float> nofpclass(inf) %ar
 define <4 x float> @shufflevector_only_demand_rhs(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf) <4 x float> @shufflevector_only_demand_rhs
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 2, i32 3, i32 3, i32 2>
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1744,7 +1744,7 @@ define <4 x float> @shufflevector_only_demand_rhs(<2 x float> %arg0, <2 x float>
 define <4 x float> @shufflevector_undef_demanded(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_undef_demanded
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> poison
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1755,7 +1755,7 @@ define <4 x float> @shufflevector_undef_demanded(<2 x float> %arg0, <2 x float>
 define <4 x float> @shufflevector_zeroinit_demanded(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <4 x float> @shufflevector_zeroinit_demanded
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> zeroinitializer
 ; CHECK-NEXT:    ret <4 x float> [[SHUFFLE]]
 ;
@@ -1766,7 +1766,7 @@ define <4 x float> @shufflevector_zeroinit_demanded(<2 x float> %arg0, <2 x floa
 define float @shufflevector_extractelt0(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @shufflevector_extractelt0
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 1, i32 3, i32 0, i32 1>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 0
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1779,7 +1779,7 @@ define float @shufflevector_extractelt0(<2 x float> %arg0, <2 x float> nofpclass
 define float @shufflevector_extractelt1(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf) float @shufflevector_extractelt1
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 1, i32 3, i32 0, i32 1>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 1
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1792,7 +1792,7 @@ define float @shufflevector_extractelt1(<2 x float> %arg0, <2 x float> nofpclass
 define float @shufflevector_extractelt2(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @shufflevector_extractelt2
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 1, i32 3, i32 0, i32 1>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 2
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1805,7 +1805,7 @@ define float @shufflevector_extractelt2(<2 x float> %arg0, <2 x float> nofpclass
 define float @shufflevector_extractelt3(<2 x float> %arg0, <2 x float> nofpclass(inf) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @shufflevector_extractelt3
-; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x float> [[ARG0:%.*]], <2 x float> nofpclass(inf) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <2 x float> [[ARG0]], <2 x float> [[ARG1]], <4 x i32> <i32 1, i32 3, i32 0, i32 1>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x float> [[SHUFFLE]], i32 3
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1818,7 +1818,7 @@ define float @shufflevector_extractelt3(<2 x float> %arg0, <2 x float> nofpclass
 define float @shufflevector_constantdatavector_demanded0() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf zero sub nnorm) float @shufflevector_constantdatavector_demanded0
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x float> <float 1.000000e+00, float 0x7FF8000000000000, float 0.000000e+00>, <3 x float> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 0
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1831,7 +1831,7 @@ define float @shufflevector_constantdatavector_demanded0() {
 define float @shufflevector_constantdatavector_demanded1() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub norm) float @shufflevector_constantdatavector_demanded1
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE:%.*]] = shufflevector <3 x float> <float 1.000000e+00, float 0x7FF8000000000000, float 0.000000e+00>, <3 x float> poison, <2 x i32> <i32 0, i32 2>
 ; CHECK-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x float> [[SHUFFLE]], i32 1
 ; CHECK-NEXT:    ret float [[EXTRACT]]
@@ -1844,7 +1844,7 @@ define float @shufflevector_constantdatavector_demanded1() {
 define i32 @fptosi(float nofpclass(inf nan) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define i32 @fptosi
-; CHECK-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[FPTOSI:%.*]] = fptosi float [[ARG]] to i32
 ; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[FPTOSI]], 1
 ; CHECK-NEXT:    ret i32 [[ADD]]
@@ -1857,7 +1857,7 @@ define i32 @fptosi(float nofpclass(inf nan) %arg) {
 define float @fptrunc(double noundef nofpclass(inf nan) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nan ninf nzero nsub nnorm) float @fptrunc
-; CHECK-SAME: (double noundef nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (double noundef nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CAST:%.*]] = fptrunc double [[ARG]] to float
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[CAST]], [[CAST]]
 ; CHECK-NEXT:    ret float [[MUL]]
@@ -1870,7 +1870,7 @@ define float @fptrunc(double noundef nofpclass(inf nan) %arg) {
 define double @fpext(float noundef nofpclass(inf nan) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nan ninf nzero nsub nnorm) double @fpext
-; CHECK-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[CAST:%.*]] = fpext float [[ARG]] to double
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul double [[CAST]], [[CAST]]
 ; CHECK-NEXT:    ret double [[MUL]]
@@ -1883,7 +1883,7 @@ define double @fpext(float noundef nofpclass(inf nan) %arg) {
 define float @atomicrmw_fadd(ptr %ptr, float nofpclass(inf nan) %val) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(argmem: readwrite)
 ; CHECK-LABEL: define float @atomicrmw_fadd
-; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR6:[0-9]+]] {
+; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR5:[0-9]+]] {
 ; CHECK-NEXT:    [[RESULT:%.*]] = atomicrmw fadd ptr [[PTR]], float [[VAL]] seq_cst, align 4
 ; CHECK-NEXT:    ret float [[RESULT]]
 ;
@@ -1894,7 +1894,7 @@ define float @atomicrmw_fadd(ptr %ptr, float nofpclass(inf nan) %val) {
 define float @load(ptr %ptr, float noundef nofpclass(nan inf) %val) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
 ; CHECK-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) float @load
-; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float noundef nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR7:[0-9]+]] {
+; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float noundef nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR6:[0-9]+]] {
 ; CHECK-NEXT:    store float [[VAL]], ptr [[PTR]], align 4
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR]], align 4, !noundef [[META0:![0-9]+]]
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[LOAD]], [[LOAD]]
@@ -1909,7 +1909,7 @@ define float @load(ptr %ptr, float noundef nofpclass(nan inf) %val) {
 define float @load_atomic(ptr %ptr, float noundef nofpclass(nan inf) %val) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(argmem: readwrite)
 ; CHECK-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) float @load_atomic
-; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float noundef nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR6]] {
+; CHECK-SAME: (ptr nofree noundef nonnull align 4 captures(none) dereferenceable(4) [[PTR:%.*]], float noundef nofpclass(nan inf) [[VAL:%.*]]) #[[ATTR5]] {
 ; CHECK-NEXT:    store atomic float [[VAL]], ptr [[PTR]] seq_cst, align 4
 ; CHECK-NEXT:    [[LOAD:%.*]] = load atomic float, ptr [[PTR]] seq_cst, align 4, !noundef [[META0]]
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[LOAD]], [[LOAD]]
@@ -1924,7 +1924,7 @@ define float @load_atomic(ptr %ptr, float noundef nofpclass(nan inf) %val) {
 define <8 x float> @shufflevector_shufflevector(<4 x float> nofpclass(inf nan) %arg0, <4 x float> nofpclass(inf nan zero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf) <8 x float> @shufflevector_shufflevector
-; CHECK-SAME: (<4 x float> nofpclass(nan inf) [[ARG0:%.*]], <4 x float> nofpclass(nan inf zero) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<4 x float> nofpclass(nan inf) [[ARG0:%.*]], <4 x float> nofpclass(nan inf zero) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHUFFLE0:%.*]] = shufflevector <4 x float> [[ARG0]], <4 x float> [[ARG0]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
 ; CHECK-NEXT:    [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[ARG1]], <4 x float> [[ARG1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; CHECK-NEXT:    [[SHUFFLE2:%.*]] = shufflevector <4 x float> [[SHUFFLE0]], <4 x float> [[SHUFFLE1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -1937,22 +1937,22 @@ define <8 x float> @shufflevector_shufflevector(<4 x float> nofpclass(inf nan) %
 }
 
 define float @constrained_sitofp(i32 %arg) strictfp {
-; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite)
-; CHECK-LABEL: define nofpclass(nan nzero sub) float @constrained_sitofp
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR8:[0-9]+]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(nan nzero sub) float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR24]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind strictfp willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf nzero sub) float @constrained_sitofp
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR7:[0-9]+]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = sitofp i32 [[ARG]] to float
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
 }
 
 define float @constrained_uitofp(i32 %arg) strictfp {
-; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite)
-; CHECK-LABEL: define nofpclass(nan ninf nzero sub nnorm) float @constrained_uitofp
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR8]] {
-; CHECK-NEXT:    [[VAL:%.*]] = call nofpclass(nan ninf nzero sub nnorm) float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[ARG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR24]]
-; CHECK-NEXT:    ret float [[VAL]]
+; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind strictfp willreturn memory(none)
+; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) float @constrained_uitofp
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR7]] {
+; CHECK-NEXT:    [[VAL1:%.*]] = uitofp i32 [[ARG]] to float
+; CHECK-NEXT:    ret float [[VAL1]]
 ;
   %val = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %arg, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %val
@@ -1961,7 +1961,7 @@ define float @constrained_uitofp(i32 %arg) strictfp {
 define float @fadd_p0(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_p0
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -1972,7 +1972,7 @@ define float @fadd_p0(float %arg0) {
 define float @fadd_n0(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_n0
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -1983,7 +1983,7 @@ define float @fadd_n0(float %arg0) {
 define float @fsub_p0(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fsub_p0
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -1994,7 +1994,7 @@ define float @fsub_p0(float %arg0) {
 define float @fsub_n0(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fsub_n0
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2005,7 +2005,7 @@ define float @fsub_n0(float %arg0) {
 define float @fsub_p0_commute(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fsub_p0_commute
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float 0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2016,7 +2016,7 @@ define float @fsub_p0_commute(float %arg0) {
 define float @fsub_n0_commute(float %arg0) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fsub_n0_commute
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float -0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2027,7 +2027,7 @@ define float @fsub_n0_commute(float %arg0) {
 define float @fadd_p0_ftz_daz(float %arg0) #3 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(positivezero) memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_p0_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9:[0-9]+]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR8:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2038,7 +2038,7 @@ define float @fadd_p0_ftz_daz(float %arg0) #3 {
 define float @fadd_n0_ftz_daz(float %arg0) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_n0_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10:[0-9]+]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2049,7 +2049,7 @@ define float @fadd_n0_ftz_daz(float %arg0) #0 {
 define float @fsub_p0_ftz_daz(float %arg0) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fsub_p0_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2060,7 +2060,7 @@ define float @fsub_p0_ftz_daz(float %arg0) #0 {
 define float @fsub_n0_ftz_daz(float %arg0) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fsub_n0_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2071,7 +2071,7 @@ define float @fsub_n0_ftz_daz(float %arg0) #0 {
 define float @fsub_p0_commute_ftz_daz(float %arg0) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fsub_p0_commute_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float 0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2082,7 +2082,7 @@ define float @fsub_p0_commute_ftz_daz(float %arg0) #0 {
 define float @fsub_n0_commute_ftz_daz(float %arg0) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fsub_n0_commute_ftz_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float -0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2092,7 +2092,7 @@ define float @fsub_n0_commute_ftz_daz(float %arg0) #0 {
 
 define float @fadd_p0_ieee_daz(float %arg0) #2 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_p0_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11:[0-9]+]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2102,7 +2102,7 @@ define float @fadd_p0_ieee_daz(float %arg0) #2 {
 
 define float @fadd_p0_dapz_ieee(float %arg0) #4 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_p0_dapz_ieee
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR12:[0-9]+]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2112,7 +2112,7 @@ define float @fadd_p0_dapz_ieee(float %arg0) #4 {
 
 define float @fadd_n0_ieee_daz(float %arg0) #2 {
 ; CHECK-LABEL: define float @fadd_n0_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2122,7 +2122,7 @@ define float @fadd_n0_ieee_daz(float %arg0) #2 {
 
 define float @fsub_p0_ieee_daz(float %arg0) #2 {
 ; CHECK-LABEL: define float @fsub_p0_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], 0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2132,7 +2132,7 @@ define float @fsub_p0_ieee_daz(float %arg0) #2 {
 
 define float @fsub_n0_ieee_daz(float %arg0) #2 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fsub_n0_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG0]], -0.000000e+00
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2142,7 +2142,7 @@ define float @fsub_n0_ieee_daz(float %arg0) #2 {
 
 define float @fsub_p0_commute_ieee_daz(float %arg0) #2 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fsub_p0_commute_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float 0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2152,7 +2152,7 @@ define float @fsub_p0_commute_ieee_daz(float %arg0) #2 {
 
 define float @fsub_n0_commute_ieee_daz(float %arg0) #1 {
 ; CHECK-LABEL: define float @fsub_n0_commute_ieee_daz
-; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR13:[0-9]+]] {
+; CHECK-SAME: (float [[ARG0:%.*]]) #[[ATTR12:[0-9]+]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float -0.000000e+00, [[ARG0]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -2163,7 +2163,7 @@ define float @fsub_n0_commute_ieee_daz(float %arg0) #1 {
 define float @fadd_never_negzero_or_negsub(float nofpclass(nzero nsub) %a, float nofpclass(nzero nsub) %b) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_negsub
-; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2174,7 +2174,7 @@ define float @fadd_never_negzero_or_negsub(float nofpclass(nzero nsub) %a, float
 define float @fadd_never_negzero_or_ftz_daz(float nofpclass(nzero nsub) %a, float nofpclass(nzero nsub) %b) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_never_negzero_or_ftz_daz
-; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2184,7 +2184,7 @@ define float @fadd_never_negzero_or_ftz_daz(float nofpclass(nzero nsub) %a, floa
 
 define float @fadd_never_negzero_or_negsub_daz(float nofpclass(nzero nsub) %a, float nofpclass(nzero nsub) %b) #2 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_negsub_daz
-; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2194,7 +2194,7 @@ define float @fadd_never_negzero_or_negsub_daz(float nofpclass(nzero nsub) %a, f
 
 define float @fadd_never_negzero_or_negsub_dapz(float nofpclass(nzero nsub) %a, float nofpclass(nzero nsub) %b) #5 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_negsub_dapz
-; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR14:[0-9]+]] {
+; CHECK-SAME: (float nofpclass(nzero nsub) [[A:%.*]], float nofpclass(nzero nsub) [[B:%.*]]) #[[ATTR13:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2205,7 +2205,7 @@ define float @fadd_never_negzero_or_negsub_dapz(float nofpclass(nzero nsub) %a,
 define float @fadd_never_negzero_or_possub(float nofpclass(nzero psub) %a, float nofpclass(nzero psub) %b) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_possub
-; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2215,7 +2215,7 @@ define float @fadd_never_negzero_or_possub(float nofpclass(nzero psub) %a, float
 
 define float @fadd_never_negzero_or_possub_daz(float nofpclass(nzero psub) %a, float nofpclass(nzero psub) %b) #2 {
 ; CHECK-LABEL: define float @fadd_never_negzero_or_possub_daz
-; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2225,7 +2225,7 @@ define float @fadd_never_negzero_or_possub_daz(float nofpclass(nzero psub) %a, f
 
 define float @fadd_never_negzero_or_possub_dapz(float nofpclass(nzero psub) %a, float nofpclass(nzero psub) %b) #5 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_possub_dapz
-; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR14]] {
+; CHECK-SAME: (float nofpclass(nzero psub) [[A:%.*]], float nofpclass(nzero psub) [[B:%.*]]) #[[ATTR13]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2235,7 +2235,7 @@ define float @fadd_never_negzero_or_possub_dapz(float nofpclass(nzero psub) %a,
 
 define float @fadd_never_negzero_or_sub_daz(float nofpclass(nzero sub) %a, float nofpclass(nzero sub) %b) #2 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_sub_daz
-; CHECK-SAME: (float nofpclass(nzero sub) [[A:%.*]], float nofpclass(nzero sub) [[B:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(nzero sub) [[A:%.*]], float nofpclass(nzero sub) [[B:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2245,7 +2245,7 @@ define float @fadd_never_negzero_or_sub_daz(float nofpclass(nzero sub) %a, float
 
 define float @fadd_never_negzero_or_sub_dapz(float nofpclass(nzero sub) %a, float nofpclass(nzero sub) %b) #5 {
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_never_negzero_or_sub_dapz
-; CHECK-SAME: (float nofpclass(nzero sub) [[A:%.*]], float nofpclass(nzero sub) [[B:%.*]]) #[[ATTR14]] {
+; CHECK-SAME: (float nofpclass(nzero sub) [[A:%.*]], float nofpclass(nzero sub) [[B:%.*]]) #[[ATTR13]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2256,7 +2256,7 @@ define float @fadd_never_negzero_or_sub_dapz(float nofpclass(nzero sub) %a, floa
 define float @fadd_known_positive_lhs(float nofpclass(ninf nsub nnorm) %arg0, float %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_known_positive_lhs
-; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2267,7 +2267,7 @@ define float @fadd_known_positive_lhs(float nofpclass(ninf nsub nnorm) %arg0, fl
 define float @fadd_known_positive_rhs(float %arg0, float nofpclass(ninf nsub nnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_known_positive_rhs
-; CHECK-SAME: (float [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2278,7 +2278,7 @@ define float @fadd_known_positive_rhs(float %arg0, float nofpclass(ninf nsub nno
 define float @fadd_known_positive(float nofpclass(ninf nsub nnorm) %arg0, float nofpclass(ninf nsub nnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nsub nnorm) float @fadd_known_positive
-; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2289,7 +2289,7 @@ define float @fadd_known_positive(float nofpclass(ninf nsub nnorm) %arg0, float
 define float @fadd_known_positive_daz(float nofpclass(ninf nsub nnorm) %arg0, float nofpclass(ninf nsub nnorm) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nsub nnorm) float @fadd_known_positive_daz
-; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2344,7 +2344,7 @@ define float @test_fadd_may_nan_from_no_ninf_no_pinf(float nofpclass(nan ninf) %
 define float @fadd_known_positive_nzero_lhs(float nofpclass(ninf nsub nnorm nzero) %arg0, float nofpclass(ninf nsub nnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @fadd_known_positive_nzero_lhs
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nsub nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2355,7 +2355,7 @@ define float @fadd_known_positive_nzero_lhs(float nofpclass(ninf nsub nnorm nzer
 define float @fadd_known_positive_nzero_rhs(float nofpclass(ninf nsub nnorm) %arg0, float nofpclass(ninf nsub nnorm nzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @fadd_known_positive_nzero_rhs
-; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2366,7 +2366,7 @@ define float @fadd_known_positive_nzero_rhs(float nofpclass(ninf nsub nnorm) %ar
 define float @fadd_known_positive_nzero(float nofpclass(ninf nsub nnorm nzero) %arg0, float nofpclass(ninf nsub nnorm nzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @fadd_known_positive_nzero
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2377,7 +2377,7 @@ define float @fadd_known_positive_nzero(float nofpclass(ninf nsub nnorm nzero) %
 define float @fadd_known_positive_nzero_ftz_daz(float nofpclass(ninf nsub nnorm nzero) %arg0, float nofpclass(ninf nsub nnorm nzero) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nsub nnorm) float @fadd_known_positive_nzero_ftz_daz
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2387,7 +2387,7 @@ define float @fadd_known_positive_nzero_ftz_daz(float nofpclass(ninf nsub nnorm
 
 define float @fadd_known_positive_nzero_ftz(float nofpclass(ninf nsub nnorm nzero) %arg0, float nofpclass(ninf nsub nnorm nzero) %arg1) #1 {
 ; CHECK-LABEL: define nofpclass(ninf nsub nnorm) float @fadd_known_positive_nzero_ftz
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR13]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR12]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2397,7 +2397,7 @@ define float @fadd_known_positive_nzero_ftz(float nofpclass(ninf nsub nnorm nzer
 
 define float @fadd_known_positive_nzero_daz(float nofpclass(ninf nsub nnorm nzero) %arg0, float nofpclass(ninf nsub nnorm nzero) %arg1) #2 {
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @fadd_known_positive_nzero_daz
-; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nsub nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nsub nnorm) [[ARG1:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2408,7 +2408,7 @@ define float @fadd_known_positive_nzero_daz(float nofpclass(ninf nsub nnorm nzer
 define float @fadd_known_positive_normal(float nofpclass(ninf nnorm nzero) %arg0, float nofpclass(ninf nnorm nzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_known_positive_normal
-; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2419,7 +2419,7 @@ define float @fadd_known_positive_normal(float nofpclass(ninf nnorm nzero) %arg0
 define float @fadd_known_positive_normal_daz(float nofpclass(ninf nnorm nzero) %arg0, float nofpclass(ninf nnorm nzero) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_known_positive_normal_daz
-; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2430,7 +2430,7 @@ define float @fadd_known_positive_normal_daz(float nofpclass(ninf nnorm nzero) %
 define float @fadd_known_positive_normal_except0_daz(float nofpclass(ninf nnorm) %arg0, float nofpclass(ninf nnorm) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_known_positive_normal_except0_daz
-; CHECK-SAME: (float nofpclass(ninf nnorm) [[ARG0:%.*]], float nofpclass(ninf nnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(ninf nnorm) [[ARG0:%.*]], float nofpclass(ninf nnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2441,7 +2441,7 @@ define float @fadd_known_positive_normal_except0_daz(float nofpclass(ninf nnorm)
 define float @fadd_known_positive_normal_dapz(float nofpclass(ninf nnorm nzero) %arg0, float nofpclass(ninf nnorm nzero) %arg1) #3 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(positivezero) memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_known_positive_normal_dapz
-; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR9]] {
+; CHECK-SAME: (float nofpclass(ninf nzero nnorm) [[ARG0:%.*]], float nofpclass(ninf nzero nnorm) [[ARG1:%.*]]) #[[ATTR8]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -2452,14 +2452,14 @@ define float @fadd_known_positive_normal_dapz(float nofpclass(ninf nnorm nzero)
 define internal float @returns_fence(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define internal float @returns_fence
-; TUNIT-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float nofpclass(inf) [[ARG]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float nofpclass(inf) [[ARG]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[RET]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define internal float @returns_fence
-; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
-; CGSCC-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float nofpclass(nan inf) [[ARG]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
+; CGSCC-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float nofpclass(nan inf) [[ARG]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.arithmetic.fence.f32(float %arg)
@@ -2470,16 +2470,16 @@ define internal float @returns_fence(float %arg) {
 define internal float @refine_callsite_attribute(float nofpclass(inf) %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define internal float @refine_callsite_attribute
-; TUNIT-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FUNC0:%.*]] = call float @returns_fence(float nofpclass(nan inf) [[ARG]]) #[[ATTR23]]
-; TUNIT-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float [[FUNC0]]) #[[ATTR26]]
+; TUNIT-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FUNC0:%.*]] = call float @returns_fence(float nofpclass(nan inf) [[ARG]]) #[[ATTR22]]
+; TUNIT-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float [[FUNC0]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[RET]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define internal float @refine_callsite_attribute
-; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR4]] {
-; CGSCC-NEXT:    [[FUNC0:%.*]] = call float @returns_fence(float nofpclass(nan inf) [[ARG]]) #[[ATTR23]]
-; CGSCC-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float [[FUNC0]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT:    [[FUNC0:%.*]] = call float @returns_fence(float nofpclass(nan inf) [[ARG]]) #[[ATTR22]]
+; CGSCC-NEXT:    [[RET:%.*]] = call float @llvm.arithmetic.fence.f32(float [[FUNC0]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[RET]]
 ;
   %func0 = call float @returns_fence(float nofpclass(nan) %arg)
@@ -2490,14 +2490,14 @@ define internal float @refine_callsite_attribute(float nofpclass(inf) %arg) {
 define float @user(float %arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define float @user
-; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
-; TUNIT-NEXT:    [[FUNC1:%.*]] = call float @refine_callsite_attribute(float nofpclass(inf) [[ARG]]) #[[ATTR23]]
+; TUNIT-SAME: (float nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
+; TUNIT-NEXT:    [[FUNC1:%.*]] = call float @refine_callsite_attribute(float nofpclass(inf) [[ARG]]) #[[ATTR22]]
 ; TUNIT-NEXT:    ret float [[FUNC1]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @user
-; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR4]] {
-; CGSCC-NEXT:    [[FUNC1:%.*]] = call float @refine_callsite_attribute(float nofpclass(nan inf) [[ARG]]) #[[ATTR23]]
+; CGSCC-SAME: (float nofpclass(nan inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CGSCC-NEXT:    [[FUNC1:%.*]] = call float @refine_callsite_attribute(float nofpclass(nan inf) [[ARG]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[FUNC1]]
 ;
   %func1 = call float @refine_callsite_attribute(float %arg)
@@ -2508,7 +2508,7 @@ define float @user(float %arg) {
 define internal float @through_memory0(ptr %ptr.arg) {
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define internal float @through_memory0
-; CGSCC-SAME: (float [[TMP0:%.*]]) #[[ATTR3]] {
+; CGSCC-SAME: (float [[TMP0:%.*]]) #[[ATTR2]] {
 ; CGSCC-NEXT:    [[PTR_ARG_PRIV:%.*]] = alloca float, align 4
 ; CGSCC-NEXT:    store float [[TMP0]], ptr [[PTR_ARG_PRIV]], align 4
 ; CGSCC-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR_ARG_PRIV]], align 4, !invariant.load [[META0]]
@@ -2522,20 +2522,20 @@ define internal float @through_memory0(ptr %ptr.arg) {
 define internal float @through_memory1(ptr %ptr.arg) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define internal float @through_memory1
-; TUNIT-SAME: (float [[TMP0:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (float [[TMP0:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:    [[PTR_ARG_PRIV:%.*]] = alloca float, align 4
 ; TUNIT-NEXT:    store float [[TMP0]], ptr [[PTR_ARG_PRIV]], align 4
 ; TUNIT-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR_ARG_PRIV]], align 4
-; TUNIT-NEXT:    [[CALL:%.*]] = call float @llvm.arithmetic.fence.f32(float [[LOAD]]) #[[ATTR26]]
+; TUNIT-NEXT:    [[CALL:%.*]] = call float @llvm.arithmetic.fence.f32(float [[LOAD]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret float [[CALL]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define internal float @through_memory1
-; CGSCC-SAME: (float [[TMP0:%.*]]) #[[ATTR3]] {
+; CGSCC-SAME: (float [[TMP0:%.*]]) #[[ATTR2]] {
 ; CGSCC-NEXT:    [[PTR_ARG_PRIV:%.*]] = alloca float, align 4
 ; CGSCC-NEXT:    store float [[TMP0]], ptr [[PTR_ARG_PRIV]], align 4
 ; CGSCC-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR_ARG_PRIV]], align 4, !invariant.load [[META0]]
-; CGSCC-NEXT:    [[CALL:%.*]] = call float @llvm.arithmetic.fence.f32(float [[LOAD]]) #[[ATTR23]]
+; CGSCC-NEXT:    [[CALL:%.*]] = call float @llvm.arithmetic.fence.f32(float [[LOAD]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[CALL]]
 ;
   %load = load float, ptr %ptr.arg
@@ -2547,7 +2547,7 @@ define internal float @through_memory1(ptr %ptr.arg) {
 define internal float @through_memory2(ptr %ptr.arg) {
 ; CHECK: Function Attrs: memory(readwrite, argmem: none)
 ; CHECK-LABEL: define internal float @through_memory2
-; CHECK-SAME: (float [[TMP0:%.*]]) #[[ATTR15:[0-9]+]] {
+; CHECK-SAME: (float [[TMP0:%.*]]) #[[ATTR14:[0-9]+]] {
 ; CHECK-NEXT:    [[PTR_ARG_PRIV:%.*]] = alloca float, align 4
 ; CHECK-NEXT:    store float [[TMP0]], ptr [[PTR_ARG_PRIV]], align 4
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR_ARG_PRIV]], align 4, !invariant.load [[META0]]
@@ -2562,17 +2562,17 @@ define internal float @through_memory2(ptr %ptr.arg) {
 define float @call_through_memory0(float nofpclass(nan) %val) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define nofpclass(nan) float @call_through_memory0
-; TUNIT-SAME: (float returned nofpclass(nan) [[VAL:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (float returned nofpclass(nan) [[VAL:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; TUNIT-NEXT:    store float [[VAL]], ptr [[ALLOCA]], align 4
 ; TUNIT-NEXT:    ret float [[VAL]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @call_through_memory0
-; CGSCC-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR4]] {
+; CGSCC-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR3]] {
 ; CGSCC-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; CGSCC-NEXT:    store float [[VAL]], ptr [[ALLOCA]], align 4
-; CGSCC-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory0(float nofpclass(nan) [[VAL]]) #[[ATTR23]]
+; CGSCC-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory0(float nofpclass(nan) [[VAL]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[THROUGH_MEMORY]]
 ;
   %alloca = alloca float
@@ -2584,19 +2584,19 @@ define float @call_through_memory0(float nofpclass(nan) %val) {
 define float @call_through_memory1(float nofpclass(nan) %val) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define float @call_through_memory1
-; TUNIT-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; TUNIT-NEXT:    store float [[VAL]], ptr [[ALLOCA]], align 4
 ; TUNIT-NEXT:    [[TMP1:%.*]] = load float, ptr [[ALLOCA]], align 4
-; TUNIT-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory1(float [[TMP1]]) #[[ATTR25]]
+; TUNIT-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory1(float [[TMP1]]) #[[ATTR24]]
 ; TUNIT-NEXT:    ret float [[THROUGH_MEMORY]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define float @call_through_memory1
-; CGSCC-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR4]] {
+; CGSCC-SAME: (float nofpclass(nan) [[VAL:%.*]]) #[[ATTR3]] {
 ; CGSCC-NEXT:    [[ALLOCA:%.*]] = alloca float, align 4
 ; CGSCC-NEXT:    store float [[VAL]], ptr [[ALLOCA]], align 4
-; CGSCC-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory1(float nofpclass(nan) [[VAL]]) #[[ATTR23]]
+; CGSCC-NEXT:    [[THROUGH_MEMORY:%.*]] = call float @through_memory1(float nofpclass(nan) [[VAL]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret float [[THROUGH_MEMORY]]
 ;
   %alloca = alloca float
@@ -2660,7 +2660,7 @@ entry:
 define internal float @pow_wrapper(float %arg, float %arg1) {
 ; TUNIT: Function Attrs: norecurse
 ; TUNIT-LABEL: define internal float @pow_wrapper
-; TUNIT-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]], float noundef nofpclass(nan inf) [[ARG1:%.*]]) #[[ATTR16:[0-9]+]] {
+; TUNIT-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]], float noundef nofpclass(nan inf) [[ARG1:%.*]]) #[[ATTR15:[0-9]+]] {
 ; TUNIT-NEXT:  bb:
 ; TUNIT-NEXT:    [[I:%.*]] = tail call float @pow_impl(float noundef nofpclass(nan inf) [[ARG]], float noundef nofpclass(nan inf) [[ARG1]])
 ; TUNIT-NEXT:    ret float [[I]]
@@ -2680,14 +2680,14 @@ bb:
 define internal float @pow_impl(float %arg, float %arg1) {
 ; TUNIT: Function Attrs: norecurse
 ; TUNIT-LABEL: define internal float @pow_impl
-; TUNIT-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]], float noundef nofpclass(nan inf) [[ARG1:%.*]]) #[[ATTR16]] {
+; TUNIT-SAME: (float noundef nofpclass(nan inf) [[ARG:%.*]], float noundef nofpclass(nan inf) [[ARG1:%.*]]) #[[ATTR15]] {
 ; TUNIT-NEXT:  bb:
 ; TUNIT-NEXT:    [[IMPLEMENT_POW:%.*]] = call float asm "
 ; TUNIT-NEXT:    ret float [[IMPLEMENT_POW]]
 ;
 ; CGSCC: Function Attrs: norecurse
 ; CGSCC-LABEL: define internal float @pow_impl
-; CGSCC-SAME: (float [[ARG:%.*]], float [[ARG1:%.*]]) #[[ATTR16:[0-9]+]] {
+; CGSCC-SAME: (float [[ARG:%.*]], float [[ARG1:%.*]]) #[[ATTR15:[0-9]+]] {
 ; CGSCC-NEXT:  bb:
 ; CGSCC-NEXT:    [[IMPLEMENT_POW:%.*]] = call float asm "
 ; CGSCC-NEXT:    ret float [[IMPLEMENT_POW]]
@@ -2700,7 +2700,7 @@ bb:
 define [4 x float] @constant_aggregate_zero() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub norm) [4 x float] @constant_aggregate_zero
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret [4 x float] zeroinitializer
 ;
   ret [4 x float] zeroinitializer
@@ -2709,7 +2709,7 @@ define [4 x float] @constant_aggregate_zero() {
 define <vscale x 4 x float> @scalable_splat_pnorm() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nan inf zero sub nnorm) <vscale x 4 x float> @scalable_splat_pnorm
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret <vscale x 4 x float> splat (float 1.000000e+00)
 ;
   ret <vscale x 4 x float> splat (float 1.0)
@@ -2718,7 +2718,7 @@ define <vscale x 4 x float> @scalable_splat_pnorm() {
 define <vscale x 4 x float> @scalable_splat_zero() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nan inf nzero sub norm) <vscale x 4 x float> @scalable_splat_zero
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret <vscale x 4 x float> zeroinitializer
 ;
   ret <vscale x 4 x float> zeroinitializer
@@ -2727,7 +2727,7 @@ define <vscale x 4 x float> @scalable_splat_zero() {
 define <vscale x 4 x float> @scalable_splat_nnan(float nofpclass(nan) %x) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan) <vscale x 4 x float> @scalable_splat_nnan
-; CHECK-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nan) [[X:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[HEAD:%.*]] = insertelement <vscale x 4 x float> poison, float [[X]], i32 0
 ; CHECK-NEXT:    [[SPLAT:%.*]] = shufflevector <vscale x 4 x float> [[HEAD]], <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
 ; CHECK-NEXT:    ret <vscale x 4 x float> [[SPLAT]]
@@ -2743,16 +2743,16 @@ define <vscale x 4 x float> @scalable_splat_nnan(float nofpclass(nan) %x) {
 define double @call_abs(double noundef %__x) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) double @call_abs
-; TUNIT-SAME: (double noundef [[__X:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (double noundef [[__X:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:  entry:
-; TUNIT-NEXT:    [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR26]]
+; TUNIT-NEXT:    [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR25]]
 ; TUNIT-NEXT:    ret double [[ABS]]
 ;
 ; CGSCC: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) double @call_abs
-; CGSCC-SAME: (double noundef [[__X:%.*]]) #[[ATTR3]] {
+; CGSCC-SAME: (double noundef [[__X:%.*]]) #[[ATTR2]] {
 ; CGSCC-NEXT:  entry:
-; CGSCC-NEXT:    [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR23]]
+; CGSCC-NEXT:    [[ABS:%.*]] = tail call noundef nofpclass(ninf nzero nsub nnorm) double @llvm.fabs.f64(double noundef [[__X]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret double [[ABS]]
 ;
 entry:
@@ -2763,7 +2763,7 @@ entry:
 define float @bitcast_to_float_sign_0(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) float @bitcast_to_float_sign_0
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 [[ARG]], 1
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[SHR]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2776,7 +2776,7 @@ define float @bitcast_to_float_sign_0(i32 %arg) {
 define float @bitcast_to_float_nnan(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) float @bitcast_to_float_nnan
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 [[ARG]], 2
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[SHR]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2789,7 +2789,7 @@ define float @bitcast_to_float_nnan(i32 %arg) {
 define float @bitcast_to_float_sign_1(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) float @bitcast_to_float_sign_1
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i32 [[ARG]], -2147483648
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[OR]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2802,7 +2802,7 @@ define float @bitcast_to_float_sign_1(i32 %arg) {
 define float @bitcast_to_float_nan(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf zero sub norm) float @bitcast_to_float_nan
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i32 [[ARG]], 2139095041
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[OR]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2815,7 +2815,7 @@ define float @bitcast_to_float_nan(i32 %arg) {
 define float @bitcast_to_float_zero(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf sub norm) float @bitcast_to_float_zero
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[ARG]], 31
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[SHL]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2828,7 +2828,7 @@ define float @bitcast_to_float_zero(i32 %arg) {
 define float @bitcast_to_float_nzero(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(zero) float @bitcast_to_float_nzero
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i32 [[ARG]], 134217728
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[OR]] to float
 ; CHECK-NEXT:    ret float [[CAST]]
@@ -2841,7 +2841,7 @@ define float @bitcast_to_float_nzero(i32 %arg) {
 define float @bitcast_to_float_inf(i32 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan zero sub norm) float @bitcast_to_float_inf
-; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = shl i32 [[ARG]], 31
 ; CHECK-NEXT:    [[OR:%.*]] = or i32 [[SHR]], 2139095040
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i32 [[OR]] to float
@@ -2856,7 +2856,7 @@ define float @bitcast_to_float_inf(i32 %arg) {
 define double @bitcast_to_double_sign_0(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) double @bitcast_to_double_sign_0
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr i64 [[ARG]], 1
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[SHR]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2869,7 +2869,7 @@ define double @bitcast_to_double_sign_0(i64 %arg) {
 define double @bitcast_to_double_nnan(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) double @bitcast_to_double_nnan
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr i64 [[ARG]], 2
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[SHR]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2882,7 +2882,7 @@ define double @bitcast_to_double_nnan(i64 %arg) {
 define double @bitcast_to_double_sign_1(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) double @bitcast_to_double_sign_1
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i64 [[ARG]], -9223372036854775808
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[OR]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2895,7 +2895,7 @@ define double @bitcast_to_double_sign_1(i64 %arg) {
 define double @bitcast_to_double_nan(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf zero sub norm) double @bitcast_to_double_nan
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i64 [[ARG]], -4503599627370495
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[OR]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2909,7 +2909,7 @@ define double @bitcast_to_double_nan(i64 %arg) {
 define double @bitcast_to_double_zero(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf sub norm) double @bitcast_to_double_zero
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i64 [[ARG]], 63
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[SHL]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2922,7 +2922,7 @@ define double @bitcast_to_double_zero(i64 %arg) {
 define double @bitcast_to_double_nzero(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(zero) double @bitcast_to_double_nzero
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or i64 [[ARG]], 1152921504606846976
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[OR]] to double
 ; CHECK-NEXT:    ret double [[CAST]]
@@ -2935,7 +2935,7 @@ define double @bitcast_to_double_nzero(i64 %arg) {
 define double @bitcast_to_double_inf(i64 %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan zero sub norm) double @bitcast_to_double_inf
-; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (i64 [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = shl i64 [[ARG]], 63
 ; CHECK-NEXT:    [[OR:%.*]] = or i64 [[SHR]], 9218868437227405312
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast i64 [[OR]] to double
@@ -2951,7 +2951,7 @@ define double @bitcast_to_double_inf(i64 %arg) {
 define <2 x float> @bitcast_to_float_vect_sign_0(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf nzero nsub nnorm) <2 x float> @bitcast_to_float_vect_sign_0
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr <2 x i32> [[ARG]], <i32 1, i32 2>
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -2964,7 +2964,7 @@ define <2 x float> @bitcast_to_float_vect_sign_0(<2 x i32> %arg) {
 define <2 x float> @bitcast_to_float_vect_nnan(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero nsub nnorm) <2 x float> @bitcast_to_float_vect_nnan
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = lshr <2 x i32> [[ARG]], splat (i32 4)
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[SHR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -2977,7 +2977,7 @@ define <2 x float> @bitcast_to_float_vect_nnan(<2 x i32> %arg) {
 define <2 x float> @bitcast_to_float_vect_sign_1(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf pzero psub pnorm) <2 x float> @bitcast_to_float_vect_sign_1
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or <2 x i32> [[ARG]], splat (i32 -2147483648)
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -2990,7 +2990,7 @@ define <2 x float> @bitcast_to_float_vect_sign_1(<2 x i32> %arg) {
 define <2 x float> @bitcast_to_float_vect_nan(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(inf zero sub norm) <2 x float> @bitcast_to_float_vect_nan
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or <2 x i32> [[ARG]], splat (i32 2139095041)
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -3003,7 +3003,7 @@ define <2 x float> @bitcast_to_float_vect_nan(<2 x i32> %arg) {
 define <2 x float> @bitcast_to_float_vect_conservative_1(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <2 x float> @bitcast_to_float_vect_conservative_1
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or <2 x i32> [[ARG]], <i32 -2147483648, i32 0>
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -3016,7 +3016,7 @@ define <2 x float> @bitcast_to_float_vect_conservative_1(<2 x i32> %arg) {
 define <2 x float> @bitcast_to_float_vect_conservative_2(<2 x i32> %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define <2 x float> @bitcast_to_float_vect_conservative_2
-; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x i32> [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[OR:%.*]] = or <2 x i32> [[ARG]], <i32 0, i32 2139095041>
 ; CHECK-NEXT:    [[CAST:%.*]] = bitcast <2 x i32> [[OR]] to <2 x float>
 ; CHECK-NEXT:    ret <2 x float> [[CAST]]
@@ -3029,7 +3029,7 @@ define <2 x float> @bitcast_to_float_vect_conservative_2(<2 x i32> %arg) {
 define float @fadd_double(float noundef %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double
-; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3040,7 +3040,7 @@ define float @fadd_double(float noundef %arg) {
 define float @fadd_double_nnan(float noundef nofpclass(nan) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_nnan
-; CHECK-SAME: (float noundef nofpclass(nan) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nan) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3051,7 +3051,7 @@ define float @fadd_double_nnan(float noundef nofpclass(nan) %arg) {
 define float @fadd_double_no_pinf(float noundef nofpclass(pinf) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_pinf
-; CHECK-SAME: (float noundef nofpclass(pinf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3062,7 +3062,7 @@ define float @fadd_double_no_pinf(float noundef nofpclass(pinf) %arg) {
 define float @fadd_double_no_ninf(float noundef nofpclass(ninf) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_ninf
-; CHECK-SAME: (float noundef nofpclass(ninf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3073,7 +3073,7 @@ define float @fadd_double_no_ninf(float noundef nofpclass(ninf) %arg) {
 define float @fadd_double_no_inf(float noundef nofpclass(inf) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_inf
-; CHECK-SAME: (float noundef nofpclass(inf) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(inf) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3084,7 +3084,7 @@ define float @fadd_double_no_inf(float noundef nofpclass(inf) %arg) {
 define float @fadd_double_no_zero(float noundef nofpclass(zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(zero) float @fadd_double_no_zero
-; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3095,7 +3095,7 @@ define float @fadd_double_no_zero(float noundef nofpclass(zero) %arg) {
 define float @fadd_double_known_positive_or_zero(float noundef nofpclass(ninf nnorm nsub) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(ninf nsub nnorm) float @fadd_double_known_positive_or_zero
-; CHECK-SAME: (float noundef nofpclass(ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3106,7 +3106,7 @@ define float @fadd_double_known_positive_or_zero(float noundef nofpclass(ninf nn
 define float @fadd_double_known_positive(float noundef nofpclass(ninf nnorm nsub nzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(ninf nzero nsub nnorm) float @fadd_double_known_positive
-; CHECK-SAME: (float noundef nofpclass(ninf nzero nsub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf nzero nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3117,7 +3117,7 @@ define float @fadd_double_known_positive(float noundef nofpclass(ninf nnorm nsub
 define float @fadd_double_known_positive_non0(float noundef nofpclass(ninf nnorm nsub zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(ninf zero nsub nnorm) float @fadd_double_known_positive_non0
-; CHECK-SAME: (float noundef nofpclass(ninf zero nsub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf zero nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3128,7 +3128,7 @@ define float @fadd_double_known_positive_non0(float noundef nofpclass(ninf nnorm
 define float @fadd_double_known_negative_or_zero(float noundef nofpclass(pinf pnorm psub) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pinf psub pnorm) float @fadd_double_known_negative_or_zero
-; CHECK-SAME: (float noundef nofpclass(pinf psub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf psub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3139,7 +3139,7 @@ define float @fadd_double_known_negative_or_zero(float noundef nofpclass(pinf pn
 define float @fadd_double_known_negative(float noundef nofpclass(pinf pnorm psub pzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pinf pzero psub pnorm) float @fadd_double_known_negative
-; CHECK-SAME: (float noundef nofpclass(pinf pzero psub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf pzero psub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3150,7 +3150,7 @@ define float @fadd_double_known_negative(float noundef nofpclass(pinf pnorm psub
 define float @fadd_double_known_negative_non0(float noundef nofpclass(pinf pnorm psub zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pinf zero psub pnorm) float @fadd_double_known_negative_non0
-; CHECK-SAME: (float noundef nofpclass(pinf zero psub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf zero psub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3161,7 +3161,7 @@ define float @fadd_double_known_negative_non0(float noundef nofpclass(pinf pnorm
 define float @fadd_double_no_nopinf_pnorm(float noundef nofpclass(pinf pnorm) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_nopinf_pnorm
-; CHECK-SAME: (float noundef nofpclass(pinf pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3172,7 +3172,7 @@ define float @fadd_double_no_nopinf_pnorm(float noundef nofpclass(pinf pnorm) %a
 define float @fadd_double_no_noninf_nnorm(float noundef nofpclass(ninf nnorm) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_noninf_nnorm
-; CHECK-SAME: (float noundef nofpclass(ninf nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3183,7 +3183,7 @@ define float @fadd_double_no_noninf_nnorm(float noundef nofpclass(ninf nnorm) %a
 define float @fadd_double_no_pnorm_psub(float noundef nofpclass(pnorm psub) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_pnorm_psub
-; CHECK-SAME: (float noundef nofpclass(psub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(psub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3194,7 +3194,7 @@ define float @fadd_double_no_pnorm_psub(float noundef nofpclass(pnorm psub) %arg
 define float @fadd_double_no_nnorm_nsub(float noundef nofpclass(nnorm nsub) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_nnorm_nsub
-; CHECK-SAME: (float noundef nofpclass(nsub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3205,7 +3205,7 @@ define float @fadd_double_no_nnorm_nsub(float noundef nofpclass(nnorm nsub) %arg
 define float @fadd_double_no_nopsub_pzero(float noundef nofpclass(psub pzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pzero) float @fadd_double_no_nopsub_pzero
-; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3216,7 +3216,7 @@ define float @fadd_double_no_nopsub_pzero(float noundef nofpclass(psub pzero) %a
 define float @fadd_double_no_nonsub_nzero(float noundef nofpclass(nsub nzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nonsub_nzero
-; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3226,7 +3226,7 @@ define float @fadd_double_no_nonsub_nzero(float noundef nofpclass(nsub nzero) %a
 
 define float @fadd_double_no_nopsub_pzero__ieee_daz(float noundef nofpclass(psub pzero) %arg) #2 {
 ; CHECK-LABEL: define noundef nofpclass(pzero) float @fadd_double_no_nopsub_pzero__ieee_daz
-; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3237,7 +3237,7 @@ define float @fadd_double_no_nopsub_pzero__ieee_daz(float noundef nofpclass(psub
 define float @fadd_double_no_nopsub_pzero__ftz_daz(float noundef nofpclass(psub pzero) %arg) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pzero) float @fadd_double_no_nopsub_pzero__ftz_daz
-; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3247,7 +3247,7 @@ define float @fadd_double_no_nopsub_pzero__ftz_daz(float noundef nofpclass(psub
 
 define float @fadd_double_no_nonsub_nzero__ieee_daz(float noundef nofpclass(nsub nzero) %arg) #2 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nonsub_nzero__ieee_daz
-; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3258,7 +3258,7 @@ define float @fadd_double_no_nonsub_nzero__ieee_daz(float noundef nofpclass(nsub
 define float @fadd_double_no_nonsub_nzero__ftz_daz(float noundef nofpclass(nsub nzero) %arg) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define noundef float @fadd_double_no_nonsub_nzero__ftz_daz
-; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3268,7 +3268,7 @@ define float @fadd_double_no_nonsub_nzero__ftz_daz(float noundef nofpclass(nsub
 
 define float @fadd_double_no_nopsub_pzero__ieee_dynamic(float noundef nofpclass(psub pzero) %arg) #9 {
 ; CHECK-LABEL: define noundef float @fadd_double_no_nopsub_pzero__ieee_dynamic
-; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR17:[0-9]+]] {
+; CHECK-SAME: (float noundef nofpclass(pzero psub) [[ARG:%.*]]) #[[ATTR16:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3278,7 +3278,7 @@ define float @fadd_double_no_nopsub_pzero__ieee_dynamic(float noundef nofpclass(
 
 define float @fadd_double_no_nonsub_nzero__ieee_dynamic(float noundef nofpclass(nsub nzero) %arg) #9 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nonsub_nzero__ieee_dynamic
-; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR17]] {
+; CHECK-SAME: (float noundef nofpclass(nzero nsub) [[ARG:%.*]]) #[[ATTR16]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3289,7 +3289,7 @@ define float @fadd_double_no_nonsub_nzero__ieee_dynamic(float noundef nofpclass(
 define float @fadd_double_known_positive_nonsub_ieee(float noundef nofpclass(ninf nnorm sub zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(ninf zero sub nnorm) float @fadd_double_known_positive_nonsub_ieee
-; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3299,7 +3299,7 @@ define float @fadd_double_known_positive_nonsub_ieee(float noundef nofpclass(nin
 
 define float @fadd_double_known_positive_nonsub__ieee_daz(float noundef nofpclass(ninf nnorm sub zero) %arg) #2 {
 ; CHECK-LABEL: define noundef nofpclass(ninf zero sub nnorm) float @fadd_double_known_positive_nonsub__ieee_daz
-; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3310,7 +3310,7 @@ define float @fadd_double_known_positive_nonsub__ieee_daz(float noundef nofpclas
 define float @fadd_double_known_positive_nonsub__ftz_daz(float noundef nofpclass(ninf nnorm sub zero) %arg) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define noundef nofpclass(ninf zero sub nnorm) float @fadd_double_known_positive_nonsub__ftz_daz
-; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3320,7 +3320,7 @@ define float @fadd_double_known_positive_nonsub__ftz_daz(float noundef nofpclass
 
 define float @fadd_double_known_positive_nonsub__ieee_dynamic(float noundef nofpclass(ninf nnorm sub zero) %arg) #9 {
 ; CHECK-LABEL: define noundef nofpclass(ninf zero sub nnorm) float @fadd_double_known_positive_nonsub__ieee_dynamic
-; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR17]] {
+; CHECK-SAME: (float noundef nofpclass(ninf zero sub nnorm) [[ARG:%.*]]) #[[ATTR16]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3331,7 +3331,7 @@ define float @fadd_double_known_positive_nonsub__ieee_dynamic(float noundef nofp
 define float @fadd_double_known_negative_nonsub_ieee(float noundef nofpclass(pinf pnorm sub zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pinf zero sub pnorm) float @fadd_double_known_negative_nonsub_ieee
-; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3341,7 +3341,7 @@ define float @fadd_double_known_negative_nonsub_ieee(float noundef nofpclass(pin
 
 define float @fadd_double_known_negative_nonsub__ieee_daz(float noundef nofpclass(pinf pnorm sub zero) %arg) #2 {
 ; CHECK-LABEL: define noundef nofpclass(pinf zero sub pnorm) float @fadd_double_known_negative_nonsub__ieee_daz
-; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3352,7 +3352,7 @@ define float @fadd_double_known_negative_nonsub__ieee_daz(float noundef nofpclas
 define float @fadd_double_known_negative_nonsub__ftz_daz(float noundef nofpclass(pinf pnorm sub zero) %arg) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define noundef nofpclass(pinf zero sub pnorm) float @fadd_double_known_negative_nonsub__ftz_daz
-; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3362,7 +3362,7 @@ define float @fadd_double_known_negative_nonsub__ftz_daz(float noundef nofpclass
 
 define float @fadd_double_known_negative_nonsub_dynamic(float noundef nofpclass(pinf pnorm sub zero) %arg) #9 {
 ; CHECK-LABEL: define noundef nofpclass(pinf zero sub pnorm) float @fadd_double_known_negative_nonsub_dynamic
-; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR17]] {
+; CHECK-SAME: (float noundef nofpclass(pinf zero sub pnorm) [[ARG:%.*]]) #[[ATTR16]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3373,7 +3373,7 @@ define float @fadd_double_known_negative_nonsub_dynamic(float noundef nofpclass(
 define float @fsub_self(float noundef %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef float @fsub_self
-; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -3384,7 +3384,7 @@ define float @fsub_self(float noundef %arg) {
 define float @fsub_self_known_positive(float noundef nofpclass(ninf nnorm nsub nzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fsub_self_known_positive
-; CHECK-SAME: (float noundef nofpclass(ninf nzero nsub nnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(ninf nzero nsub nnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -3395,7 +3395,7 @@ define float @fsub_self_known_positive(float noundef nofpclass(ninf nnorm nsub n
 define float @fsub_self_known_negative(float noundef nofpclass(pinf pnorm psub pzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fsub_self_known_negative
-; CHECK-SAME: (float noundef nofpclass(pinf pzero psub pnorm) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(pinf pzero psub pnorm) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[SUB:%.*]] = fsub float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
@@ -3406,7 +3406,7 @@ define float @fsub_self_known_negative(float noundef nofpclass(pinf pnorm psub p
 define float @fadd_known_negative_lhs(float nofpclass(pinf psub pnorm) %arg0, float %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_lhs
-; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3417,7 +3417,7 @@ define float @fadd_known_negative_lhs(float nofpclass(pinf psub pnorm) %arg0, fl
 define float @fadd_known_negative_rhs(float %arg0, float nofpclass(pinf psub pnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_rhs
-; CHECK-SAME: (float [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3428,7 +3428,7 @@ define float @fadd_known_negative_rhs(float %arg0, float nofpclass(pinf psub pno
 define float @fadd_known_negative(float nofpclass(pinf psub pnorm) %arg0, float nofpclass(pinf psub pnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative
-; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3438,7 +3438,7 @@ define float @fadd_known_negative(float nofpclass(pinf psub pnorm) %arg0, float
 
 define float @fadd_known_negative_daz(float nofpclass(pinf psub pnorm) %arg0, float nofpclass(pinf psub pnorm) %arg1) #2 {
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_daz
-; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3449,7 +3449,7 @@ define float @fadd_known_negative_daz(float nofpclass(pinf psub pnorm) %arg0, fl
 define float @fadd_known_negative_pzero_lhs(float nofpclass(pinf psub pnorm pzero) %arg0, float nofpclass(pinf psub pnorm) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero_lhs
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf psub pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3460,7 +3460,7 @@ define float @fadd_known_negative_pzero_lhs(float nofpclass(pinf psub pnorm pzer
 define float @fadd_known_negative_pzero_rhs(float nofpclass(pinf psub pnorm) %arg0, float nofpclass(pinf psub pnorm pzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero_rhs
-; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3471,7 +3471,7 @@ define float @fadd_known_negative_pzero_rhs(float nofpclass(pinf psub pnorm) %ar
 define float @fadd_known_negative_pzero(float nofpclass(pinf psub pnorm pzero) %arg0, float nofpclass(pinf psub pnorm pzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3482,7 +3482,7 @@ define float @fadd_known_negative_pzero(float nofpclass(pinf psub pnorm pzero) %
 define float @fadd_known_negative_pzero_ftz_daz(float nofpclass(pinf psub pnorm pzero) %arg0, float nofpclass(pinf psub pnorm pzero) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero_ftz_daz
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3492,7 +3492,7 @@ define float @fadd_known_negative_pzero_ftz_daz(float nofpclass(pinf psub pnorm
 
 define float @fadd_known_negative_pzero_ftz(float nofpclass(pinf psub pnorm pzero) %arg0, float nofpclass(pinf psub pnorm pzero) %arg1) #1 {
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero_ftz
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR13]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR12]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3502,7 +3502,7 @@ define float @fadd_known_negative_pzero_ftz(float nofpclass(pinf psub pnorm pzer
 
 define float @fadd_known_negative_pzero_daz(float nofpclass(pinf psub pnorm pzero) %arg0, float nofpclass(pinf psub pnorm pzero) %arg1) #2 {
 ; CHECK-LABEL: define nofpclass(pinf psub pnorm) float @fadd_known_negative_pzero_daz
-; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR11]] {
+; CHECK-SAME: (float nofpclass(pinf pzero psub pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero psub pnorm) [[ARG1:%.*]]) #[[ATTR10]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3513,7 +3513,7 @@ define float @fadd_known_negative_pzero_daz(float nofpclass(pinf psub pnorm pzer
 define float @fadd_known_negative_normal(float nofpclass(pinf pnorm pzero) %arg0, float nofpclass(pinf pnorm pzero) %arg1) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_normal
-; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3524,7 +3524,7 @@ define float @fadd_known_negative_normal(float nofpclass(pinf pnorm pzero) %arg0
 define float @fadd_known_negative_normal_daz(float nofpclass(pinf pnorm pzero) %arg0, float nofpclass(pinf pnorm pzero) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_normal_daz
-; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3535,7 +3535,7 @@ define float @fadd_known_negative_normal_daz(float nofpclass(pinf pnorm pzero) %
 define float @fadd_known_negative_normal_except0_daz(float nofpclass(pinf pnorm) %arg0, float nofpclass(pinf pnorm) %arg1) #0 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(preservesign) memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_normal_except0_daz
-; CHECK-SAME: (float nofpclass(pinf pnorm) [[ARG0:%.*]], float nofpclass(pinf pnorm) [[ARG1:%.*]]) #[[ATTR10]] {
+; CHECK-SAME: (float nofpclass(pinf pnorm) [[ARG0:%.*]], float nofpclass(pinf pnorm) [[ARG1:%.*]]) #[[ATTR9]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3546,7 +3546,7 @@ define float @fadd_known_negative_normal_except0_daz(float nofpclass(pinf pnorm)
 define float @fadd_known_negative_normal_dapz(float nofpclass(pinf pnorm pzero) %arg0, float nofpclass(pinf pnorm pzero) %arg1) #3 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn denormal_fpenv(positivezero) memory(none)
 ; CHECK-LABEL: define float @fadd_known_negative_normal_dapz
-; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR9]] {
+; CHECK-SAME: (float nofpclass(pinf pzero pnorm) [[ARG0:%.*]], float nofpclass(pinf pzero pnorm) [[ARG1:%.*]]) #[[ATTR8]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG0]], [[ARG1]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3558,7 +3558,7 @@ define float @fadd_known_negative_normal_dapz(float nofpclass(pinf pnorm pzero)
 define float @fadd_double_no_zero_maybe_undef(float nofpclass(zero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_double_no_zero_maybe_undef
-; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(zero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3569,7 +3569,7 @@ define float @fadd_double_no_zero_maybe_undef(float nofpclass(zero) %arg) {
 define float @fadd_double_no_nzero(float noundef nofpclass(nzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nzero
-; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3580,7 +3580,7 @@ define float @fadd_double_no_nzero(float noundef nofpclass(nzero) %arg) {
 define float @fadd_double_no_nzero_dapz_dapz(float noundef nofpclass(nzero) %arg) #10 {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nzero_dapz_dapz
-; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3590,7 +3590,7 @@ define float @fadd_double_no_nzero_dapz_dapz(float noundef nofpclass(nzero) %arg
 
 define float @fadd_double_no_nzero_dapz_ieee(float noundef nofpclass(nzero) %arg) #4 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_nzero_dapz_ieee
-; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR12]] {
+; CHECK-SAME: (float noundef nofpclass(nzero) [[ARG:%.*]]) #[[ATTR11]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3601,7 +3601,7 @@ define float @fadd_double_no_nzero_dapz_ieee(float noundef nofpclass(nzero) %arg
 define float @fadd_double_no_nzero_maybe_undef(float nofpclass(nzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nzero) float @fadd_double_no_nzero_maybe_undef
-; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(nzero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3612,7 +3612,7 @@ define float @fadd_double_no_nzero_maybe_undef(float nofpclass(nzero) %arg) {
 define float @fadd_double_no_pzero_maybe_undef(float nofpclass(pzero) %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define float @fadd_double_no_pzero_maybe_undef
-; CHECK-SAME: (float nofpclass(pzero) [[ARG:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (float nofpclass(pzero) [[ARG:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3624,7 +3624,7 @@ define float @fadd_double_no_pzero_maybe_undef(float nofpclass(pzero) %arg) {
 ; still be flushed.
 define float @fadd_double_no_zero__output_only_is_ftz(float noundef nofpclass(zero) %arg) #7 {
 ; CHECK-LABEL: define noundef float @fadd_double_no_zero__output_only_is_ftz
-; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR13]] {
+; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR12]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3636,7 +3636,7 @@ define float @fadd_double_no_zero__output_only_is_ftz(float noundef nofpclass(ze
 ; still be flushed.
 define float @fadd_double_no_zero__output_only_is_dynamic(float noundef nofpclass(zero) %arg) #8 {
 ; CHECK-LABEL: define noundef float @fadd_double_no_zero__output_only_is_dynamic
-; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR18:[0-9]+]] {
+; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR17:[0-9]+]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3648,9 +3648,9 @@ define float @fadd_double_no_zero__output_only_is_dynamic(float noundef nofpclas
 define float @assume_select_condition_not_nan(float noundef %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(inaccessiblemem: write)
 ; CHECK-LABEL: define noundef nofpclass(nan inf nzero sub norm) float @assume_select_condition_not_nan
-; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR19:[0-9]+]] {
+; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR18:[0-9]+]] {
 ; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR21]]
 ; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[ORD]], float 0.000000e+00, float [[ARG]]
 ; CHECK-NEXT:    ret float [[SELECT]]
 ;
@@ -3663,9 +3663,9 @@ define float @assume_select_condition_not_nan(float noundef %arg) {
 define float @assume_select_condition_not_nan_commute(float noundef %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(inaccessiblemem: write)
 ; CHECK-LABEL: define noundef nofpclass(inf nzero sub norm) float @assume_select_condition_not_nan_commute
-; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR19]] {
+; CHECK-SAME: (float noundef [[ARG:%.*]]) #[[ATTR18]] {
 ; CHECK-NEXT:    [[UNO:%.*]] = fcmp uno float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNO]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[UNO]]) #[[ATTR21]]
 ; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[UNO]], float [[ARG]], float 0.000000e+00
 ; CHECK-NEXT:    ret float [[SELECT]]
 ;
@@ -3679,10 +3679,10 @@ define float @assume_select_condition_not_nan_commute(float noundef %arg) {
 define float @assume_load(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite, inaccessiblemem: readwrite)
 ; CHECK-LABEL: define nofpclass(nan) float @assume_load
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR20:[0-9]+]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR19:[0-9]+]] {
 ; CHECK-NEXT:    [[VAL:%.*]] = load float, ptr [[PTR]], align 4
 ; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[VAL]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR21]]
 ; CHECK-NEXT:    ret float [[VAL]]
 ;
   %val = load float, ptr %ptr
@@ -3695,9 +3695,9 @@ define float @assume_load(ptr %ptr) {
 define float @assume_returned_arg(float noundef %arg) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(inaccessiblemem: write)
 ; CHECK-LABEL: define noundef float @assume_returned_arg
-; CHECK-SAME: (float noundef returned [[ARG:%.*]]) #[[ATTR19]] {
+; CHECK-SAME: (float noundef returned [[ARG:%.*]]) #[[ATTR18]] {
 ; CHECK-NEXT:    [[ORD:%.*]] = fcmp ord float [[ARG]], 0.000000e+00
-; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR22]]
+; CHECK-NEXT:    call void @llvm.assume(i1 noundef [[ORD]]) #[[ATTR21]]
 ; CHECK-NEXT:    ret float [[ARG]]
 ;
   %ord = fcmp ord float %arg, 0.0
@@ -3708,21 +3708,21 @@ define float @assume_returned_arg(float noundef %arg) {
 define double @wrong_context_function_issue178954(i1 %cond) {
 ; TUNIT: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; TUNIT-LABEL: define noundef nofpclass(nan inf nzero sub norm) double @wrong_context_function_issue178954
-; TUNIT-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
+; TUNIT-SAME: (i1 [[COND:%.*]]) #[[ATTR2]] {
 ; TUNIT-NEXT:  entry:
 ; TUNIT-NEXT:    ret double 0.000000e+00
 ;
 ; CGSCC: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(none)
 ; CGSCC-LABEL: define noundef nofpclass(nan inf nzero sub norm) double @wrong_context_function_issue178954
-; CGSCC-SAME: (i1 [[COND:%.*]]) #[[ATTR4]] {
+; CGSCC-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
 ; CGSCC-NEXT:  entry:
-; CGSCC-NEXT:    [[FMA:%.*]] = tail call nofpclass(nan inf sub norm) double @llvm.fma.f64(double noundef nofpclass(nan inf zero sub nnorm) 1.000000e+00, double noundef nofpclass(nan inf nzero sub norm) 0.000000e+00, double noundef nofpclass(nan inf nzero sub norm) 0.000000e+00) #[[ATTR23]]
+; CGSCC-NEXT:    [[FMA:%.*]] = tail call nofpclass(nan inf sub norm) double @llvm.fma.f64(double noundef nofpclass(nan inf zero sub nnorm) 1.000000e+00, double noundef nofpclass(nan inf nzero sub norm) 0.000000e+00, double noundef nofpclass(nan inf nzero sub norm) 0.000000e+00) #[[ATTR22]]
 ; CGSCC-NEXT:    [[COND_I:%.*]] = select i1 [[COND]], double 0.000000e+00, double [[FMA]]
 ; CGSCC-NEXT:    [[ADD_I46_I:%.*]] = fadd double [[COND_I]], 0.000000e+00
 ; CGSCC-NEXT:    [[ADD_I_I33:%.*]] = fadd double 0.000000e+00, [[ADD_I46_I]]
 ; CGSCC-NEXT:    [[ADD_I_I_I39:%.*]] = fadd double [[ADD_I_I33]], 0.000000e+00
 ; CGSCC-NEXT:    [[VECINIT1_I_I_I_I43:%.*]] = insertelement <2 x double> zeroinitializer, double [[ADD_I_I_I39]], i64 0
-; CGSCC-NEXT:    [[CALL7:%.*]] = call noundef nofpclass(nan inf nzero sub norm) double @issue178954_callee(<2 x double> nofpclass(nan inf nzero sub norm) [[VECINIT1_I_I_I_I43]]) #[[ATTR23]]
+; CGSCC-NEXT:    [[CALL7:%.*]] = call noundef nofpclass(nan inf nzero sub norm) double @issue178954_callee(<2 x double> nofpclass(nan inf nzero sub norm) [[VECINIT1_I_I_I_I43]]) #[[ATTR22]]
 ; CGSCC-NEXT:    ret double [[CALL7]]
 ;
 entry:
@@ -3739,7 +3739,7 @@ entry:
 define double @issue178954_callee(<2 x double> %a) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define noundef nofpclass(nan inf nzero sub norm) double @issue178954_callee
-; CHECK-SAME: (<2 x double> [[A:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (<2 x double> [[A:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    ret double 0.000000e+00
 ;
@@ -3749,7 +3749,7 @@ entry:
 
 define float @fadd_double_no_zero__output_only_is_ftpz(float noundef nofpclass(zero) %arg) #4 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_zero__output_only_is_ftpz
-; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR12]] {
+; CHECK-SAME: (float noundef nofpclass(zero) [[ARG:%.*]]) #[[ATTR11]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3759,7 +3759,7 @@ define float @fadd_double_no_zero__output_only_is_ftpz(float noundef nofpclass(z
 
 define float @fadd_double_no_zero_or_nsub__output_only_is_ftpz(float noundef nofpclass(zero nsub) %arg) #4 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_zero_or_nsub__output_only_is_ftpz
-; CHECK-SAME: (float noundef nofpclass(zero nsub) [[ARG:%.*]]) #[[ATTR12]] {
+; CHECK-SAME: (float noundef nofpclass(zero nsub) [[ARG:%.*]]) #[[ATTR11]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3769,7 +3769,7 @@ define float @fadd_double_no_zero_or_nsub__output_only_is_ftpz(float noundef nof
 
 define float @fadd_double_no_zero_or_psub__output_only_is_ftpz(float noundef nofpclass(zero psub) %arg) #4 {
 ; CHECK-LABEL: define noundef nofpclass(nzero) float @fadd_double_no_zero_or_psub__output_only_is_ftpz
-; CHECK-SAME: (float noundef nofpclass(zero psub) [[ARG:%.*]]) #[[ATTR12]] {
+; CHECK-SAME: (float noundef nofpclass(zero psub) [[ARG:%.*]]) #[[ATTR11]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3779,7 +3779,7 @@ define float @fadd_double_no_zero_or_psub__output_only_is_ftpz(float noundef nof
 
 define float @fadd_double_no_zero_or_sub__output_only_is_ftpz(float noundef nofpclass(zero sub) %arg) #4 {
 ; CHECK-LABEL: define noundef nofpclass(zero) float @fadd_double_no_zero_or_sub__output_only_is_ftpz
-; CHECK-SAME: (float noundef nofpclass(zero sub) [[ARG:%.*]]) #[[ATTR12]] {
+; CHECK-SAME: (float noundef nofpclass(zero sub) [[ARG:%.*]]) #[[ATTR11]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ARG]], [[ARG]]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
@@ -3791,7 +3791,7 @@ define float @fadd_double_no_zero_or_sub__output_only_is_ftpz(float noundef nofp
 define half @known_positive_or_nan__fadd__known_positive_normal_or_inf(half nofpclass(ninf nnorm nsub nzero) %known.positive.or.nan, half nofpclass(ninf nnorm sub zero) %known.pnorm.or.pinf.or.nan) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf zero sub nnorm) half @known_positive_or_nan__fadd__known_positive_normal_or_inf
-; CHECK-SAME: (half nofpclass(ninf nzero nsub nnorm) [[KNOWN_POSITIVE_OR_NAN:%.*]], half nofpclass(ninf zero sub nnorm) [[KNOWN_PNORM_OR_PINF_OR_NAN:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (half nofpclass(ninf nzero nsub nnorm) [[KNOWN_POSITIVE_OR_NAN:%.*]], half nofpclass(ninf zero sub nnorm) [[KNOWN_PNORM_OR_PINF_OR_NAN:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd half [[KNOWN_POSITIVE_OR_NAN]], [[KNOWN_PNORM_OR_PINF_OR_NAN]]
 ; CHECK-NEXT:    ret half [[ADD]]
 ;
@@ -3803,7 +3803,7 @@ define half @known_positive_or_nan__fadd__known_positive_normal_or_inf(half nofp
 define half @known_positive_normal_or_inf__fadd__known_positive_or_nan(half nofpclass(ninf nnorm sub zero) %known.pnorm.or.pinf.or.nan, half nofpclass(ninf nnorm nsub nzero) %known.positive.or.nan) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(ninf zero sub nnorm) half @known_positive_normal_or_inf__fadd__known_positive_or_nan
-; CHECK-SAME: (half nofpclass(ninf zero sub nnorm) [[KNOWN_PNORM_OR_PINF_OR_NAN:%.*]], half nofpclass(ninf nzero nsub nnorm) [[KNOWN_POSITIVE_OR_NAN:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (half nofpclass(ninf zero sub nnorm) [[KNOWN_PNORM_OR_PINF_OR_NAN:%.*]], half nofpclass(ninf nzero nsub nnorm) [[KNOWN_POSITIVE_OR_NAN:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd half [[KNOWN_PNORM_OR_PINF_OR_NAN]], [[KNOWN_POSITIVE_OR_NAN]]
 ; CHECK-NEXT:    ret half [[ADD]]
 ;
@@ -3815,7 +3815,7 @@ define half @known_positive_normal_or_inf__fadd__known_positive_or_nan(half nofp
 define half @known_negative_or_nan__fadd__known_negative_normal_or_inf(half nofpclass(pinf pnorm psub pzero) %known.negative.or.nan, half nofpclass(pinf pnorm sub zero) %known.nnorm.or.ninf.or.nan) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf zero sub pnorm) half @known_negative_or_nan__fadd__known_negative_normal_or_inf
-; CHECK-SAME: (half nofpclass(pinf pzero psub pnorm) [[KNOWN_NEGATIVE_OR_NAN:%.*]], half nofpclass(pinf zero sub pnorm) [[KNOWN_NNORM_OR_NINF_OR_NAN:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (half nofpclass(pinf pzero psub pnorm) [[KNOWN_NEGATIVE_OR_NAN:%.*]], half nofpclass(pinf zero sub pnorm) [[KNOWN_NNORM_OR_NINF_OR_NAN:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd half [[KNOWN_NEGATIVE_OR_NAN]], [[KNOWN_NNORM_OR_NINF_OR_NAN]]
 ; CHECK-NEXT:    ret half [[ADD]]
 ;
@@ -3827,7 +3827,7 @@ define half @known_negative_or_nan__fadd__known_negative_normal_or_inf(half nofp
 define half @known_negative_normal_or_inf__fadd__known_negative_or_nan(half nofpclass(pinf pnorm sub zero) %known.nnorm.or.ninf.or.nan, half nofpclass(pinf pnorm psub pzero) %known.negative.or.nan) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(pinf zero sub pnorm) half @known_negative_normal_or_inf__fadd__known_negative_or_nan
-; CHECK-SAME: (half nofpclass(pinf zero sub pnorm) [[KNOWN_NNORM_OR_NINF_OR_NAN:%.*]], half nofpclass(pinf pzero psub pnorm) [[KNOWN_NEGATIVE_OR_NAN:%.*]]) #[[ATTR3]] {
+; CHECK-SAME: (half nofpclass(pinf zero sub pnorm) [[KNOWN_NNORM_OR_NINF_OR_NAN:%.*]], half nofpclass(pinf pzero psub pnorm) [[KNOWN_NEGATIVE_OR_NAN:%.*]]) #[[ATTR2]] {
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd half [[KNOWN_NNORM_OR_NINF_OR_NAN]], [[KNOWN_NEGATIVE_OR_NAN]]
 ; CHECK-NEXT:    ret half [[ADD]]
 ;
@@ -3838,7 +3838,7 @@ define half @known_negative_normal_or_inf__fadd__known_negative_or_nan(half nofp
 define float @infer_return_from_load_nofpclass_md_f32_0(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define nofpclass(nan) float @infer_return_from_load_nofpclass_md_f32_0
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR5]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR4]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR]], align 4, !nofpclass [[META1:![0-9]+]]
 ; CHECK-NEXT:    ret float [[LOAD]]
 ;
@@ -3849,7 +3849,7 @@ define float @infer_return_from_load_nofpclass_md_f32_0(ptr %ptr) {
 define float @infer_return_from_load_nofpclass_md_f32_1(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define nofpclass(pinf) float @infer_return_from_load_nofpclass_md_f32_1
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR5]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(4) [[PTR:%.*]]) #[[ATTR4]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[PTR]], align 4, !nofpclass [[META2:![0-9]+]]
 ; CHECK-NEXT:    ret float [[LOAD]]
 ;
@@ -3860,7 +3860,7 @@ define float @infer_return_from_load_nofpclass_md_f32_1(ptr %ptr) {
 define <2 x float> @infer_return_from_load_nofpclass_md_v2f32(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define nofpclass(nan) <2 x float> @infer_return_from_load_nofpclass_md_v2f32
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 8 captures(none) dereferenceable(8) [[PTR:%.*]]) #[[ATTR5]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 8 captures(none) dereferenceable(8) [[PTR:%.*]]) #[[ATTR4]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load <2 x float>, ptr [[PTR]], align 8, !nofpclass [[META1]]
 ; CHECK-NEXT:    ret <2 x float> [[LOAD]]
 ;
@@ -3871,7 +3871,7 @@ define <2 x float> @infer_return_from_load_nofpclass_md_v2f32(ptr %ptr) {
 define { float, float } @infer_return_from_load_nofpclass_md_struct(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define nofpclass(nan) { float, float } @infer_return_from_load_nofpclass_md_struct
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(8) [[PTR:%.*]]) #[[ATTR5]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(8) [[PTR:%.*]]) #[[ATTR4]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load { float, float }, ptr [[PTR]], align 4, !nofpclass [[META1]]
 ; CHECK-NEXT:    ret { float, float } [[LOAD]]
 ;
@@ -3882,7 +3882,7 @@ define { float, float } @infer_return_from_load_nofpclass_md_struct(ptr %ptr) {
 define [4 x float] @infer_return_from_load_nofpclass_md_array(ptr %ptr) {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
 ; CHECK-LABEL: define nofpclass(nan) [4 x float] @infer_return_from_load_nofpclass_md_array
-; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(16) [[PTR:%.*]]) #[[ATTR5]] {
+; CHECK-SAME: (ptr nofree noundef nonnull readonly align 4 captures(none) dereferenceable(16) [[PTR:%.*]]) #[[ATTR4]] {
 ; CHECK-NEXT:    [[LOAD:%.*]] = load [4 x float], ptr [[PTR]], align 4, !nofpclass [[META1]]
 ; CHECK-NEXT:    ret [4 x float] [[LOAD]]
 ;
@@ -3893,7 +3893,7 @@ define [4 x float] @infer_return_from_load_nofpclass_md_array(ptr %ptr) {
 define [2 x float] @constant_data_array_0() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) [2 x float] @constant_data_array_0
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret [2 x float] [float 0.000000e+00, float 1.000000e+00]
 ;
   ret [2 x float] [float 0.0, float 1.0]
@@ -3902,7 +3902,7 @@ define [2 x float] @constant_data_array_0() {
 define [2 x float] @constant_data_array_1() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(snan inf zero sub norm) [2 x float] @constant_data_array_1
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret [2 x float] [float 0x7FF8000000000000, float 0x7FF8000000000000]
 ;
   ret [2 x float] [float 0x7FF8000000000000, float 0x7FF8000000000000]
@@ -3911,7 +3911,7 @@ define [2 x float] @constant_data_array_1() {
 define { float, float } @constant_data_struct_0() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(nan inf nzero sub nnorm) { float, float } @constant_data_struct_0
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret { float, float } { float 0.000000e+00, float 1.000000e+00 }
 ;
   ret { float, float } { float 0.0, float 1.0 }
@@ -3920,7 +3920,7 @@ define { float, float } @constant_data_struct_0() {
 define { float, float } @constant_data_struct_1() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define nofpclass(snan inf zero sub norm) { float, float } @constant_data_struct_1
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret { float, float } { float 0x7FF8000000000000, float 0x7FF8000000000000 }
 ;
   ret { float, float } { float 0x7FF8000000000000, float 0x7FF8000000000000 }
@@ -3929,7 +3929,7 @@ define { float, float } @constant_data_struct_1() {
 define { float, { float, float } } @constant_data_nested_struct() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define { float, { float, float } } @constant_data_nested_struct
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret { float, { float, float } } { float 0x7FF8000000000000, { float, float } { float 0x7FF8000000000000, float 0x7FF8000000000000 } }
 ;
   ret { float, { float, float } } { float 0x7FF8000000000000, { float, float } { float 0x7FF8000000000000, float 0x7FF8000000000000 } }
@@ -3938,7 +3938,7 @@ define { float, { float, float } } @constant_data_nested_struct() {
 define { float, double } @constant_data_struct_heterogeneous() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define { float, double } @constant_data_struct_heterogeneous
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret { float, double } { float 0x7FF8000000000000, double 0x7FF8000000000000 }
 ;
   ret { float, double } { float 0x7FF8000000000000, double 0x7FF8000000000000 }
@@ -3947,7 +3947,7 @@ define { float, double } @constant_data_struct_heterogeneous() {
 define { float, [2 x float] } @constant_data_struct_array() {
 ; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
 ; CHECK-LABEL: define { float, [2 x float] } @constant_data_struct_array
-; CHECK-SAME: () #[[ATTR3]] {
+; CHECK-SAME: () #[[ATTR2]] {
 ; CHECK-NEXT:    ret { float, [2 x float] } { float 0x7FF8000000000000, [2 x float] [float 0x7FF8000000000000, float 0x7FF8000000000000] }
 ;
   ret { float, [2 x float] } { float 0x7FF8000000000000, [2 x float] [float 0x7FF8000000000000, float 0x7FF8000000000000] }
diff --git a/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll b/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
index 3871822c9dc17..b5bce1d53a7a5 100644
--- a/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
@@ -7,8 +7,8 @@
 
 define double @multiple_fadd(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fadd(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -19,7 +19,7 @@ define double @multiple_fadd(double %a, double %b) #0 {
 
 define double @multiple_fadd_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fadd_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -33,7 +33,7 @@ define double @multiple_fadd_split(double %a, double %b) #0 {
 
 define double @multiple_fsub(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fsub(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -45,7 +45,7 @@ define double @multiple_fsub(double %a, double %b) #0 {
 
 define double @multiple_fsub_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fsub_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -59,7 +59,7 @@ define double @multiple_fsub_split(double %a, double %b) #0 {
 
 define double @multiple_fmul(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fmul(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -71,7 +71,7 @@ define double @multiple_fmul(double %a, double %b) #0 {
 
 define double @multiple_fmul_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fmul_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -85,7 +85,7 @@ define double @multiple_fmul_split(double %a, double %b) #0 {
 
 define double @multiple_fdiv(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fdiv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -97,7 +97,7 @@ define double @multiple_fdiv(double %a, double %b) #0 {
 
 define double @multiple_fdiv_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fdiv_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -111,7 +111,7 @@ define double @multiple_fdiv_split(double %a, double %b) #0 {
 
 define double @multiple_frem(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_frem(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -123,7 +123,7 @@ define double @multiple_frem(double %a, double %b) #0 {
 
 define double @multiple_frem_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_frem_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -137,8 +137,9 @@ define double @multiple_frem_split(double %a, double %b) #0 {
 
 define i32 @multiple_fptoui(double %a) #0 {
 ; CHECK-LABEL: @multiple_fptoui(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -149,9 +150,10 @@ define i32 @multiple_fptoui(double %a) #0 {
 
 define i32 @multiple_fptoui_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fptoui_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -163,7 +165,7 @@ define i32 @multiple_fptoui_split(double %a, double %b) #0 {
 
 define double @multiple_uitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_uitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -175,7 +177,7 @@ define double @multiple_uitofp(i32 %a) #0 {
 
 define double @multiple_uitofp_split(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_uitofp_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -189,8 +191,9 @@ define double @multiple_uitofp_split(i32 %a) #0 {
 
 define i32 @multiple_fptosi(double %a) #0 {
 ; CHECK-LABEL: @multiple_fptosi(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -201,9 +204,10 @@ define i32 @multiple_fptosi(double %a) #0 {
 
 define i32 @multiple_fptosi_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fptosi_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -215,7 +219,7 @@ define i32 @multiple_fptosi_split(double %a, double %b) #0 {
 
 define double @multiple_sitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_sitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -227,7 +231,7 @@ define double @multiple_sitofp(i32 %a) #0 {
 
 define double @multiple_sitofp_split(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_sitofp_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
@@ -241,10 +245,12 @@ define double @multiple_sitofp_split(i32 %a) #0 {
 
 define i1 @multiple_fcmp(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
@@ -256,11 +262,13 @@ define i1 @multiple_fcmp(double %a, double %b) #0 {
 
 define i1 @multiple_fcmp_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmp_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   call void @arbitraryfunc() #0
@@ -273,7 +281,7 @@ define i1 @multiple_fcmp_split(double %a, double %b) #0 {
 
 define i1 @multiple_fcmps(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmps(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 [[TMP1]]
@@ -288,7 +296,7 @@ define i1 @multiple_fcmps(double %a, double %b) #0 {
 
 define i1 @multiple_fcmps_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmps_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
diff --git a/llvm/test/Transforms/EarlyCSE/ebstrict-strictfp.ll b/llvm/test/Transforms/EarlyCSE/ebstrict-strictfp.ll
index f2675ce7816a4..8daf27fb980ee 100644
--- a/llvm/test/Transforms/EarlyCSE/ebstrict-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/ebstrict-strictfp.ll
@@ -8,9 +8,9 @@
 
 define double @fadd_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fadd_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -21,8 +21,8 @@ define double @fadd_strict(double %a, double %b) #0 {
 
 define double @fsub_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fsub_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -34,8 +34,8 @@ define double @fsub_strict(double %a, double %b) #0 {
 
 define double @fmul_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fmul_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -47,8 +47,8 @@ define double @fmul_strict(double %a, double %b) #0 {
 
 define double @fdiv_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fdiv_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -60,8 +60,8 @@ define double @fdiv_strict(double %a, double %b) #0 {
 
 define double @frem_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @frem_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -73,9 +73,8 @@ define double @frem_strict(double %a, double %b) #0 {
 
 define i32 @fptoui_strict(double %a) #0 {
 ; CHECK-LABEL: @fptoui_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A]], metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = fptoui double [[A:%.*]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.strict") #0
@@ -86,8 +85,8 @@ define i32 @fptoui_strict(double %a) #0 {
 
 define double @uitofp_strict(i32 %a) #0 {
 ; CHECK-LABEL: @uitofp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -99,9 +98,8 @@ define double @uitofp_strict(i32 %a) #0 {
 
 define i32 @fptosi_strict(double %a) #0 {
 ; CHECK-LABEL: @fptosi_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A]], metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = fptosi double [[A:%.*]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.strict") #0
@@ -112,8 +110,8 @@ define i32 @fptosi_strict(double %a) #0 {
 
 define double @sitofp_strict(i32 %a) #0 {
 ; CHECK-LABEL: @sitofp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -125,12 +123,12 @@ define double @sitofp_strict(i32 %a) #0 {
 
 define i1 @fcmp_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
-; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP2]]
+; CHECK-NEXT:    [[TMP6:%.*]] = zext i1 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP4]], i32 [[TMP6]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.strict") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.strict") #0
@@ -142,8 +140,8 @@ define i1 @fcmp_strict(double %a, double %b) #0 {
 
 define i1 @fcmps_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmps_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
diff --git a/llvm/test/Transforms/EarlyCSE/mixed-strictfp.ll b/llvm/test/Transforms/EarlyCSE/mixed-strictfp.ll
index b79f7018b8d0d..a93d2f9983bc1 100644
--- a/llvm/test/Transforms/EarlyCSE/mixed-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/mixed-strictfp.ll
@@ -8,9 +8,9 @@
 
 define double @mixed_fadd_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fadd_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -21,8 +21,8 @@ define double @mixed_fadd_neginf(double %a, double %b) #0 {
 
 define double @mixed_fadd_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fadd_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -34,8 +34,8 @@ define double @mixed_fadd_maytrap(double %a, double %b) #0 {
 
 define double @mixed_fadd_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fadd_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -47,8 +47,8 @@ define double @mixed_fadd_strict(double %a, double %b) #0 {
 
 define double @mixed_fsub_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fsub_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -60,8 +60,8 @@ define double @mixed_fsub_neginf(double %a, double %b) #0 {
 
 define double @mixed_fsub_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fsub_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -73,8 +73,8 @@ define double @mixed_fsub_maytrap(double %a, double %b) #0 {
 
 define double @mixed_fsub_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fsub_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -86,8 +86,8 @@ define double @mixed_fsub_strict(double %a, double %b) #0 {
 
 define double @mixed_fmul_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fmul_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -98,8 +98,8 @@ define double @mixed_fmul_neginf(double %a, double %b) #0 {
 }
 define double @mixed_fmul_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fmul_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -111,8 +111,8 @@ define double @mixed_fmul_maytrap(double %a, double %b) #0 {
 
 define double @mixed_fmul_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fmul_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -124,8 +124,8 @@ define double @mixed_fmul_strict(double %a, double %b) #0 {
 
 define double @mixed_fdiv_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fdiv_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -137,8 +137,8 @@ define double @mixed_fdiv_neginf(double %a, double %b) #0 {
 
 define double @mixed_fdiv_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fdiv_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -150,8 +150,8 @@ define double @mixed_fdiv_maytrap(double %a, double %b) #0 {
 
 define double @mixed_fdiv_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fdiv_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -163,8 +163,8 @@ define double @mixed_fdiv_strict(double %a, double %b) #0 {
 
 define double @mixed_frem_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_frem_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -176,8 +176,8 @@ define double @mixed_frem_neginf(double %a, double %b) #0 {
 
 define double @mixed_frem_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_frem_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -189,8 +189,8 @@ define double @mixed_frem_maytrap(double %a, double %b) #0 {
 
 define double @mixed_frem_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_frem_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -202,8 +202,8 @@ define double @mixed_frem_strict(double %a, double %b) #0 {
 
 define i32 @mixed_fptoui_maytrap(double %a) #0 {
 ; CHECK-LABEL: @mixed_fptoui_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A]], metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
@@ -215,8 +215,8 @@ define i32 @mixed_fptoui_maytrap(double %a) #0 {
 
 define i32 @mixed_fptoui_strict(double %a) #0 {
 ; CHECK-LABEL: @mixed_fptoui_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A]], metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = fptoui double [[A]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
@@ -228,8 +228,8 @@ define i32 @mixed_fptoui_strict(double %a) #0 {
 
 define double @mixed_uitofp_neginf(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_uitofp_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -241,8 +241,8 @@ define double @mixed_uitofp_neginf(i32 %a) #0 {
 
 define double @mixed_uitofp_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_uitofp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -254,8 +254,8 @@ define double @mixed_uitofp_maytrap(i32 %a) #0 {
 
 define double @mixed_uitofp_strict(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_uitofp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -267,8 +267,8 @@ define double @mixed_uitofp_strict(i32 %a) #0 {
 
 define i32 @mixed_fptosi_maytrap(double %a) #0 {
 ; CHECK-LABEL: @mixed_fptosi_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A]], metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
@@ -280,8 +280,8 @@ define i32 @mixed_fptosi_maytrap(double %a) #0 {
 
 define i32 @mixed_fptosi_strict(double %a) #0 {
 ; CHECK-LABEL: @mixed_fptosi_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A]], metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = fptosi double [[A]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
@@ -293,8 +293,8 @@ define i32 @mixed_fptosi_strict(double %a) #0 {
 
 define double @mixed_sitofp_neginf(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_sitofp_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -306,8 +306,8 @@ define double @mixed_sitofp_neginf(i32 %a) #0 {
 
 define double @mixed_sitofp_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_sitofp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -319,8 +319,8 @@ define double @mixed_sitofp_maytrap(i32 %a) #0 {
 
 define double @mixed_sitofp_strict(i32 %a) #0 {
 ; CHECK-LABEL: @mixed_sitofp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -332,8 +332,8 @@ define double @mixed_sitofp_strict(i32 %a) #0 {
 
 define i1 @mixed_fcmp_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fcmp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
@@ -349,8 +349,8 @@ define i1 @mixed_fcmp_maytrap(double %a, double %b) #0 {
 
 define i1 @mixed_fcmp_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fcmp_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
@@ -366,8 +366,8 @@ define i1 @mixed_fcmp_strict(double %a, double %b) #0 {
 
 define i1 @mixed_fcmps_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fcmps_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
@@ -383,8 +383,8 @@ define i1 @mixed_fcmps_maytrap(double %a, double %b) #0 {
 
 define i1 @mixed_fcmps_strict(double %a, double %b) #0 {
 ; CHECK-LABEL: @mixed_fcmps_strict(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
diff --git a/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll b/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
index 3acf5597dfc3f..bdcc0cfbd11c4 100644
--- a/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
@@ -9,8 +9,8 @@
 
 define double @fadd_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fadd_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -21,7 +21,7 @@ define double @fadd_defaultenv(double %a, double %b) #0 {
 
 define double @fadd_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @fadd_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -33,7 +33,7 @@ define double @fadd_neginf(double %a, double %b) #0 {
 
 define double @fadd_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fadd_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -45,7 +45,7 @@ define double @fadd_maytrap(double %a, double %b) #0 {
 
 define double @fsub_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fsub_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -57,7 +57,7 @@ define double @fsub_defaultenv(double %a, double %b) #0 {
 
 define double @fsub_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @fsub_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -69,7 +69,7 @@ define double @fsub_neginf(double %a, double %b) #0 {
 
 define double @fsub_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fsub_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -81,7 +81,7 @@ define double @fsub_maytrap(double %a, double %b) #0 {
 
 define double @fmul_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fmul_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -93,7 +93,7 @@ define double @fmul_defaultenv(double %a, double %b) #0 {
 
 define double @fmul_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @fmul_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -104,7 +104,7 @@ define double @fmul_neginf(double %a, double %b) #0 {
 }
 define double @fmul_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fmul_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -116,7 +116,7 @@ define double @fmul_maytrap(double %a, double %b) #0 {
 
 define double @fdiv_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fdiv_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -128,7 +128,7 @@ define double @fdiv_defaultenv(double %a, double %b) #0 {
 
 define double @fdiv_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @fdiv_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -140,7 +140,7 @@ define double @fdiv_neginf(double %a, double %b) #0 {
 
 define double @fdiv_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fdiv_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -152,7 +152,7 @@ define double @fdiv_maytrap(double %a, double %b) #0 {
 
 define double @frem_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @frem_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -164,7 +164,7 @@ define double @frem_defaultenv(double %a, double %b) #0 {
 
 define double @frem_neginf(double %a, double %b) #0 {
 ; CHECK-LABEL: @frem_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -176,7 +176,7 @@ define double @frem_neginf(double %a, double %b) #0 {
 
 define double @frem_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @frem_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -188,8 +188,9 @@ define double @frem_maytrap(double %a, double %b) #0 {
 
 define i32 @fptoui_defaultenv(double %a) #0 {
 ; CHECK-LABEL: @fptoui_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -200,8 +201,9 @@ define i32 @fptoui_defaultenv(double %a) #0 {
 
 define i32 @fptoui_maytrap(double %a) #0 {
 ; CHECK-LABEL: @fptoui_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double [[A:%.*]], metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.maytrap") #0
@@ -212,7 +214,7 @@ define i32 @fptoui_maytrap(double %a) #0 {
 
 define double @uitofp_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @uitofp_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -224,7 +226,7 @@ define double @uitofp_defaultenv(i32 %a) #0 {
 
 define double @uitofp_neginf(i32 %a) #0 {
 ; CHECK-LABEL: @uitofp_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -236,7 +238,7 @@ define double @uitofp_neginf(i32 %a) #0 {
 
 define double @uitofp_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @uitofp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -248,8 +250,9 @@ define double @uitofp_maytrap(i32 %a) #0 {
 
 define i32 @fptosi_defaultenv(double %a) #0 {
 ; CHECK-LABEL: @fptosi_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -260,8 +263,9 @@ define i32 @fptosi_defaultenv(double %a) #0 {
 
 define i32 @fptosi_maytrap(double %a) #0 {
 ; CHECK-LABEL: @fptosi_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double [[A:%.*]], metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i32 [[TMP1]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.maytrap") #0
@@ -272,7 +276,7 @@ define i32 @fptosi_maytrap(double %a) #0 {
 
 define double @sitofp_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @sitofp_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -284,7 +288,7 @@ define double @sitofp_defaultenv(i32 %a) #0 {
 
 define double @sitofp_neginf(i32 %a) #0 {
 ; CHECK-LABEL: @sitofp_neginf(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -296,7 +300,7 @@ define double @sitofp_neginf(i32 %a) #0 {
 
 define double @sitofp_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @sitofp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP1]]
 ;
@@ -308,10 +312,12 @@ define double @sitofp_maytrap(i32 %a) #0 {
 
 define i1 @fcmp_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmp_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
@@ -323,10 +329,12 @@ define i1 @fcmp_defaultenv(double %a, double %b) #0 {
 
 define i1 @fcmp_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmp_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.maytrap") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.maytrap") #0
@@ -338,7 +346,7 @@ define i1 @fcmp_maytrap(double %a, double %b) #0 {
 
 define i1 @fcmps_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmps_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 [[TMP1]]
@@ -353,7 +361,7 @@ define i1 @fcmps_defaultenv(double %a, double %b) #0 {
 
 define i1 @fcmps_maytrap(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmps_maytrap(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmps.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 [[TMP1]]
diff --git a/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll b/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
index c33e022f53be2..9cd1953a38f9d 100644
--- a/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
@@ -9,9 +9,9 @@
 
 define double @multiple_fadd(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fadd(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -22,9 +22,9 @@ define double @multiple_fadd(double %a, double %b) #0 {
 
 define double @multiple_fadd_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fadd_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -37,8 +37,8 @@ define double @multiple_fadd_split(double %a, double %b) #0 {
 
 define double @multiple_fsub(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fsub(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -50,9 +50,9 @@ define double @multiple_fsub(double %a, double %b) #0 {
 
 define double @multiple_fsub_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fsub_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -65,8 +65,8 @@ define double @multiple_fsub_split(double %a, double %b) #0 {
 
 define double @multiple_fmul(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fmul(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -78,9 +78,9 @@ define double @multiple_fmul(double %a, double %b) #0 {
 
 define double @multiple_fmul_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fmul_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -93,8 +93,8 @@ define double @multiple_fmul_split(double %a, double %b) #0 {
 
 define double @multiple_fdiv(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fdiv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -106,9 +106,9 @@ define double @multiple_fdiv(double %a, double %b) #0 {
 
 define double @multiple_fdiv_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fdiv_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -121,8 +121,8 @@ define double @multiple_fdiv_split(double %a, double %b) #0 {
 
 define double @multiple_frem(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_frem(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -134,9 +134,9 @@ define double @multiple_frem(double %a, double %b) #0 {
 
 define double @multiple_frem_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_frem_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A:%.*]], double [[B:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.frem.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -149,8 +149,8 @@ define double @multiple_frem_split(double %a, double %b) #0 {
 
 define double @multiple_uitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_uitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -162,9 +162,9 @@ define double @multiple_uitofp(i32 %a) #0 {
 
 define double @multiple_uitofp_split(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_uitofp_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 [[A]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -177,8 +177,8 @@ define double @multiple_uitofp_split(i32 %a) #0 {
 
 define double @multiple_sitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_sitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
@@ -190,9 +190,9 @@ define double @multiple_sitofp(i32 %a) #0 {
 
 define double @multiple_sitofp_split(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_sitofp_split(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 [[A]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret double [[TMP2]]
 ;
diff --git a/llvm/test/Transforms/EarlyCSE/tfpropagation.ll b/llvm/test/Transforms/EarlyCSE/tfpropagation.ll
index d07c9627f9b52..7f1ca30b6f367 100644
--- a/llvm/test/Transforms/EarlyCSE/tfpropagation.ll
+++ b/llvm/test/Transforms/EarlyCSE/tfpropagation.ll
@@ -64,11 +64,11 @@ out:
 
 define double @branching_exceptignore(i64 %a) #0 {
 ; CHECK-LABEL: @branching_exceptignore(
-; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.uitofp.f64.i64(i64 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[IF_THEN3:%.*]], label [[IF_END3:%.*]]
 ; CHECK:       if.then3:
-; CHECK-NEXT:    [[C:%.*]] = call double @truefunc.f64.i1(i1 true) #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call double @truefunc.f64.i1(i1 true) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    br label [[OUT:%.*]]
 ; CHECK:       if.end3:
 ; CHECK-NEXT:    [[D:%.*]] = call double @falsefunc.f64.i1(i1 false) #[[ATTR0]]
@@ -94,8 +94,8 @@ out:
 
 define double @branching_exceptignore_dynround(i64 %a) #0 {
 ; CHECK-LABEL: @branching_exceptignore_dynround(
-; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.uitofp.f64.i64(i64 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[IF_THEN3:%.*]], label [[IF_END3:%.*]]
 ; CHECK:       if.then3:
 ; CHECK-NEXT:    [[C:%.*]] = call double @truefunc.f64.i1(i1 true) #[[ATTR0]]
@@ -124,8 +124,8 @@ out:
 
 define double @branching_maytrap(i64 %a) #0 {
 ; CHECK-LABEL: @branching_maytrap(
-; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.uitofp.f64.i64(i64 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[IF_THEN3:%.*]], label [[IF_END3:%.*]]
 ; CHECK:       if.then3:
 ; CHECK-NEXT:    [[C:%.*]] = call double @truefunc.f64.i1(i1 true) #[[ATTR0]]
@@ -156,8 +156,8 @@ out:
 ; TODO: This may or may not be worth the added complication and risk.
 define double @branching_ebstrict(i64 %a) #0 {
 ; CHECK-LABEL: @branching_ebstrict(
-; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.uitofp.f64.i64(i64 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmps.f64(double 1.000000e+00, double [[CONV1]], metadata !"ogt")
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[IF_THEN3:%.*]], label [[IF_END3:%.*]]
 ; CHECK:       if.then3:
 ; CHECK-NEXT:    [[C:%.*]] = call double @truefunc.f64.i1(i1 [[CMP2]]) #[[ATTR0]]
diff --git a/llvm/test/Transforms/FunctionSpecialization/solver-constant-strictfpmetadata.ll b/llvm/test/Transforms/FunctionSpecialization/solver-constant-strictfpmetadata.ll
index 99224b4efba6b..f715d48cdabd3 100644
--- a/llvm/test/Transforms/FunctionSpecialization/solver-constant-strictfpmetadata.ll
+++ b/llvm/test/Transforms/FunctionSpecialization/solver-constant-strictfpmetadata.ll
@@ -5,7 +5,7 @@ define float @test(ptr %this, float %cm, i1 %0) strictfp {
 ; CHECK-LABEL: define float @test(
 ; CHECK-SAME: ptr [[THIS:%.*]], float [[CM:%.*]], i1 [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
-; CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[CM]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[CMP1:%.*]] = call i1 @llvm.fcmps.f32(float [[CM]], float 0.000000e+00, metadata !"ole")
 ; CHECK-NEXT:    [[CALL295:%.*]] = call float @test.specialized.1(ptr null, float 0.000000e+00, i1 false)
 ; CHECK-NEXT:    ret float 0.000000e+00
 ;
diff --git a/llvm/test/Transforms/Inline/inline-strictfp.ll b/llvm/test/Transforms/Inline/inline-strictfp.ll
index bc42fafd63943..2c5819f736f7b 100644
--- a/llvm/test/Transforms/Inline/inline-strictfp.ll
+++ b/llvm/test/Transforms/Inline/inline-strictfp.ll
@@ -1,60 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt -passes=inline %s -S | FileCheck %s
 
 
 ; Ordinary function is inlined into strictfp function.
 
 define float @inlined_01(float %a) {
+; CHECK-LABEL: define float @inlined_01(
+; CHECK-SAME: float [[A:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[A]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
 entry:
   %add = fadd float %a, %a
   ret float %add
 }
 
 define float @host_02(float %a) #0 {
+; CHECK-LABEL: define float @host_02(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd float [[A]], [[A]]
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd float [[ADD_I]], 2.000000e+00
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %0 = call float @inlined_01(float %a) #0
   %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret float %add
-; CHECK-LABEL: @host_02
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
 }
 
 
 ; strictfp function is inlined into another strictfp function.
 
 define float @inlined_03(float %a) #0 {
+; CHECK-LABEL: define float @inlined_03(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
   ret float %add
 }
 
 define float @host_04(float %a) #0 {
+; CHECK-LABEL: define float @host_04(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD1_I:%.*]] = call float @llvm.fadd.f32(float [[A]], float [[A]]) #[[ATTR0]] [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd float [[ADD1_I]], 2.000000e+00
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %0 = call float @inlined_03(float %a) #0
   %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret float %add
-; CHECK-LABEL: @host_04
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
 }
 
 
 ; strictfp function is NOT inlined into ordinary function.
 
 define float @inlined_05(float %a) strictfp {
+; CHECK-LABEL: define float @inlined_05(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
   ret float %add
 }
 
 define float @host_06(float %a) {
+; CHECK-LABEL: define float @host_06(
+; CHECK-SAME: float [[A:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call float @inlined_05(float [[A]])
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], 2.000000e+00
+; CHECK-NEXT:    ret float [[ADD]]
+;
 entry:
   %0 = call float @inlined_05(float %a)
   %add = fadd float %0, 2.000000e+00
   ret float %add
-; CHECK-LABEL: @host_06
-; CHECK: call float @inlined_05(float %a)
-; CHECK: fadd float %0, 2.000000e+00
 }
 
 
@@ -63,6 +94,13 @@ entry:
 declare float @func_ext(float);
 
 define float @inlined_07(float %a) {
+; CHECK-LABEL: define float @inlined_07(
+; CHECK-SAME: float [[A:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call float @func_ext(float [[A]])
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[A]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
 entry:
   %0 = call float @func_ext(float %a)
   %add = fadd float %0, %a
@@ -71,14 +109,18 @@ entry:
 }
 
 define float @host_08(float %a) #0 {
+; CHECK-LABEL: define float @host_08(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call float @func_ext(float [[A]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd float [[TMP0]], [[A]]
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd float [[ADD_I]], 2.000000e+00
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %0 = call float @inlined_07(float %a) #0
   %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret float %add
-; CHECK-LABEL: @host_08
-; CHECK: call float @func_ext(float {{.*}}) #0
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
 }
 
 
@@ -86,53 +128,83 @@ entry:
 
 ; fpext has two overloaded types.
 define double @inlined_09(float %a) {
+; CHECK-LABEL: define double @inlined_09(
+; CHECK-SAME: float [[A:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[T:%.*]] = fpext float [[A]] to double
+; CHECK-NEXT:    ret double [[T]]
+;
 entry:
   %t = fpext float %a to double
   ret double %t
 }
 
 define double @host_10(float %a) #0 {
+; CHECK-LABEL: define double @host_10(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[T_I:%.*]] = fpext float [[A]] to double
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd double [[T_I]], 2.000000e+00
+; CHECK-NEXT:    ret double [[ADD1]]
+;
 entry:
   %0 = call double @inlined_09(float %a) #0
   %add = call double @llvm.experimental.constrained.fadd.f64(double %0, double 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret double %add
-; CHECK-LABEL: @host_10
-; CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(float {{.*}}, metadata !"fpexcept.ignore") #0
-; CHECK: call double @llvm.experimental.constrained.fadd.f64(double {{.*}}, double 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
 }
 
 ; fcmp does not depend on rounding mode and has metadata argument.
 define i1 @inlined_11(float %a, float %b) {
+; CHECK-LABEL: define i1 @inlined_11(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[T:%.*]] = fcmp oeq float [[A]], [[B]]
+; CHECK-NEXT:    ret i1 [[T]]
+;
 entry:
   %t = fcmp oeq float %a, %b
   ret i1 %t
 }
 
 define i1 @host_12(float %a, float %b) #0 {
+; CHECK-LABEL: define i1 @host_12(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT:    [[T_I:%.*]] = fcmp oeq float [[A]], [[B]]
+; CHECK-NEXT:    ret i1 [[T_I]]
+;
 entry:
   %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   %cmp = call i1 @inlined_11(float %a, float %b) #0
   ret i1 %cmp
-; CHECK-LABEL: @host_12
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
-; CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float {{.*}}, metadata !"oeq", metadata !"fpexcept.ignore") #0
 }
 
 ; Intrinsic 'ceil' has constrained variant.
 define float @inlined_13(float %a) {
+; CHECK-LABEL: define float @inlined_13(
+; CHECK-SAME: float [[A:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[T:%.*]] = call float @llvm.ceil.f32(float [[A]])
+; CHECK-NEXT:    ret float [[T]]
+;
 entry:
   %t = call float @llvm.ceil.f32(float %a)
   ret float %t
 }
 
 define float @host_14(float %a) #0 {
+; CHECK-LABEL: define float @host_14(
+; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[T_I:%.*]] = call float @llvm.ceil.f32(float [[A]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ADD1:%.*]] = fadd float [[T_I]], 2.000000e+00
+; CHECK-NEXT:    ret float [[ADD1]]
+;
 entry:
   %0 = call float @inlined_13(float %a) #0
   %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret float %add
-; CHECK-LABEL: @host_14
-; CHECK: call float @llvm.experimental.constrained.ceil.f32(float %a, metadata !"fpexcept.ignore") #0
-; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
 }
 
 attributes #0 = { strictfp }
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index c1621069abf71..809f45fffdcec 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -6036,7 +6036,8 @@ define double @trig_preop_strip_copysign(double %mag, double %sign, i32 %idx) {
 
 define double @trig_preop_strip_fabs_strictfp(double %val, i32 %idx) strictfp {
 ; CHECK-LABEL: @trig_preop_strip_fabs_strictfp(
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[VAL:%.*]], i32 [[IDX:%.*]]) #[[ATTR20]]
+; CHECK-NEXT:    [[FABS:%.*]] = call double @llvm.fabs.f64(double [[VAL:%.*]])
+; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[VAL]], i32 [[IDX:%.*]]) #[[ATTR20]]
 ; CHECK-NEXT:    ret double [[RESULT]]
 ;
   %fabs = call double @llvm.fabs.f64(double %val)
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/fmed3-fpext-fold.ll b/llvm/test/Transforms/InstCombine/AMDGPU/fmed3-fpext-fold.ll
index 66011ad1ac76f..09e208249f172 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/fmed3-fpext-fold.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/fmed3-fpext-fold.ll
@@ -608,26 +608,26 @@ define float @fmed3_f32_fpext_f16_unrepresentable_k2(half %arg0, half %arg1) #1
 define float @fmed3_f32_fpext_f16_strictfp(half %arg0, half %arg1, half %arg2) #2 {
 ; UNKNOWN-LABEL: define float @fmed3_f32_fpext_f16_strictfp
 ; UNKNOWN-SAME: (half [[ARG0:%.*]], half [[ARG1:%.*]], half [[ARG2:%.*]]) #[[ATTR2:[0-9]+]] {
-; UNKNOWN-NEXT:    [[ARG0_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG0]], metadata !"fpexcept.strict")
-; UNKNOWN-NEXT:    [[ARG1_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG1]], metadata !"fpexcept.strict")
-; UNKNOWN-NEXT:    [[ARG2_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG2]], metadata !"fpexcept.strict")
-; UNKNOWN-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT]], float [[ARG1_EXT]], float [[ARG2_EXT]]) #[[ATTR2]]
+; UNKNOWN-NEXT:    [[ARG0_EXT1:%.*]] = fpext half [[ARG0]] to float
+; UNKNOWN-NEXT:    [[ARG1_EXT2:%.*]] = fpext half [[ARG1]] to float
+; UNKNOWN-NEXT:    [[ARG2_EXT3:%.*]] = fpext half [[ARG2]] to float
+; UNKNOWN-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT1]], float [[ARG1_EXT2]], float [[ARG2_EXT3]]) #[[ATTR2]]
 ; UNKNOWN-NEXT:    ret float [[MED3]]
 ;
 ; GFX8-LABEL: define float @fmed3_f32_fpext_f16_strictfp
 ; GFX8-SAME: (half [[ARG0:%.*]], half [[ARG1:%.*]], half [[ARG2:%.*]]) #[[ATTR2:[0-9]+]] {
-; GFX8-NEXT:    [[ARG0_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG0]], metadata !"fpexcept.strict")
-; GFX8-NEXT:    [[ARG1_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG1]], metadata !"fpexcept.strict")
-; GFX8-NEXT:    [[ARG2_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG2]], metadata !"fpexcept.strict")
-; GFX8-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT]], float [[ARG1_EXT]], float [[ARG2_EXT]]) #[[ATTR4:[0-9]+]]
+; GFX8-NEXT:    [[ARG0_EXT1:%.*]] = fpext half [[ARG0]] to float
+; GFX8-NEXT:    [[ARG1_EXT2:%.*]] = fpext half [[ARG1]] to float
+; GFX8-NEXT:    [[ARG2_EXT3:%.*]] = fpext half [[ARG2]] to float
+; GFX8-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT1]], float [[ARG1_EXT2]], float [[ARG2_EXT3]]) #[[ATTR3:[0-9]+]]
 ; GFX8-NEXT:    ret float [[MED3]]
 ;
 ; GFX9-LABEL: define float @fmed3_f32_fpext_f16_strictfp
 ; GFX9-SAME: (half [[ARG0:%.*]], half [[ARG1:%.*]], half [[ARG2:%.*]]) #[[ATTR2:[0-9]+]] {
-; GFX9-NEXT:    [[ARG0_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG0]], metadata !"fpexcept.strict")
-; GFX9-NEXT:    [[ARG1_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG1]], metadata !"fpexcept.strict")
-; GFX9-NEXT:    [[ARG2_EXT:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[ARG2]], metadata !"fpexcept.strict")
-; GFX9-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT]], float [[ARG1_EXT]], float [[ARG2_EXT]]) #[[ATTR5:[0-9]+]]
+; GFX9-NEXT:    [[ARG0_EXT1:%.*]] = fpext half [[ARG0]] to float
+; GFX9-NEXT:    [[ARG1_EXT2:%.*]] = fpext half [[ARG1]] to float
+; GFX9-NEXT:    [[ARG2_EXT3:%.*]] = fpext half [[ARG2]] to float
+; GFX9-NEXT:    [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[ARG0_EXT1]], float [[ARG1_EXT2]], float [[ARG2_EXT3]]) #[[ATTR4:[0-9]+]]
 ; GFX9-NEXT:    ret float [[MED3]]
 ;
   %arg0.ext = call float @llvm.experimental.constrained.fpext.f32.f16(half %arg0, metadata !"fpexcept.strict")
diff --git a/llvm/test/Transforms/InstCombine/constrained.ll b/llvm/test/Transforms/InstCombine/constrained.ll
index 9b51c2856e9b5..3eaea8a88110d 100644
--- a/llvm/test/Transforms/InstCombine/constrained.ll
+++ b/llvm/test/Transforms/InstCombine/constrained.ll
@@ -7,6 +7,7 @@
 define float @f_unused_precise() #0 {
 ; CHECK-LABEL: @f_unused_precise(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 1.000000e+00, float 1.000000e+00) [ "fp.control"(metadata !"rtp") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -18,7 +19,7 @@ entry:
 define float @f_unused_strict() #0 {
 ; CHECK-LABEL: @f_unused_strict(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 1.000000e+00, float 3.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -30,6 +31,7 @@ entry:
 define float @f_unused_ignore() #0 {
 ; CHECK-LABEL: @f_unused_ignore(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -41,6 +43,7 @@ entry:
 define float @f_unused_dynamic_ignore() #0 {
 ; CHECK-LABEL: @f_unused_dynamic_ignore(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -52,6 +55,7 @@ entry:
 define float @f_unused_maytrap() #0 {
 ; CHECK-LABEL: @f_unused_maytrap(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -65,6 +69,7 @@ entry:
 define float @f_eval_precise() #0 {
 ; CHECK-LABEL: @f_eval_precise(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 1.000000e+00, float 1.000000e+00) [ "fp.control"(metadata !"rtp") ]
 ; CHECK-NEXT:    ret float 2.000000e+00
 ;
 entry:
@@ -76,7 +81,7 @@ entry:
 define float @f_eval_strict() #0 {
 ; CHECK-LABEL: @f_eval_strict(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 1.000000e+00, float 3.000000e+00, metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rtp") ]
 ; CHECK-NEXT:    ret float [[RESULT]]
 ;
 entry:
@@ -88,6 +93,7 @@ entry:
 define float @f_eval_ignore() #0 {
 ; CHECK-LABEL: @f_eval_ignore(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x3FD5555540000000
 ;
 entry:
@@ -99,7 +105,7 @@ entry:
 define float @f_eval_dynamic_ignore() #0 {
 ; CHECK-LABEL: @f_eval_dynamic_ignore(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 1.000000e+00, float 3.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RESULT]]
 ;
 entry:
@@ -111,6 +117,7 @@ entry:
 define float @f_eval_maytrap() #0 {
 ; CHECK-LABEL: @f_eval_maytrap(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x3FD5555560000000
 ;
 entry:
diff --git a/llvm/test/Transforms/InstCombine/fpclass-check-idioms.ll b/llvm/test/Transforms/InstCombine/fpclass-check-idioms.ll
index 4695749cd7be8..b442db9a180fd 100644
--- a/llvm/test/Transforms/InstCombine/fpclass-check-idioms.ll
+++ b/llvm/test/Transforms/InstCombine/fpclass-check-idioms.ll
@@ -92,6 +92,7 @@ define i1 @f32_fcinf(float %a) {
 define i1 @f32_fcinf_strictfp(float %a) strictfp {
 ; CHECK-LABEL: define i1 @f32_fcinf_strictfp(
 ; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[A]])
 ; CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.is.fpclass.f32(float [[A]], i32 516)
 ; CHECK-NEXT:    ret i1 [[CMP]]
 ;
@@ -204,6 +205,7 @@ define i1 @f32_fczero(float %a) {
 define i1 @f32_fczero_strictfp(float %a) strictfp {
 ; CHECK-LABEL: define i1 @f32_fczero_strictfp(
 ; CHECK-SAME: float [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.fabs.f32(float [[A]])
 ; CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.is.fpclass.f32(float [[A]], i32 96)
 ; CHECK-NEXT:    ret i1 [[CMP]]
 ;
@@ -290,6 +292,7 @@ define <2 x i1> @f32_fcinf_vec(<2 x float> %a) {
 define <2 x i1> @f32_fcinf_vec_strictfp(<2 x float> %a) strictfp {
 ; CHECK-LABEL: define <2 x i1> @f32_fcinf_vec_strictfp(
 ; CHECK-SAME: <2 x float> [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x float> @llvm.fabs.v2f32(<2 x float> [[A]])
 ; CHECK-NEXT:    [[CMP:%.*]] = call <2 x i1> @llvm.is.fpclass.v2f32(<2 x float> [[A]], i32 516)
 ; CHECK-NEXT:    ret <2 x i1> [[CMP]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/fsqrtdiv-transform.ll b/llvm/test/Transforms/InstCombine/fsqrtdiv-transform.ll
index 940be31c39604..db9493818e7b6 100644
--- a/llvm/test/Transforms/InstCombine/fsqrtdiv-transform.ll
+++ b/llvm/test/Transforms/InstCombine/fsqrtdiv-transform.ll
@@ -594,13 +594,12 @@ define void @strict_fp_metadata(double %a) {
 ; CHECK-LABEL: define void @strict_fp_metadata(
 ; CHECK-SAME: double [[A:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 1, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    [[CALL:%.*]] = call double @llvm.sqrt.f64(double noundef [[A]])
-; CHECK-NEXT:    [[DIV:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[CONV]], double [[CALL]], metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[DIV:%.*]] = fdiv double 1.000000e+00, [[CALL]]
 ; CHECK-NEXT:    store double [[DIV]], ptr @x, align 8
-; CHECK-NEXT:    [[MUL:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DIV]], double [[DIV]], metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[MUL:%.*]] = fmul double [[DIV]], [[DIV]]
 ; CHECK-NEXT:    store double [[MUL]], ptr @r1, align 8
-; CHECK-NEXT:    [[DIV2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A]], double [[CALL]], metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[DIV2:%.*]] = fdiv double [[A]], [[CALL]]
 ; CHECK-NEXT:    store double [[DIV2]], ptr @r2, align 8
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/InstCombine/is_fpclass.ll b/llvm/test/Transforms/InstCombine/is_fpclass.ll
index 70a7663e5768a..b21597c8ca435 100644
--- a/llvm/test/Transforms/InstCombine/is_fpclass.ll
+++ b/llvm/test/Transforms/InstCombine/is_fpclass.ll
@@ -856,6 +856,8 @@ define i1 @test_class_is_not_nan_nnan_src(float %x) {
 
 define i1 @test_class_is_not_nan_nnan_src_strict(float %x) strictfp {
 ; CHECK-LABEL: @test_class_is_not_nan_nnan_src_strict(
+; CHECK-NEXT:    [[NNAN:%.*]] = fadd nnan float [[X:%.*]], 1.000000e+00
+; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[NNAN]], i32 988) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 true
 ;
   %nnan = fadd nnan float %x, 1.0
@@ -916,6 +918,8 @@ define i1 @test_class_is_not_inf_ninf_src(float %x) {
 
 define i1 @test_class_is_not_inf_ninf_src_strict(float %x) strictfp {
 ; CHECK-LABEL: @test_class_is_not_inf_ninf_src_strict(
+; CHECK-NEXT:    [[NINF:%.*]] = fadd ninf float [[X:%.*]], 1.000000e+00
+; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[NINF]], i32 475) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 true
 ;
   %ninf = fadd ninf float %x, 1.0
@@ -2075,7 +2079,8 @@ define i1 @test_class_fabs_posinf_negnormal_possubnormal_negzero_nan(float %arg)
 ; -> pinf|psubnormal|snan
 define i1 @test_class_fabs_posinf_negnormal_possubnormal_negzero_snan_strictfp(float %arg) strictfp {
 ; CHECK-LABEL: @test_class_fabs_posinf_negnormal_possubnormal_negzero_snan_strictfp(
-; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[ARG:%.*]], i32 661) #[[ATTR0]]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[ARG:%.*]]) #[[ATTR0]]
+; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[ARG]], i32 661) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 [[CLASS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %arg) strictfp
@@ -2406,7 +2411,8 @@ define i1 @test_class_fneg_fabs_posinf_negnormal_possubnormal_negzero_nan(float
 ; strictfp doesn't matter
 define i1 @test_class_fneg_fabs_posinf_negnormal_possubnormal_negzero_snan_strictfp(float %arg) strictfp {
 ; CHECK-LABEL: @test_class_fneg_fabs_posinf_negnormal_possubnormal_negzero_snan_strictfp(
-; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[ARG:%.*]], i32 361) #[[ATTR0]]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[ARG:%.*]]) #[[ATTR0]]
+; CHECK-NEXT:    [[CLASS:%.*]] = call i1 @llvm.is.fpclass.f32(float [[ARG]], i32 361) #[[ATTR0]]
 ; CHECK-NEXT:    ret i1 [[CLASS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %arg) strictfp
diff --git a/llvm/test/Transforms/InstCombine/ldexp.ll b/llvm/test/Transforms/InstCombine/ldexp.ll
index 9586fffe9956c..7f49fb1eb40c5 100644
--- a/llvm/test/Transforms/InstCombine/ldexp.ll
+++ b/llvm/test/Transforms/InstCombine/ldexp.ll
@@ -1060,9 +1060,10 @@ define float @ldexp_f32_mask_select_0_swap_multi_use(i1 %cond, float %x, i32 %y,
 define float @ldexp_f32_mask_select_0_strictfp(i1 %cond, float %x, i32 %y) #0 {
 ; CHECK-LABEL: define float @ldexp_f32_mask_select_0_strictfp
 ; CHECK-SAME: (i1 [[COND:%.*]], float [[X:%.*]], i32 [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
-; CHECK-NEXT:    [[SELECT:%.*]] = select i1 [[COND]], i32 [[Y]], i32 0
-; CHECK-NEXT:    [[LDEXP:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float [[X]], i32 [[SELECT]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[LDEXP]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call float @llvm.ldexp.f32.i32(float [[X]], i32 [[Y]])
+; CHECK-NEXT:    [[LDEXP1:%.*]] = select i1 [[COND]], float [[TMP1]], float [[X]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call float @llvm.ldexp.f32.i32(float [[X]], i32 [[Y]])
+; CHECK-NEXT:    ret float [[LDEXP1]]
 ;
   %select = select i1 %cond, i32 %y, i32 0
   %ldexp = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 %select, metadata !"round.dynamic", metadata !"fpexcept.strict")
diff --git a/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll b/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
index a9f0662629347..dc2cbb79a1ce0 100644
--- a/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
@@ -7,8 +7,7 @@
 
 define float @fadd_nan_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -16,7 +15,8 @@ define float @fadd_nan_op0_strict(float %x) #0 {
 
 define float @fadd_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -24,7 +24,8 @@ define float @fadd_nan_op0_maytrap(float %x) #0 {
 
 define float @fadd_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -32,7 +33,8 @@ define float @fadd_nan_op0_upward(float %x) #0 {
 
 define float @fadd_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -40,8 +42,7 @@ define float @fadd_nan_op0_defaultfp(float %x) #0 {
 
 define float @fadd_nan_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -49,7 +50,8 @@ define float @fadd_nan_op1_strict(float %x) #0 {
 
 define float @fadd_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -57,7 +59,8 @@ define float @fadd_nan_op1_maytrap(float %x) #0 {
 
 define float @fadd_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -65,7 +68,8 @@ define float @fadd_nan_op1_upward(float %x) #0 {
 
 define float @fadd_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -77,8 +81,7 @@ define float @fadd_nan_op1_defaultfp(float %x) #0 {
 
 define float @fsub_nan_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -86,7 +89,8 @@ define float @fsub_nan_op0_strict(float %x) #0 {
 
 define float @fsub_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -94,7 +98,8 @@ define float @fsub_nan_op0_maytrap(float %x) #0 {
 
 define float @fsub_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -102,7 +107,8 @@ define float @fsub_nan_op0_upward(float %x) #0 {
 
 define float @fsub_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -110,8 +116,7 @@ define float @fsub_nan_op0_defaultfp(float %x) #0 {
 
 define float @fsub_nan_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   ret float %r
@@ -119,7 +124,8 @@ define float @fsub_nan_op1_strict(float %x) #0 {
 
 define float @fsub_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -127,7 +133,8 @@ define float @fsub_nan_op1_maytrap(float %x) #0 {
 
 define float @fsub_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -135,7 +142,8 @@ define float @fsub_nan_op1_upward(float %x) #0 {
 
 define float @fsub_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -147,8 +155,7 @@ define float @fsub_nan_op1_defaultfp(float %x) #0 {
 
 define float @fmul_nan_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -156,7 +163,8 @@ define float @fmul_nan_op0_strict(float %x) #0 {
 
 define float @fmul_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -164,7 +172,8 @@ define float @fmul_nan_op0_maytrap(float %x) #0 {
 
 define float @fmul_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -172,7 +181,8 @@ define float @fmul_nan_op0_upward(float %x) #0 {
 
 define float @fmul_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -180,8 +190,7 @@ define float @fmul_nan_op0_defaultfp(float %x) #0 {
 
 define float @fmul_nan_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -189,7 +198,8 @@ define float @fmul_nan_op1_strict(float %x) #0 {
 
 define float @fmul_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -197,7 +207,8 @@ define float @fmul_nan_op1_maytrap(float %x) #0 {
 
 define float @fmul_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -205,7 +216,8 @@ define float @fmul_nan_op1_upward(float %x) #0 {
 
 define float @fmul_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -217,8 +229,7 @@ define float @fmul_nan_op1_defaultfp(float %x) #0 {
 
 define float @fdiv_nan_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -226,7 +237,8 @@ define float @fdiv_nan_op0_strict(float %x) #0 {
 
 define float @fdiv_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -234,7 +246,8 @@ define float @fdiv_nan_op0_maytrap(float %x) #0 {
 
 define float @fdiv_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -242,7 +255,8 @@ define float @fdiv_nan_op0_upward(float %x) #0 {
 
 define float @fdiv_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -250,8 +264,7 @@ define float @fdiv_nan_op0_defaultfp(float %x) #0 {
 
 define float @fdiv_nan_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -259,7 +272,8 @@ define float @fdiv_nan_op1_strict(float %x) #0 {
 
 define float @fdiv_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -267,7 +281,8 @@ define float @fdiv_nan_op1_maytrap(float %x) #0 {
 
 define float @fdiv_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -275,7 +290,8 @@ define float @fdiv_nan_op1_upward(float %x) #0 {
 
 define float @fdiv_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -287,8 +303,7 @@ define float @fdiv_nan_op1_defaultfp(float %x) #0 {
 
 define float @frem_nan_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -296,7 +311,8 @@ define float @frem_nan_op0_strict(float %x) #0 {
 
 define float @frem_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -304,7 +320,8 @@ define float @frem_nan_op0_maytrap(float %x) #0 {
 
 define float @frem_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -312,7 +329,8 @@ define float @frem_nan_op0_upward(float %x) #0 {
 
 define float @frem_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -320,8 +338,7 @@ define float @frem_nan_op0_defaultfp(float %x) #0 {
 
 define float @frem_nan_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float [[X:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -329,7 +346,8 @@ define float @frem_nan_op1_strict(float %x) #0 {
 
 define float @frem_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_maytrap(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -337,7 +355,8 @@ define float @frem_nan_op1_maytrap(float %x) #0 {
 
 define float @frem_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_upward(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -345,7 +364,8 @@ define float @frem_nan_op1_upward(float %x) #0 {
 
 define float @frem_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -357,8 +377,8 @@ define float @frem_nan_op1_defaultfp(float %x) #0 {
 
 define float @fma_nan_op0_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]])
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -366,6 +386,7 @@ define float @fma_nan_op0_strict(float %x, float %y) #0 {
 
 define float @fma_nan_op0_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -374,6 +395,7 @@ define float @fma_nan_op0_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op0_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -382,6 +404,7 @@ define float @fma_nan_op0_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op0_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -390,8 +413,8 @@ define float @fma_nan_op0_defaultfp(float %x, float %y) #0 {
 
 define float @fma_nan_op1_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]])
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -399,6 +422,7 @@ define float @fma_nan_op1_strict(float %x, float %y) #0 {
 
 define float @fma_nan_op1_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -407,6 +431,7 @@ define float @fma_nan_op1_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op1_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -415,6 +440,7 @@ define float @fma_nan_op1_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op1_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -423,8 +449,8 @@ define float @fma_nan_op1_defaultfp(float %x, float %y) #0 {
 
 define float @fma_nan_op2_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000)
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -432,6 +458,7 @@ define float @fma_nan_op2_strict(float %x, float %y) #0 {
 
 define float @fma_nan_op2_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -440,6 +467,7 @@ define float @fma_nan_op2_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op2_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -448,6 +476,7 @@ define float @fma_nan_op2_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op2_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll b/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll
new file mode 100644
index 0000000000000..b08bb4fe2edc8
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll
@@ -0,0 +1,98 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instsimplify < %s | FileCheck %s
+
+; 0xB810000000000000 = -0x1.0p-126
+; 0x3800000000000000 =  0x1.0p-127 denormal
+
+define float @test_float_fadd_ieee_strict() #0 {
+; CHECK-LABEL: define float @test_float_fadd_ieee_strict(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore")
+  ret float %result
+}
+
+define float @test_float_fadd_strict_ieee() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_ieee(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=ieee") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_inzero() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_inzero(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=zero", metadata !"denorm.out=ieee") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_inpzero() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_inpzero(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=pzero", metadata !"denorm.out=ieee") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_indyn() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_indyn(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=dyn", metadata !"denorm.out=ieee") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_ieee_outzero() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outzero(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=zero") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_ieee_outpzero() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outpzero(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=pzero") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_ieee_outdyn() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outdyn(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=dyn") ]
+  ret float %result
+}
+
+define float @test_float_fadd_strict_zero_outdef() #0 {
+; CHECK-LABEL: define float @test_float_fadd_strict_zero_outdef(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0xB800000000000000
+;
+  %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee") ]
+  ret float %result
+}
+
+attributes #0 = { nounwind strictfp "denormal-fp-math"="ieee,ieee" }
+attributes #5 = { nounwind "denormal-fp-math"="ieee,ieee" "denormal-fp-math-f32"="positive-zero,ieee" }
diff --git a/llvm/test/Transforms/InstSimplify/constfold-constrained.ll b/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
index a9ef7f6a765d1..07d55ca750037 100644
--- a/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
+++ b/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
@@ -6,6 +6,7 @@
 define double @floor_01() #0 {
 ; CHECK-LABEL: @floor_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.floor.f64(double 1.010000e+01) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -17,7 +18,7 @@ entry:
 define double @floor_02() #0 {
 ; CHECK-LABEL: @floor_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.floor.f64(double -1.010000e+01, metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.floor.f64(double -1.010000e+01)
 ; CHECK-NEXT:    ret double -1.100000e+01
 ;
 entry:
@@ -29,6 +30,7 @@ entry:
 define double @ceil_01() #0 {
 ; CHECK-LABEL: @ceil_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.ceil.f64(double 1.010000e+01) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.100000e+01
 ;
 entry:
@@ -40,7 +42,7 @@ entry:
 define double @ceil_02() #0 {
 ; CHECK-LABEL: @ceil_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.ceil.f64(double -1.010000e+01, metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.ceil.f64(double -1.010000e+01)
 ; CHECK-NEXT:    ret double -1.000000e+01
 ;
 entry:
@@ -52,6 +54,7 @@ entry:
 define double @trunc_01() #0 {
 ; CHECK-LABEL: @trunc_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double 1.010000e+01) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -63,7 +66,7 @@ entry:
 define double @trunc_02() #0 {
 ; CHECK-LABEL: @trunc_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.trunc.f64(double -1.010000e+01, metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double -1.010000e+01)
 ; CHECK-NEXT:    ret double -1.000000e+01
 ;
 entry:
@@ -75,6 +78,7 @@ entry:
 define double @round_01() #0 {
 ; CHECK-LABEL: @round_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.round.f64(double 1.050000e+01) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.100000e+01
 ;
 entry:
@@ -86,7 +90,7 @@ entry:
 define double @round_02() #0 {
 ; CHECK-LABEL: @round_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.round.f64(double -1.050000e+01, metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.round.f64(double -1.050000e+01)
 ; CHECK-NEXT:    ret double -1.100000e+01
 ;
 entry:
@@ -98,7 +102,8 @@ entry:
 define double @nearbyint_01() #0 {
 ; CHECK-LABEL: @nearbyint_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    ret double 1.100000e+01
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
   %result = call double @llvm.experimental.constrained.nearbyint.f64(double 1.050000e+01, metadata !"round.upward", metadata !"fpexcept.ignore") #0
@@ -109,6 +114,7 @@ entry:
 define double @nearbyint_02() #0 {
 ; CHECK-LABEL: @nearbyint_02(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -120,7 +126,7 @@ entry:
 define double @nearbyint_03() #0 {
 ; CHECK-LABEL: @nearbyint_03(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.nearbyint.f64(double 1.050000e+01, metadata !"round.towardzero", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01) [ "fp.control"(metadata !"rtz") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -132,7 +138,7 @@ entry:
 define double @nearbyint_04() #0 {
 ; CHECK-LABEL: @nearbyint_04(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.nearbyint.f64(double 1.050000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -144,8 +150,8 @@ entry:
 define double @nearbyint_05() #0 {
 ; CHECK-LABEL: @nearbyint_05(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.nearbyint.f64(double 1.050000e+01, metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    ret double [[RESULT]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01)
+; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
   %result = call double @llvm.experimental.constrained.nearbyint.f64(double 1.050000e+01, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
@@ -156,8 +162,8 @@ entry:
 define double @nonfinite_01() #0 {
 ; CHECK-LABEL: @nonfinite_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.trunc.f64(double 0x7FF4000000000000, metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    ret double [[RESULT]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double 0x7FF4000000000000)
+; CHECK-NEXT:    ret double 0x7FFC000000000000
 ;
 entry:
   %result = call double @llvm.experimental.constrained.trunc.f64(double 0x7ff4000000000000, metadata !"fpexcept.strict") #0
@@ -168,7 +174,8 @@ entry:
 define double @nonfinite_02() #0 {
 ; CHECK-LABEL: @nonfinite_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    ret double 0x7FF8000000000000
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double 0x7FF4000000000000) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double 0x7FFC000000000000
 ;
 entry:
   %result = call double @llvm.experimental.constrained.trunc.f64(double 0x7ff4000000000000, metadata !"fpexcept.ignore") #0
@@ -179,7 +186,7 @@ entry:
 define double @nonfinite_03() #0 {
 ; CHECK-LABEL: @nonfinite_03(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.trunc.f64(double 0x7FF8000000000000, metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double 0x7FF8000000000000)
 ; CHECK-NEXT:    ret double 0x7FF8000000000000
 ;
 entry:
@@ -191,7 +198,7 @@ entry:
 define double @nonfinite_04() #0 {
 ; CHECK-LABEL: @nonfinite_04(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.trunc.f64(double 0x7FF0000000000000, metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.trunc.f64(double 0x7FF0000000000000)
 ; CHECK-NEXT:    ret double 0x7FF0000000000000
 ;
 entry:
@@ -203,7 +210,7 @@ entry:
 define double @rint_01() #0 {
 ; CHECK-LABEL: @rint_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.rint.f64(double 1.000000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.rint.f64(double 1.000000e+01) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -215,8 +222,8 @@ entry:
 define double @rint_02() #0 {
 ; CHECK-LABEL: @rint_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.rint.f64(double 1.010000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    ret double [[RESULT]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.rint.f64(double 1.010000e+01) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
   %result = call double @llvm.experimental.constrained.rint.f64(double 1.010000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -227,6 +234,7 @@ entry:
 define double @rint_03() #0 {
 ; CHECK-LABEL: @rint_03(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.rint.f64(double 1.010000e+01) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -237,6 +245,7 @@ entry:
 define float @fadd_01() #0 {
 ; CHECK-LABEL: @fadd_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 1.000000e+01, float 2.000000e+01) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 3.000000e+01
 ;
 entry:
@@ -249,6 +258,7 @@ entry:
 define double @fadd_02() #0 {
 ; CHECK-LABEL: @fadd_02(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 2.000000e+00
 ;
 entry:
@@ -259,6 +269,7 @@ entry:
 define double @fadd_03() #0 {
 ; CHECK-LABEL: @fadd_03(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 0x4000000000000001
 ;
 entry:
@@ -270,7 +281,7 @@ entry:
 define double @fadd_04() #0 {
 ; CHECK-LABEL: @fadd_04(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret double [[RESULT]]
 ;
 entry:
@@ -282,7 +293,7 @@ entry:
 define double @fadd_05() #0 {
 ; CHECK-LABEL: @fadd_05(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double 1.000000e+00, double 2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret double 3.000000e+00
 ;
 entry:
@@ -294,7 +305,6 @@ entry:
 define double @fadd_06() #0 {
 ; CHECK-LABEL: @fadd_06(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double 1.000000e+00, double 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
 ; CHECK-NEXT:    ret double 3.000000e+00
 ;
 entry:
@@ -306,7 +316,7 @@ entry:
 define double @fadd_07() #0 {
 ; CHECK-LABEL: @fadd_07(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001, metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[RESULT]]
 ;
 entry:
@@ -318,6 +328,7 @@ entry:
 define double @fadd_08() #0 {
 ; CHECK-LABEL: @fadd_08(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 0x7FEFFFFFFFFFFFFF, double 0x7FEFFFFFFFFFFFFF) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 0x7FF0000000000000
 ;
 entry:
@@ -328,7 +339,7 @@ entry:
 define double @fadd_09() #0 {
 ; CHECK-LABEL: @fadd_09(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double 0x7FEFFFFFFFFFFFFF, double 0x7FEFFFFFFFFFFFFF, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call double @llvm.fadd.f64(double 0x7FEFFFFFFFFFFFFF, double 0x7FEFFFFFFFFFFFFF) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret double [[RESULT]]
 ;
 entry:
@@ -339,6 +350,7 @@ entry:
 define half @fadd_10() #0 {
 ; CHECK-LABEL: @fadd_10(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call half @llvm.fadd.f16(half 0xH3C00, half 0xH4000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret half 0xH4200
 ;
 entry:
@@ -349,6 +361,7 @@ entry:
 define bfloat @fadd_11() #0 {
 ; CHECK-LABEL: @fadd_11(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call bfloat @llvm.fadd.bf16(bfloat 0xR3F80, bfloat 0xR4000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret bfloat 0xR4040
 ;
 entry:
@@ -359,6 +372,7 @@ entry:
 define double @fsub_01() #0 {
 ; CHECK-LABEL: @fsub_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fsub.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double -1.000000e+00
 ;
 entry:
@@ -369,6 +383,7 @@ entry:
 define double @fmul_01() #0 {
 ; CHECK-LABEL: @fmul_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fmul.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 2.000000e+00
 ;
 entry:
@@ -379,6 +394,7 @@ entry:
 define double @fdiv_01() #0 {
 ; CHECK-LABEL: @fdiv_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fdiv.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 5.000000e-01
 ;
 entry:
@@ -389,6 +405,7 @@ entry:
 define double @frem_01() #0 {
 ; CHECK-LABEL: @frem_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.frem.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.000000e+00
 ;
 entry:
@@ -399,6 +416,7 @@ entry:
 define double @fma_01() #0 {
 ; CHECK-LABEL: @fma_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fma.f64(double 1.000000e+00, double 2.000000e+00, double 3.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 5.000000e+00
 ;
 entry:
@@ -409,6 +427,7 @@ entry:
 define double @fmuladd_01() #0 {
 ; CHECK-LABEL: @fmuladd_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fmuladd.f64(double 1.000000e+00, double 2.000000e+00, double 3.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 5.000000e+00
 ;
 entry:
@@ -421,6 +440,7 @@ entry:
 define i1 @cmp_eq_01() #0 {
 ; CHECK-LABEL: @cmp_eq_01(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 1.000000e+00, double 2.000000e+00, metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 entry:
@@ -431,6 +451,7 @@ entry:
 define i1 @cmp_eq_02() #0 {
 ; CHECK-LABEL: @cmp_eq_02(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 2.000000e+00, double 2.000000e+00, metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 true
 ;
 entry:
@@ -441,6 +462,7 @@ entry:
 define <2 x i1> @cmp_eq_02a() #0 {
 ; CHECK-LABEL: @cmp_eq_02a(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> <double 2.000000e+00, double 3.000000e+00>, <2 x double> splat (double 2.000000e+00), metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x i1> <i1 true, i1 false>
 ;
 entry:
@@ -451,6 +473,7 @@ entry:
 define i1 @cmp_eq_03() #0 {
 ; CHECK-LABEL: @cmp_eq_03(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 2.000000e+00, double 0x7FF8000000000000, metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 entry:
@@ -461,6 +484,7 @@ entry:
 define i1 @cmp_eq_04() #0 {
 ; CHECK-LABEL: @cmp_eq_04(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 2.000000e+00, double 0x7FF4000000000000, metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 entry:
@@ -492,8 +516,8 @@ entry:
 define i1 @cmp_eq_nan_01() #0 {
 ; CHECK-LABEL: @cmp_eq_nan_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double 0x7FF4000000000000, double 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[RESULT]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 0x7FF4000000000000, double 1.000000e+00, metadata !"oeq") [ "fp.except"(metadata !"strict") ]
+; CHECK-NEXT:    ret i1 [[RESULT1]]
 ;
 entry:
   %result = call i1 @llvm.experimental.constrained.fcmp.f64(double 0x7ff4000000000000, double 1.0, metadata !"oeq", metadata !"fpexcept.strict") #0
@@ -503,11 +527,11 @@ entry:
 define i1 @cmp_eq_nan_02() #0 {
 ; CHECK-LABEL: @cmp_eq_nan_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 0x7FF4000000000000, double 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.fcmps.f64(double 0x7FF4000000000000, double 1.000000e+00, metadata !"oeq")
 ; CHECK-NEXT:    ret i1 [[RESULT]]
 ;
 entry:
-  %result = call i1 @llvm.experimental.constrained.fcmps.f64(double 0x7ff4000000000000, double 1.0, metadata !"oeq", metadata !"fpexcept.strict") #0
+  %result = call i1 @llvm.fcmps.f64(double 0x7ff4000000000000, double 1.0, metadata !"oeq")
   ret i1 %result
 }
 
@@ -515,7 +539,7 @@ entry:
 define i1 @cmp_eq_nan_03() #0 {
 ; CHECK-LABEL: @cmp_eq_nan_03(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double 0x7FF8000000000000, double 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT1:%.*]] = call i1 @llvm.fcmp.f64(double 0x7FF8000000000000, double 1.000000e+00, metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 entry:
@@ -526,11 +550,11 @@ entry:
 define i1 @cmp_eq_nan_04() #0 {
 ; CHECK-LABEL: @cmp_eq_nan_04(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double 0x7FF8000000000000, double 1.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RESULT:%.*]] = call i1 @llvm.fcmps.f64(double 0x7FF8000000000000, double 1.000000e+00, metadata !"oeq")
 ; CHECK-NEXT:    ret i1 [[RESULT]]
 ;
 entry:
-  %result = call i1 @llvm.experimental.constrained.fcmps.f64(double 0x7ff8000000000000, double 1.0, metadata !"oeq", metadata !"fpexcept.strict") #0
+  %result = call i1 @llvm.fcmps.f64(double 0x7ff8000000000000, double 1.0, metadata !"oeq")
   ret i1 %result
 }
 
@@ -555,5 +579,5 @@ declare double @llvm.experimental.constrained.fma.f64(double, double, double, me
 declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double, metadata, metadata)
 declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
 declare <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double>, <2 x double>, metadata, metadata)
-declare i1 @llvm.experimental.constrained.fcmps.f64(double, double, metadata, metadata)
+declare i1 @llvm.fcmps.f64(double, double, metadata)
 
diff --git a/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll b/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
index 963953ad2b3bc..6e06e2cdca1a4 100644
--- a/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
@@ -4,7 +4,8 @@
 ;; x * 0 ==> 0 when no-nans and no-signed-zero
 define float @mul_zero_1(float %a) #0 {
 ; CHECK-LABEL: @mul_zero_1(
-; CHECK-NEXT:    ret float 0.000000e+00
+; CHECK-NEXT:    [[B1:%.*]] = call nnan nsz float @llvm.fmul.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[B1]]
 ;
   %b = call nsz nnan float @llvm.experimental.constrained.fmul.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %b
@@ -12,7 +13,8 @@ define float @mul_zero_1(float %a) #0 {
 
 define float @mul_zero_2(float %a) #0 {
 ; CHECK-LABEL: @mul_zero_2(
-; CHECK-NEXT:    ret float 0.000000e+00
+; CHECK-NEXT:    [[B1:%.*]] = call fast float @llvm.fmul.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[B1]]
 ;
   %b = call fast float @llvm.experimental.constrained.fmul.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %b
@@ -20,7 +22,8 @@ define float @mul_zero_2(float %a) #0 {
 
 define <2 x float> @mul_zero_nsz_nnan_vec_poison(<2 x float> %a) #0 {
 ; CHECK-LABEL: @mul_zero_nsz_nnan_vec_poison(
-; CHECK-NEXT:    ret <2 x float> zeroinitializer
+; CHECK-NEXT:    [[B1:%.*]] = call nnan nsz <2 x float> @llvm.fmul.v2f32(<2 x float> [[A:%.*]], <2 x float> <float 0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[B1]]
 ;
   %b = call nsz nnan <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> %a, <2 x float><float 0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %b
@@ -29,7 +32,7 @@ define <2 x float> @mul_zero_nsz_nnan_vec_poison(<2 x float> %a) #0 {
 ;; x * 0 =/=> 0 when there could be nans or -0
 define float @no_mul_zero_1(float %a) #0 {
 ; CHECK-LABEL: @no_mul_zero_1(
-; CHECK-NEXT:    [[B:%.*]] = call nsz float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[B:%.*]] = call nsz float @llvm.fmul.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[B]]
 ;
   %b = call nsz float @llvm.experimental.constrained.fmul.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -38,7 +41,7 @@ define float @no_mul_zero_1(float %a) #0 {
 
 define float @no_mul_zero_2(float %a) #0 {
 ; CHECK-LABEL: @no_mul_zero_2(
-; CHECK-NEXT:    [[B:%.*]] = call nnan float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[B:%.*]] = call nnan float @llvm.fmul.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[B]]
 ;
   %b = call nnan float @llvm.experimental.constrained.fmul.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -47,7 +50,7 @@ define float @no_mul_zero_2(float %a) #0 {
 
 define float @no_mul_zero_3(float %a) #0 {
 ; CHECK-LABEL: @no_mul_zero_3(
-; CHECK-NEXT:    [[B:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[B:%.*]] = call float @llvm.fmul.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[B]]
 ;
   %b = call float @llvm.experimental.constrained.fmul.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -58,8 +61,8 @@ define float @no_mul_zero_3(float %a) #0 {
 
 define float @fadd_binary_fnegx(float %x) #0 {
 ; CHECK-LABEL: @fadd_binary_fnegx(
-; CHECK-NEXT:    [[NEGX:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call nnan float @llvm.experimental.constrained.fadd.f32(float [[NEGX]], float [[X]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEGX2:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call nnan float @llvm.fadd.f32(float [[NEGX2]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %negx = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -69,7 +72,9 @@ define float @fadd_binary_fnegx(float %x) #0 {
 
 define float @fadd_unary_fnegx(float %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fnegx(
-; CHECK-NEXT:    ret float 0.000000e+00
+; CHECK-NEXT:    [[NEGX:%.*]] = fneg float [[X:%.*]]
+; CHECK-NEXT:    [[R1:%.*]] = call nnan float @llvm.fadd.f32(float [[NEGX]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %negx = fneg float %x
   %r = call nnan float @llvm.experimental.constrained.fadd.f32(float %negx, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -80,8 +85,8 @@ define float @fadd_unary_fnegx(float %x) #0 {
 
 define <2 x float> @fadd_binary_fnegx_commute_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_binary_fnegx_commute_vec(
-; CHECK-NEXT:    [[NEGX:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEGX2:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %negx = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float -0.0, float -0.0>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -91,7 +96,9 @@ define <2 x float> @fadd_binary_fnegx_commute_vec(<2 x float> %x) #0 {
 
 define <2 x float> @fadd_unary_fnegx_commute_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fnegx_commute_vec(
-; CHECK-NEXT:    ret <2 x float> zeroinitializer
+; CHECK-NEXT:    [[NEGX:%.*]] = fneg <2 x float> [[X:%.*]]
+; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[R1]]
 ;
   %negx = fneg <2 x float> %x
   %r = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> %negx, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -100,8 +107,8 @@ define <2 x float> @fadd_unary_fnegx_commute_vec(<2 x float> %x) #0 {
 
 define <2 x float> @fadd_fnegx_commute_vec_poison(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_fnegx_commute_vec_poison(
-; CHECK-NEXT:    [[NEGX:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEGX2:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %negx = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float poison, float -0.0>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -114,8 +121,8 @@ define <2 x float> @fadd_fnegx_commute_vec_poison(<2 x float> %x) #0 {
 
 define float @fadd_binary_fneg_nan(float %x) #0 {
 ; CHECK-LABEL: @fadd_binary_fneg_nan(
-; CHECK-NEXT:    [[T:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call ninf float @llvm.experimental.constrained.fadd.f32(float [[T]], float [[X]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T2:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call ninf float @llvm.fadd.f32(float [[T2]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[COULD_BE_NAN]]
 ;
   %t = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -126,7 +133,7 @@ define float @fadd_binary_fneg_nan(float %x) #0 {
 define float @fadd_unary_fneg_nan(float %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fneg_nan(
 ; CHECK-NEXT:    [[T:%.*]] = fneg nnan float [[X:%.*]]
-; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call ninf float @llvm.experimental.constrained.fadd.f32(float [[T]], float [[X]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call ninf float @llvm.fadd.f32(float [[T]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[COULD_BE_NAN]]
 ;
   %t = fneg nnan float %x
@@ -136,8 +143,8 @@ define float @fadd_unary_fneg_nan(float %x) #0 {
 
 define float @fadd_binary_fneg_nan_commute(float %x) #0 {
 ; CHECK-LABEL: @fadd_binary_fneg_nan_commute(
-; CHECK-NEXT:    [[T:%.*]] = call nnan ninf float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X]], float [[T]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T2:%.*]] = call nnan ninf float @llvm.fsub.f32(float -0.000000e+00, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call float @llvm.fadd.f32(float [[X]], float [[T2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[COULD_BE_NAN]]
 ;
   %t = call nnan ninf float @llvm.experimental.constrained.fsub.f32(float -0.0, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -148,7 +155,7 @@ define float @fadd_binary_fneg_nan_commute(float %x) #0 {
 define float @fadd_unary_fneg_nan_commute(float %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fneg_nan_commute(
 ; CHECK-NEXT:    [[T:%.*]] = fneg nnan ninf float [[X:%.*]]
-; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X]], float [[T]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[COULD_BE_NAN:%.*]] = call float @llvm.fadd.f32(float [[X]], float [[T]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[COULD_BE_NAN]]
 ;
   %t = fneg nnan ninf float %x
@@ -160,8 +167,8 @@ define float @fadd_unary_fneg_nan_commute(float %x) #0 {
 
 define float @fadd_fsub_nnan_ninf(float %x) #0 {
 ; CHECK-LABEL: @fadd_fsub_nnan_ninf(
-; CHECK-NEXT:    [[SUB:%.*]] = call nnan ninf float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[ZERO:%.*]] = call nnan ninf float @llvm.experimental.constrained.fadd.f32(float [[X]], float [[SUB]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SUB2:%.*]] = call nnan ninf float @llvm.fsub.f32(float 0.000000e+00, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ZERO:%.*]] = call nnan ninf float @llvm.fadd.f32(float [[X]], float [[SUB2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ZERO]]
 ;
   %sub = call nnan ninf float @llvm.experimental.constrained.fsub.f32(float 0.0, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -173,8 +180,8 @@ define float @fadd_fsub_nnan_ninf(float %x) #0 {
 
 define <2 x float> @fadd_fsub_nnan_ninf_commute_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_fsub_nnan_ninf_commute_vec(
-; CHECK-NEXT:    [[SUB:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[ZERO:%.*]] = call nnan ninf <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[SUB]], <2 x float> [[X]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SUB2:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ZERO:%.*]] = call nnan ninf <2 x float> @llvm.fadd.v2f32(<2 x float> [[SUB2]], <2 x float> [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[ZERO]]
 ;
   %sub = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -187,8 +194,8 @@ define <2 x float> @fadd_fsub_nnan_ninf_commute_vec(<2 x float> %x) #0 {
 
 define float @fadd_fsub_nnan(float %x) #0 {
 ; CHECK-LABEL: @fadd_fsub_nnan(
-; CHECK-NEXT:    [[SUB:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[ZERO:%.*]] = call nnan float @llvm.experimental.constrained.fadd.f32(float [[SUB]], float [[X]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SUB2:%.*]] = call float @llvm.fsub.f32(float 0.000000e+00, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ZERO:%.*]] = call nnan float @llvm.fadd.f32(float [[SUB2]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ZERO]]
 ;
   %sub = call float @llvm.experimental.constrained.fsub.f32(float 0.0, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -200,9 +207,11 @@ define float @fadd_fsub_nnan(float %x) #0 {
 define float @fsub_x_x(float %a) #0 {
 ; X - X ==> 0
 ; CHECK-LABEL: @fsub_x_x(
-; CHECK-NEXT:    [[NO_ZERO1:%.*]] = call ninf float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[NO_ZERO2:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[NO_ZERO:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[NO_ZERO1]], float [[NO_ZERO2]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[ZERO15:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO16:%.*]] = call ninf float @llvm.fsub.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO27:%.*]] = call float @llvm.fsub.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO3:%.*]] = call float @llvm.fadd.f32(float [[NO_ZERO16]], float [[NO_ZERO27]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO:%.*]] = call nsz float @llvm.fadd.f32(float [[NO_ZERO3]], float [[ZERO15]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[NO_ZERO]]
 ;
   %zero1 = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -221,8 +230,8 @@ define float @fsub_x_x(float %a) #0 {
 ; fsub nsz 0.0, (fsub 0.0, X) ==> X
 define float @fsub_0_0_x(float %a) #0 {
 ; CHECK-LABEL: @fsub_0_0_x(
-; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %t1 = call float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -233,7 +242,9 @@ define float @fsub_0_0_x(float %a) #0 {
 ; fsub nsz 0.0, (fneg X) ==> X
 define float @fneg_x(float %a) #0 {
 ; CHECK-LABEL: @fneg_x(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[T1:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %t1 = fneg float %a
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -242,8 +253,8 @@ define float @fneg_x(float %a) #0 {
 
 define <2 x float> @fsub_0_0_x_vec_poison1(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_0_0_x_vec_poison1(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz <2 x float> @llvm.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %t1 = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.0, float poison>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -253,7 +264,9 @@ define <2 x float> @fsub_0_0_x_vec_poison1(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec_poison1(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec_poison1(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.0, float poison>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -262,8 +275,8 @@ define <2 x float> @fneg_x_vec_poison1(<2 x float> %a) #0 {
 
 define <2 x float> @fsub_0_0_x_vec_poison2(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_0_0_x_vec_poison2(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz <2 x float> @llvm.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %t1 = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> zeroinitializer, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -275,7 +288,8 @@ define <2 x float> @fsub_0_0_x_vec_poison2(<2 x float> %a) #0 {
 
 define <2 x float> @fadd_zero_nsz_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_zero_nsz_vec(
-; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
+; CHECK-NEXT:    [[X:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[X1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[X]]
 ;
   %r = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -283,7 +297,8 @@ define <2 x float> @fadd_zero_nsz_vec(<2 x float> %x) #0 {
 
 define <2 x float> @fadd_zero_nsz_vec_poison(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_zero_nsz_vec_poison(
-; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
+; CHECK-NEXT:    [[X:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[X1:%.*]], <2 x float> <float 0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[X]]
 ;
   %r = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> <float 0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -292,9 +307,9 @@ define <2 x float> @fadd_zero_nsz_vec_poison(<2 x float> %x) #0 {
 define float @nofold_fadd_x_0(float %a) #0 {
 ; Dont fold
 ; CHECK-LABEL: @nofold_fadd_x_0(
-; CHECK-NEXT:    [[NO_ZERO1:%.*]] = call ninf float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[NO_ZERO2:%.*]] = call nnan float @llvm.experimental.constrained.fadd.f32(float [[A]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[NO_ZERO:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[NO_ZERO1]], float [[NO_ZERO2]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NO_ZERO11:%.*]] = call ninf float @llvm.fadd.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO22:%.*]] = call nnan float @llvm.fadd.f32(float [[A]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO:%.*]] = call float @llvm.fadd.f32(float [[NO_ZERO11]], float [[NO_ZERO22]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[NO_ZERO]]
 ;
   %no_zero1 = call ninf float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -305,7 +320,8 @@ define float @nofold_fadd_x_0(float %a) #0 {
 
 define float @fold_fadd_nsz_x_0(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %add
@@ -315,8 +331,8 @@ define float @fold_fadd_nsz_x_0(float %a) #0 {
 
 define float @fold_fadd_cannot_be_neg0_nsz_src_x_0(float %a, float %b) #0 {
 ; CHECK-LABEL: @fold_fadd_cannot_be_neg0_nsz_src_x_0(
-; CHECK-NEXT:    [[NSZ:%.*]] = call nsz float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[NSZ]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NSZ2:%.*]] = call nsz float @llvm.fmul.f32(float [[A:%.*]], float [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[NSZ2]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %nsz = call nsz float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -326,7 +342,8 @@ define float @fold_fadd_cannot_be_neg0_nsz_src_x_0(float %a, float %b) #0 {
 
 define float @fold_fadd_cannot_be_neg0_fabs_src_x_0(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_cannot_be_neg0_fabs_src_x_0(
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[FABS1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fadd.f32(float [[FABS1]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %a) #0
@@ -338,9 +355,9 @@ define float @fold_fadd_cannot_be_neg0_fabs_src_x_0(float %a) #0 {
 
 define float @fold_fadd_cannot_be_neg0_sqrt_nsz_src_x_0(float %a, float %b) #0 {
 ; CHECK-LABEL: @fold_fadd_cannot_be_neg0_sqrt_nsz_src_x_0(
-; CHECK-NEXT:    [[NSZ:%.*]] = call nsz float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[SQRT:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[NSZ]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[SQRT]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NSZ3:%.*]] = call nsz float @llvm.fmul.f32(float [[A:%.*]], float [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRT1:%.*]] = call float @llvm.sqrt.f32(float [[NSZ3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[SQRT1]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %nsz = call nsz float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -353,9 +370,9 @@ define float @fold_fadd_cannot_be_neg0_sqrt_nsz_src_x_0(float %a, float %b) #0 {
 
 define float @fold_fadd_cannot_be_neg0_canonicalize_nsz_src_x_0(float %a, float %b) #0 {
 ; CHECK-LABEL: @fold_fadd_cannot_be_neg0_canonicalize_nsz_src_x_0(
-; CHECK-NEXT:    [[NSZ:%.*]] = call nsz float @llvm.experimental.constrained.fmul.f32(float [[A:%.*]], float [[B:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[CANON:%.*]] = call float @llvm.canonicalize.f32(float [[NSZ]]) #[[ATTR0]]
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[CANON]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NSZ2:%.*]] = call nsz float @llvm.fmul.f32(float [[A:%.*]], float [[B:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[CANON:%.*]] = call float @llvm.canonicalize.f32(float [[NSZ2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[CANON]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %nsz = call nsz float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -369,7 +386,8 @@ define float @fold_fadd_cannot_be_neg0_canonicalize_nsz_src_x_0(float %a, float
 
 define double @fdiv_zero_by_x(double %x) #0 {
 ; CHECK-LABEL: @fdiv_zero_by_x(
-; CHECK-NEXT:    ret double 0.000000e+00
+; CHECK-NEXT:    [[R1:%.*]] = call nnan nsz double @llvm.fdiv.f64(double 0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double [[R1]]
 ;
   %r = call nnan nsz double @llvm.experimental.constrained.fdiv.f64(double 0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -377,7 +395,8 @@ define double @fdiv_zero_by_x(double %x) #0 {
 
 define <2 x double> @fdiv_zero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @fdiv_zero_by_x_vec_poison(
-; CHECK-NEXT:    ret <2 x double> zeroinitializer
+; CHECK-NEXT:    [[R1:%.*]] = call nnan nsz <2 x double> @llvm.fdiv.v2f64(<2 x double> <double 0.000000e+00, double poison>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x double> [[R1]]
 ;
   %r = call nnan nsz <2 x double> @llvm.experimental.constrained.fdiv.v2f64(<2 x double> <double 0.0, double poison>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -388,7 +407,8 @@ define <2 x double> @fdiv_zero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define double @frem_zero_by_x(double %x) #0 {
 ; CHECK-LABEL: @frem_zero_by_x(
-; CHECK-NEXT:    ret double 0.000000e+00
+; CHECK-NEXT:    [[R1:%.*]] = call nnan double @llvm.frem.f64(double 0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double [[R1]]
 ;
   %r = call nnan double @llvm.experimental.constrained.frem.f64(double 0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -396,7 +416,8 @@ define double @frem_zero_by_x(double %x) #0 {
 
 define <2 x double> @frem_poszero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @frem_poszero_by_x_vec_poison(
-; CHECK-NEXT:    ret <2 x double> zeroinitializer
+; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x double> @llvm.frem.v2f64(<2 x double> <double 0.000000e+00, double poison>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x double> [[R1]]
 ;
   %r = call nnan <2 x double> @llvm.experimental.constrained.frem.v2f64(<2 x double> <double 0.0, double poison>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -407,7 +428,8 @@ define <2 x double> @frem_poszero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define double @frem_negzero_by_x(double %x) #0 {
 ; CHECK-LABEL: @frem_negzero_by_x(
-; CHECK-NEXT:    ret double -0.000000e+00
+; CHECK-NEXT:    [[R1:%.*]] = call nnan double @llvm.frem.f64(double -0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double [[R1]]
 ;
   %r = call nnan double @llvm.experimental.constrained.frem.f64(double -0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -415,7 +437,8 @@ define double @frem_negzero_by_x(double %x) #0 {
 
 define <2 x double> @frem_negzero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @frem_negzero_by_x_vec_poison(
-; CHECK-NEXT:    ret <2 x double> splat (double -0.000000e+00)
+; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x double> @llvm.frem.v2f64(<2 x double> <double poison, double -0.000000e+00>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x double> [[R1]]
 ;
   %r = call nnan <2 x double> @llvm.experimental.constrained.frem.v2f64(<2 x double> <double poison, double -0.0>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -423,7 +446,8 @@ define <2 x double> @frem_negzero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define float @fdiv_self(float %f) #0 {
 ; CHECK-LABEL: @fdiv_self(
-; CHECK-NEXT:    ret float 1.000000e+00
+; CHECK-NEXT:    [[DIV1:%.*]] = call nnan float @llvm.fdiv.f32(float [[F:%.*]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[DIV1]]
 ;
   %div = call nnan float @llvm.experimental.constrained.fdiv.f32(float %f, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %div
@@ -431,7 +455,7 @@ define float @fdiv_self(float %f) #0 {
 
 define float @fdiv_self_invalid(float %f) #0 {
 ; CHECK-LABEL: @fdiv_self_invalid(
-; CHECK-NEXT:    [[DIV:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[F:%.*]], float [[F]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[DIV:%.*]] = call float @llvm.fdiv.f32(float [[F:%.*]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %div = call float @llvm.experimental.constrained.fdiv.f32(float %f, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -440,8 +464,8 @@ define float @fdiv_self_invalid(float %f) #0 {
 
 define float @fdiv_neg1(float %f) #0 {
 ; CHECK-LABEL: @fdiv_neg1(
-; CHECK-NEXT:    [[NEG:%.*]] = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.experimental.constrained.fdiv.f32(float [[NEG]], float [[F]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call fast float @llvm.fsub.f32(float -0.000000e+00, float [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.fdiv.f32(float [[NEG1]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %neg = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -451,8 +475,8 @@ define float @fdiv_neg1(float %f) #0 {
 
 define float @fdiv_neg2(float %f) #0 {
 ; CHECK-LABEL: @fdiv_neg2(
-; CHECK-NEXT:    [[NEG:%.*]] = call fast float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.experimental.constrained.fdiv.f32(float [[NEG]], float [[F]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call fast float @llvm.fsub.f32(float 0.000000e+00, float [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.fdiv.f32(float [[NEG1]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %neg = call fast float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -462,8 +486,8 @@ define float @fdiv_neg2(float %f) #0 {
 
 define float @fdiv_neg_invalid(float %f) #0 {
 ; CHECK-LABEL: @fdiv_neg_invalid(
-; CHECK-NEXT:    [[NEG:%.*]] = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[NEG]], float [[F]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call fast float @llvm.fsub.f32(float -0.000000e+00, float [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call float @llvm.fdiv.f32(float [[NEG1]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %neg = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -473,8 +497,8 @@ define float @fdiv_neg_invalid(float %f) #0 {
 
 define float @fdiv_neg_swapped1(float %f) #0 {
 ; CHECK-LABEL: @fdiv_neg_swapped1(
-; CHECK-NEXT:    [[NEG:%.*]] = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.experimental.constrained.fdiv.f32(float [[F]], float [[NEG]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call fast float @llvm.fsub.f32(float -0.000000e+00, float [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.fdiv.f32(float [[F]], float [[NEG1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %neg = call fast float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -484,8 +508,8 @@ define float @fdiv_neg_swapped1(float %f) #0 {
 
 define float @fdiv_neg_swapped2(float %f) #0 {
 ; CHECK-LABEL: @fdiv_neg_swapped2(
-; CHECK-NEXT:    [[NEG:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.experimental.constrained.fdiv.f32(float [[F]], float [[NEG]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call float @llvm.fsub.f32(float 0.000000e+00, float [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call nnan float @llvm.fdiv.f32(float [[F]], float [[NEG1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[DIV]]
 ;
   %neg = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -495,8 +519,8 @@ define float @fdiv_neg_swapped2(float %f) #0 {
 
 define <2 x float> @fdiv_neg_vec_poison_elt(<2 x float> %f) #0 {
 ; CHECK-LABEL: @fdiv_neg_vec_poison_elt(
-; CHECK-NEXT:    [[NEG:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[DIV:%.*]] = call nnan <2 x float> @llvm.experimental.constrained.fdiv.v2f32(<2 x float> [[F]], <2 x float> [[NEG]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEG1:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[DIV:%.*]] = call nnan <2 x float> @llvm.fdiv.v2f32(<2 x float> [[F]], <2 x float> [[NEG1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[DIV]]
 ;
   %neg = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -511,8 +535,8 @@ declare double @llvm.sqrt.f64(double)
 
 define double @sqrt_squared(double %f) #0 {
 ; CHECK-LABEL: @sqrt_squared(
-; CHECK-NEXT:    [[SQRT:%.*]] = call double @llvm.experimental.constrained.sqrt.f64(double [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nnan nsz double @llvm.experimental.constrained.fmul.f64(double [[SQRT]], double [[SQRT]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT1:%.*]] = call double @llvm.sqrt.f64(double [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nnan nsz double @llvm.fmul.f64(double [[SQRT1]], double [[SQRT1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[MUL]]
 ;
   %sqrt = call double @llvm.experimental.constrained.sqrt.f64(double %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -524,8 +548,8 @@ define double @sqrt_squared(double %f) #0 {
 
 define double @sqrt_squared_not_fast_enough1(double %f) #0 {
 ; CHECK-LABEL: @sqrt_squared_not_fast_enough1(
-; CHECK-NEXT:    [[SQRT:%.*]] = call double @llvm.experimental.constrained.sqrt.f64(double [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[MUL:%.*]] = call nnan nsz double @llvm.experimental.constrained.fmul.f64(double [[SQRT]], double [[SQRT]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT1:%.*]] = call double @llvm.sqrt.f64(double [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[MUL:%.*]] = call nnan nsz double @llvm.fmul.f64(double [[SQRT1]], double [[SQRT1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[MUL]]
 ;
   %sqrt = call double @llvm.experimental.constrained.sqrt.f64(double %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -535,8 +559,8 @@ define double @sqrt_squared_not_fast_enough1(double %f) #0 {
 
 define double @sqrt_squared_not_fast_enough2(double %f) #0 {
 ; CHECK-LABEL: @sqrt_squared_not_fast_enough2(
-; CHECK-NEXT:    [[SQRT:%.*]] = call double @llvm.experimental.constrained.sqrt.f64(double [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nnan double @llvm.experimental.constrained.fmul.f64(double [[SQRT]], double [[SQRT]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT1:%.*]] = call double @llvm.sqrt.f64(double [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nnan double @llvm.fmul.f64(double [[SQRT1]], double [[SQRT1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[MUL]]
 ;
   %sqrt = call double @llvm.experimental.constrained.sqrt.f64(double %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -546,8 +570,8 @@ define double @sqrt_squared_not_fast_enough2(double %f) #0 {
 
 define double @sqrt_squared_not_fast_enough3(double %f) #0 {
 ; CHECK-LABEL: @sqrt_squared_not_fast_enough3(
-; CHECK-NEXT:    [[SQRT:%.*]] = call double @llvm.experimental.constrained.sqrt.f64(double [[F:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nsz double @llvm.experimental.constrained.fmul.f64(double [[SQRT]], double [[SQRT]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT1:%.*]] = call double @llvm.sqrt.f64(double [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nsz double @llvm.fmul.f64(double [[SQRT1]], double [[SQRT1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[MUL]]
 ;
   %sqrt = call double @llvm.experimental.constrained.sqrt.f64(double %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll b/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
index a0fd8fe74bc67..39a636cafe447 100644
--- a/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
@@ -3,6 +3,7 @@
 
 define float @fdiv_constant_fold() #0 {
 ; CHECK-LABEL: @fdiv_constant_fold(
+; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.fdiv.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.500000e+00
 ;
   %f = call float @llvm.experimental.constrained.fdiv.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -12,7 +13,7 @@ define float @fdiv_constant_fold() #0 {
 
 define float @fdiv_constant_fold_strict() #0 {
 ; CHECK-LABEL: @fdiv_constant_fold_strict(
-; CHECK-NEXT:    [[F:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 3.000000e+00, float 2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.fdiv.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float 1.500000e+00
 ;
   %f = call float @llvm.experimental.constrained.fdiv.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -22,7 +23,7 @@ define float @fdiv_constant_fold_strict() #0 {
 
 define float @fdiv_constant_fold_strict2() #0 {
 ; CHECK-LABEL: @fdiv_constant_fold_strict2(
-; CHECK-NEXT:    [[F:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float 2.000000e+00, float 3.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[F:%.*]] = call float @llvm.fdiv.f32(float 2.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[F]]
 ;
   %f = call float @llvm.experimental.constrained.fdiv.f32(float 2.0, float 3.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -32,6 +33,7 @@ define float @fdiv_constant_fold_strict2() #0 {
 
 define float @frem_constant_fold() #0 {
 ; CHECK-LABEL: @frem_constant_fold(
+; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.frem.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
   %f = call float @llvm.experimental.constrained.frem.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -40,7 +42,7 @@ define float @frem_constant_fold() #0 {
 
 define float @frem_constant_fold_strict() #0 {
 ; CHECK-LABEL: @frem_constant_fold_strict(
-; CHECK-NEXT:    [[F:%.*]] = call float @llvm.experimental.constrained.frem.f32(float 3.000000e+00, float 2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.frem.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
   %f = call float @llvm.experimental.constrained.frem.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -49,8 +51,8 @@ define float @frem_constant_fold_strict() #0 {
 
 define double @fmul_fdiv_common_operand(double %x, double %y) #0 {
 ; CHECK-LABEL: @fmul_fdiv_common_operand(
-; CHECK-NEXT:    [[M:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[X:%.*]], double [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[D:%.*]] = call reassoc nnan double @llvm.experimental.constrained.fdiv.f64(double [[M]], double [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[M1:%.*]] = call double @llvm.fmul.f64(double [[X:%.*]], double [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[D:%.*]] = call reassoc nnan double @llvm.fdiv.f64(double [[M1]], double [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[D]]
 ;
   %m = call double @llvm.experimental.constrained.fmul.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -62,8 +64,8 @@ define double @fmul_fdiv_common_operand(double %x, double %y) #0 {
 
 define double @fmul_fdiv_common_operand_too_strict(double %x, double %y) #0 {
 ; CHECK-LABEL: @fmul_fdiv_common_operand_too_strict(
-; CHECK-NEXT:    [[M:%.*]] = call fast double @llvm.experimental.constrained.fmul.f64(double [[X:%.*]], double [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[D:%.*]] = call reassoc double @llvm.experimental.constrained.fdiv.f64(double [[M]], double [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[M1:%.*]] = call fast double @llvm.fmul.f64(double [[X:%.*]], double [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[D:%.*]] = call reassoc double @llvm.fdiv.f64(double [[M1]], double [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double [[D]]
 ;
   %m = call fast double @llvm.experimental.constrained.fmul.f64(double %x, double %y, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -75,8 +77,8 @@ define double @fmul_fdiv_common_operand_too_strict(double %x, double %y) #0 {
 
 define <2 x float> @fmul_fdiv_common_operand_commute_vec(<2 x float> %x, <2 x float> %y) #0 {
 ; CHECK-LABEL: @fmul_fdiv_common_operand_commute_vec(
-; CHECK-NEXT:    [[M:%.*]] = call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> [[Y:%.*]], <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[D:%.*]] = call fast <2 x float> @llvm.experimental.constrained.fdiv.v2f32(<2 x float> [[M]], <2 x float> [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[M1:%.*]] = call <2 x float> @llvm.fmul.v2f32(<2 x float> [[Y:%.*]], <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[D:%.*]] = call fast <2 x float> @llvm.fdiv.v2f32(<2 x float> [[M1]], <2 x float> [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[D]]
 ;
   %m = call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> %y, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
index 9a078a8f569da..e4f7b8e9f59d0 100644
--- a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
@@ -4,7 +4,7 @@
 ; fneg (fsub -0.0, X) ==> X
 define float @fsub_-0_x(float %a) #0 {
 ; CHECK-LABEL: @fsub_-0_x(
-; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[RET:%.*]] = fneg float [[T1]]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
@@ -15,7 +15,7 @@ define float @fsub_-0_x(float %a) #0 {
 
 define <2 x float> @fsub_-0_x_vec(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_-0_x_vec(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[RET:%.*]] = fneg <2 x float> [[T1]]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
@@ -26,7 +26,7 @@ define <2 x float> @fsub_-0_x_vec(<2 x float> %a) #0 {
 
 define <2 x float> @fsub_-0_x_vec_poison_elts(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_-0_x_vec_poison_elts(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[RET:%.*]] = fneg <2 x float> [[T1]]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
@@ -37,7 +37,7 @@ define <2 x float> @fsub_-0_x_vec_poison_elts(<2 x float> %a) #0 {
 
 define <2 x float> @fsub_negzero_vec_poison_elts(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fsub_negzero_vec_poison_elts(
-; CHECK-NEXT:    [[R:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call nsz <2 x float> @llvm.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %r = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float poison, float -0.0>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -47,8 +47,8 @@ define <2 x float> @fsub_negzero_vec_poison_elts(<2 x float> %x) #0 {
 ; fsub -0.0, (fsub -0.0, X) ==> X
 define float @fsub_-0_-0_x(float %a) #0 {
 ; CHECK-LABEL: @fsub_-0_-0_x(
-; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %t1 = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -59,7 +59,9 @@ define float @fsub_-0_-0_x(float %a) #0 {
 ; fsub -0.0, (fneg X) ==> X
 define float @fneg_x(float %a) #0 {
 ; CHECK-LABEL: @fneg_x(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[T1:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %t1 = fneg float %a
   %ret = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -68,8 +70,8 @@ define float @fneg_x(float %a) #0 {
 
 define <2 x float> @fsub_-0_-0_x_vec(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_-0_-0_x_vec(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %t1 = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -79,7 +81,9 @@ define <2 x float> @fsub_-0_-0_x_vec(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -88,8 +92,8 @@ define <2 x float> @fneg_x_vec(<2 x float> %a) #0 {
 
 define <2 x float> @fsub_-0_-0_x_vec_poison_elts(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fsub_-0_-0_x_vec_poison_elts(
-; CHECK-NEXT:    [[T1:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float poison, float -0.000000e+00>, <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %t1 = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float poison, float -0.0>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -99,7 +103,9 @@ define <2 x float> @fsub_-0_-0_x_vec_poison_elts(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec_poison_elts(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec_poison_elts(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float -0.0, float poison>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -109,8 +115,8 @@ define <2 x float> @fneg_x_vec_poison_elts(<2 x float> %a) #0 {
 ; fsub -0.0, (fsub 0.0, X) != X
 define float @fsub_-0_0_x(float %a) #0 {
 ; CHECK-LABEL: @fsub_-0_0_x(
-; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %t1 = call float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -121,8 +127,8 @@ define float @fsub_-0_0_x(float %a) #0 {
 ; fsub 0.0, (fsub -0.0, X) != X
 define float @fsub_0_-0_x(float %a) #0 {
 ; CHECK-LABEL: @fsub_0_-0_x(
-; CHECK-NEXT:    [[T1:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[T1]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[T11:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float 0.000000e+00, float [[T11]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %t1 = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -133,7 +139,8 @@ define float @fsub_0_-0_x(float %a) #0 {
 ; fsub X, 0 ==> X
 define float @fsub_x_0(float %x) #0 {
 ; CHECK-LABEL: @fsub_x_0(
-; CHECK-NEXT:    ret float [[X:%.*]]
+; CHECK-NEXT:    [[X:%.*]] = call float @llvm.fsub.f32(float [[X1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[X]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -141,7 +148,8 @@ define float @fsub_x_0(float %x) #0 {
 
 define <2 x float> @fsub_x_0_vec_poison(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fsub_x_0_vec_poison(
-; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
+; CHECK-NEXT:    [[X:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[X1:%.*]], <2 x float> <float poison, float 0.000000e+00>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[X]]
 ;
   %r = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> %x, <2 x float><float poison, float 0.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -150,7 +158,8 @@ define <2 x float> @fsub_x_0_vec_poison(<2 x float> %x) #0 {
 ; fadd X, -0 ==> X
 define float @fadd_x_n0(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
@@ -158,7 +167,8 @@ define float @fadd_x_n0(float %a) #0 {
 
 define <2 x float> @fadd_x_n0_vec_poison_elt(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_vec_poison_elt(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> <float -0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> <float -0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %ret
@@ -167,7 +177,7 @@ define <2 x float> @fadd_x_n0_vec_poison_elt(<2 x float> %a) #0 {
 ; fadd X, 0 ==> X
 define float @fadd_x_p0(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_p0(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -176,7 +186,7 @@ define float @fadd_x_p0(float %a) #0 {
 
 define <2 x float> @fadd_x_p0_vec_poison_elt(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_x_p0_vec_poison_elt(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> <float 0.000000e+00, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> <float 0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> <float 0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -186,7 +196,8 @@ define <2 x float> @fadd_x_p0_vec_poison_elt(<2 x float> %a) #0 {
 ; fmul X, 1.0 ==> X
 define double @fmul_X_1(double %a) #0 {
 ; CHECK-LABEL: @fmul_X_1(
-; CHECK-NEXT:    ret double [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call double @llvm.fmul.f64(double 1.000000e+00, double [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret double [[A]]
 ;
   %b = call double @llvm.experimental.constrained.fmul.f64(double 1.0, double %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %b
@@ -195,7 +206,8 @@ define double @fmul_X_1(double %a) #0 {
 ; Originally PR2642
 define <4 x float> @fmul_X_1_vec(<4 x float> %x) #0 {
 ; CHECK-LABEL: @fmul_X_1_vec(
-; CHECK-NEXT:    ret <4 x float> [[X:%.*]]
+; CHECK-NEXT:    [[X:%.*]] = call <4 x float> @llvm.fmul.v4f32(<4 x float> [[X1:%.*]], <4 x float> splat (float 1.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <4 x float> [[X]]
 ;
   %m = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> %x, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <4 x float> %m
@@ -204,7 +216,8 @@ define <4 x float> @fmul_X_1_vec(<4 x float> %x) #0 {
 ; fdiv X, 1.0 ==> X
 define float @fdiv_x_1(float %a) #0 {
 ; CHECK-LABEL: @fdiv_x_1(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fdiv.f32(float [[A1:%.*]], float 1.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fdiv.f32(float %a, float 1.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
@@ -216,8 +229,8 @@ define float @fdiv_x_1(float %a) #0 {
 ; an arbitrary sign bit.
 define float @fabs_sqrt(float %a) #0 {
 ; CHECK-LABEL: @fabs_sqrt(
-; CHECK-NEXT:    [[SQRT:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[SQRT1:%.*]] = call float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT1]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -228,8 +241,8 @@ define float @fabs_sqrt(float %a) #0 {
 ; The fabs can't be eliminated because the nnan sqrt may still return -0.
 define float @fabs_sqrt_nnan(float %a) #0 {
 ; CHECK-LABEL: @fabs_sqrt_nnan(
-; CHECK-NEXT:    [[SQRT:%.*]] = call nnan float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT]]) #[[ATTR0]]
+; CHECK-NEXT:    [[SQRT1:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %sqrt = call nnan float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -240,8 +253,8 @@ define float @fabs_sqrt_nnan(float %a) #0 {
 ; The fabs can't be eliminated because the nsz sqrt may still return NaN.
 define float @fabs_sqrt_nsz(float %a) #0 {
 ; CHECK-LABEL: @fabs_sqrt_nsz(
-; CHECK-NEXT:    [[SQRT:%.*]] = call nsz float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT]]) #[[ATTR0]]
+; CHECK-NEXT:    [[SQRT1:%.*]] = call nsz float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %sqrt = call nsz float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -251,7 +264,8 @@ define float @fabs_sqrt_nsz(float %a) #0 {
 
 define float @fabs_sqrt_nnan_nsz(float %a) #0 {
 ; CHECK-LABEL: @fabs_sqrt_nnan_nsz(
-; CHECK-NEXT:    [[SQRT:%.*]] = call nnan nsz float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT:%.*]] = call nnan nsz float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[SQRT]]
 ;
   %sqrt = call nnan nsz float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -262,7 +276,8 @@ define float @fabs_sqrt_nnan_nsz(float %a) #0 {
 define float @fabs_sqrt_nnan_fabs(float %a) #0 {
 ; CHECK-LABEL: @fabs_sqrt_nnan_fabs(
 ; CHECK-NEXT:    [[B:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
-; CHECK-NEXT:    [[SQRT:%.*]] = call nnan float @llvm.experimental.constrained.sqrt.f32(float [[B]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[SQRT:%.*]] = call nnan float @llvm.sqrt.f32(float [[B]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[SQRT]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[SQRT]]
 ;
   %b = call float @llvm.fabs.f32(float %a) #0
@@ -275,8 +290,8 @@ define float @fabs_sqrt_nnan_fabs(float %a) #0 {
 
 define float @fsub_fsub_common_op(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_common_op(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[S]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[S1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -286,8 +301,8 @@ define float @fsub_fsub_common_op(float %x, float %y) #0 {
 
 define <2 x float> @fsub_fsub_common_op_vec(<2 x float> %x, <2 x float> %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_common_op_vec(
-; CHECK-NEXT:    [[S:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[Y:%.*]], <2 x float> [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[Y]], <2 x float> [[S]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S1:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[Y:%.*]], <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.fsub.v2f32(<2 x float> [[Y]], <2 x float> [[S1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %s = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> %y, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -300,8 +315,8 @@ define <2 x float> @fsub_fsub_common_op_vec(<2 x float> %x, <2 x float> %y) #0 {
 
 define float @fsub_fsub_wrong_common_op(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_wrong_common_op(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[S]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[S1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -314,8 +329,8 @@ define float @fsub_fsub_wrong_common_op(float %x, float %y) #0 {
 
 define float @fsub_fsub_common_op_wrong_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_common_op_wrong_commute(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[S]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[S1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -328,8 +343,8 @@ define float @fsub_fsub_common_op_wrong_commute(float %x, float %y) #0 {
 
 define float @fsub_fsub_wrong_common_op_wrong_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_wrong_common_op_wrong_commute(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[S]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[S1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -341,8 +356,8 @@ define float @fsub_fsub_wrong_common_op_wrong_commute(float %x, float %y) #0 {
 
 define float @fadd_fsub_common_op(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_common_op(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[A]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[A1:%.*]] = call float @llvm.fadd.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[A1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %a = call float @llvm.experimental.constrained.fadd.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -354,8 +369,8 @@ define float @fadd_fsub_common_op(float %x, float %y) #0 {
 
 define <2 x float> @fadd_fsub_common_op_commute_vec(<2 x float> %x, <2 x float> %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_common_op_commute_vec(
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[X:%.*]], <2 x float> [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[A]], <2 x float> [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[A1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[X:%.*]], <2 x float> [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.fsub.v2f32(<2 x float> [[A1]], <2 x float> [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %a = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -368,8 +383,8 @@ define <2 x float> @fadd_fsub_common_op_commute_vec(<2 x float> %x, <2 x float>
 
 define float @fadd_fsub_common_op_wrong_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_common_op_wrong_commute(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[A1:%.*]] = call float @llvm.fadd.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[A1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %a = call float @llvm.experimental.constrained.fadd.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -382,8 +397,8 @@ define float @fadd_fsub_common_op_wrong_commute(float %x, float %y) #0 {
 
 define float @fadd_fsub_common_op_wrong_commute_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_common_op_wrong_commute_commute(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[A1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[A1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %a = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -395,8 +410,8 @@ define float @fadd_fsub_common_op_wrong_commute_commute(float %x, float %y) #0 {
 
 define <2 x float> @fsub_fadd_common_op_vec(<2 x float> %x, <2 x float> %y) #0 {
 ; CHECK-LABEL: @fsub_fadd_common_op_vec(
-; CHECK-NEXT:    [[S:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[X:%.*]], <2 x float> [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[Y]], <2 x float> [[S]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S2:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[X:%.*]], <2 x float> [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[Y]], <2 x float> [[S2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[R]]
 ;
   %s = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> %x, <2 x float> %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -408,8 +423,8 @@ define <2 x float> @fsub_fadd_common_op_vec(<2 x float> %x, <2 x float> %y) #0 {
 
 define float @fsub_fadd_common_op_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fadd_common_op_commute(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fadd.f32(float [[S]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S2:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fadd.f32(float [[S2]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -422,8 +437,8 @@ define float @fsub_fadd_common_op_commute(float %x, float %y) #0 {
 
 define float @fsub_fadd_common_op_wrong_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fadd_common_op_wrong_commute(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fadd.f32(float [[Y]], float [[S]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S2:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fadd.f32(float [[Y]], float [[S2]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -436,8 +451,8 @@ define float @fsub_fadd_common_op_wrong_commute(float %x, float %y) #0 {
 
 define float @fsub_fadd_common_op_wrong_commute_commute(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fadd_common_op_wrong_commute_commute(
-; CHECK-NEXT:    [[S:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fadd.f32(float [[S]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[S2:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[R:%.*]] = call reassoc nsz float @llvm.fadd.f32(float [[S2]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %s = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -449,8 +464,8 @@ define float @fsub_fadd_common_op_wrong_commute_commute(float %x, float %y) #0 {
 
 define float @maxnum_with_poszero_op(float %a) #0 {
 ; CHECK-LABEL: @maxnum_with_poszero_op(
-; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[A:%.*]], float 0.000000e+00, metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) #[[ATTR0]]
+; CHECK-NEXT:    [[MAX1:%.*]] = call float @llvm.maxnum.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %max = call float @llvm.experimental.constrained.maxnum.f32(float %a, float 0.0, metadata !"fpexcept.ignore")
@@ -460,9 +475,9 @@ define float @maxnum_with_poszero_op(float %a) #0 {
 
 define float @maxnum_with_poszero_op_commute(float %a) #0 {
 ; CHECK-LABEL: @maxnum_with_poszero_op_commute(
-; CHECK-NEXT:    [[SQRT:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float 0.000000e+00, float [[SQRT]], metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) #[[ATTR0]]
+; CHECK-NEXT:    [[SQRT2:%.*]] = call float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[MAX1:%.*]] = call float @llvm.maxnum.f32(float 0.000000e+00, float [[SQRT2]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %sqrt = call float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -473,10 +488,10 @@ define float @maxnum_with_poszero_op_commute(float %a) #0 {
 
 define float @maxnum_with_negzero_op(float %a) #0 {
 ; CHECK-LABEL: @maxnum_with_negzero_op(
-; CHECK-NEXT:    [[NNAN:%.*]] = call nnan float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) #[[ATTR0]]
-; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float -0.000000e+00, float [[FABSA]], metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) #[[ATTR0]]
+; CHECK-NEXT:    [[NNAN2:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[MAX1:%.*]] = call float @llvm.maxnum.f32(float -0.000000e+00, float [[FABSA]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %nnan = call nnan float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -488,10 +503,10 @@ define float @maxnum_with_negzero_op(float %a) #0 {
 
 define float @maxnum_with_negzero_op_commute(float %a) #0 {
 ; CHECK-LABEL: @maxnum_with_negzero_op_commute(
-; CHECK-NEXT:    [[NNAN:%.*]] = call nnan float @llvm.experimental.constrained.sqrt.f32(float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN]]) #[[ATTR0]]
-; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[FABSA]], float -0.000000e+00, metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) #[[ATTR0]]
+; CHECK-NEXT:    [[NNAN2:%.*]] = call nnan float @llvm.sqrt.f32(float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABSA:%.*]] = call float @llvm.fabs.f32(float [[NNAN2]]) #[[ATTR0]]
+; CHECK-NEXT:    [[MAX1:%.*]] = call float @llvm.maxnum.f32(float [[FABSA]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX1]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %nnan = call nnan float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -505,8 +520,8 @@ define float @maxnum_with_negzero_op_commute(float %a) #0 {
 
 define float @maxnum_with_pos_one_op(float %a) #0 {
 ; CHECK-LABEL: @maxnum_with_pos_one_op(
-; CHECK-NEXT:    [[MAX:%.*]] = call float @llvm.experimental.constrained.maxnum.f32(float [[A:%.*]], float 1.000000e+00, metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[MAX]]) #[[ATTR0]]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.maxnum.f32(float [[A:%.*]], float 1.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS1:%.*]] = call float @llvm.fabs.f32(float [[FABS]]) #[[ATTR0]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %max = call float @llvm.experimental.constrained.maxnum.f32(float %a, float 1.0, metadata !"fpexcept.ignore")
@@ -531,4 +546,42 @@ declare float @llvm.experimental.constrained.fdiv.f32(float, float, metadata, me
 declare float @llvm.experimental.constrained.maxnum.f32(float, float, metadata)
 declare float @llvm.experimental.constrained.sqrt.f32(float, metadata, metadata)
 
+; Tests for the new FP intrinsic form with explicit fp.control/fp.except operand
+; bundles.  InstSimplify has no folding rules for these calls (there is no
+; Intrinsic::fadd case), so even "X + -0.0" passes through unchanged.  These
+; functions verify that the IR is valid and that InstSimplify preserves calls
+; with non-default FP environment overrides.
+
+define float @new_fadd_neg0_ignore(float %a) #0 {
+; CHECK-LABEL: @new_fadd_neg0_ignore(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.fadd.f32(float %a, float -0.0)
+  [ "fp.except"(metadata !"ignore") ]
+  ret float %r
+}
+
+define float @new_fadd_neg0_rtz(float %a) #0 {
+; CHECK-LABEL: @new_fadd_neg0_rtz(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtz") ]
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.fadd.f32(float %a, float -0.0)
+  [ "fp.control"(metadata !"rtz") ]
+  ret float %r
+}
+
+define float @new_fadd_neg0_rtz_ignore(float %a) #0 {
+; CHECK-LABEL: @new_fadd_neg0_rtz_ignore(
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R]]
+;
+  %r = call float @llvm.fadd.f32(float %a, float -0.0)
+  [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+  ret float %r
+}
+
+declare float @llvm.fadd.f32(float, float)
+
 attributes #0 = { strictfp }
diff --git a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
index 801fd75a24a71..62d1de5bd7c03 100644
--- a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
@@ -9,8 +9,7 @@
 
 define float @fadd_undef_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -18,7 +17,7 @@ define float @fadd_undef_op0_strict(float %x) #0 {
 
 define float @fadd_undef_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -27,7 +26,7 @@ define float @fadd_undef_op0_maytrap(float %x) #0 {
 
 define float @fadd_undef_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float undef, float [[X:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -36,7 +35,8 @@ define float @fadd_undef_op0_upward(float %x) #0 {
 
 define float @fadd_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -44,7 +44,6 @@ define float @fadd_undef_op0_defaultfp(float %x) #0 {
 
 define float @fadd_poison_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float poison, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -53,7 +52,8 @@ define float @fadd_poison_op0_strict(float %x) #0 {
 
 define float @fadd_poison_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op0_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -61,7 +61,8 @@ define float @fadd_poison_op0_maytrap(float %x) #0 {
 
 define float @fadd_poison_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op0_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -69,7 +70,8 @@ define float @fadd_poison_op0_upward(float %x) #0 {
 
 define float @fadd_poison_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op0_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -77,8 +79,7 @@ define float @fadd_poison_op0_defaultfp(float %x) #0 {
 
 define float @fadd_undef_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -86,7 +87,7 @@ define float @fadd_undef_op1_strict(float %x) #0 {
 
 define float @fadd_undef_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -95,7 +96,7 @@ define float @fadd_undef_op1_maytrap(float %x) #0 {
 
 define float @fadd_undef_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -104,7 +105,8 @@ define float @fadd_undef_op1_upward(float %x) #0 {
 
 define float @fadd_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_undef_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -112,7 +114,6 @@ define float @fadd_undef_op1_defaultfp(float %x) #0 {
 
 define float @fadd_poison_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -121,7 +122,8 @@ define float @fadd_poison_op1_strict(float %x) #0 {
 
 define float @fadd_poison_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op1_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -129,7 +131,8 @@ define float @fadd_poison_op1_maytrap(float %x) #0 {
 
 define float @fadd_poison_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op1_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -137,7 +140,8 @@ define float @fadd_poison_op1_upward(float %x) #0 {
 
 define float @fadd_poison_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_poison_op1_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -149,8 +153,7 @@ define float @fadd_poison_op1_defaultfp(float %x) #0 {
 
 define float @fsub_undef_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -158,7 +161,7 @@ define float @fsub_undef_op0_strict(float %x) #0 {
 
 define float @fsub_undef_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -167,7 +170,7 @@ define float @fsub_undef_op0_maytrap(float %x) #0 {
 
 define float @fsub_undef_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float undef, float [[X:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -176,7 +179,8 @@ define float @fsub_undef_op0_upward(float %x) #0 {
 
 define float @fsub_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -184,7 +188,6 @@ define float @fsub_undef_op0_defaultfp(float %x) #0 {
 
 define float @fsub_poison_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float poison, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -193,7 +196,8 @@ define float @fsub_poison_op0_strict(float %x) #0 {
 
 define float @fsub_poison_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op0_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -201,7 +205,8 @@ define float @fsub_poison_op0_maytrap(float %x) #0 {
 
 define float @fsub_poison_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op0_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -209,7 +214,8 @@ define float @fsub_poison_op0_upward(float %x) #0 {
 
 define float @fsub_poison_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op0_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -217,8 +223,7 @@ define float @fsub_poison_op0_defaultfp(float %x) #0 {
 
 define float @fsub_undef_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -226,7 +231,7 @@ define float @fsub_undef_op1_strict(float %x) #0 {
 
 define float @fsub_undef_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -235,7 +240,7 @@ define float @fsub_undef_op1_maytrap(float %x) #0 {
 
 define float @fsub_undef_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -244,7 +249,8 @@ define float @fsub_undef_op1_upward(float %x) #0 {
 
 define float @fsub_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_undef_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -252,7 +258,6 @@ define float @fsub_undef_op1_defaultfp(float %x) #0 {
 
 define float @fsub_poison_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[X:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -261,7 +266,8 @@ define float @fsub_poison_op1_strict(float %x) #0 {
 
 define float @fsub_poison_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op1_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -269,7 +275,8 @@ define float @fsub_poison_op1_maytrap(float %x) #0 {
 
 define float @fsub_poison_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op1_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -277,7 +284,8 @@ define float @fsub_poison_op1_upward(float %x) #0 {
 
 define float @fsub_poison_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_poison_op1_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -289,8 +297,7 @@ define float @fsub_poison_op1_defaultfp(float %x) #0 {
 
 define float @fmul_undef_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -298,7 +305,7 @@ define float @fmul_undef_op0_strict(float %x) #0 {
 
 define float @fmul_undef_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -307,7 +314,7 @@ define float @fmul_undef_op0_maytrap(float %x) #0 {
 
 define float @fmul_undef_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float undef, float [[X:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -316,7 +323,8 @@ define float @fmul_undef_op0_upward(float %x) #0 {
 
 define float @fmul_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -324,16 +332,16 @@ define float @fmul_undef_op0_defaultfp(float %x) #0 {
 
 define float @fmul_poison_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float poison, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
 }
 
-define float @fmul_poison_op0_maytrap(float %x) #0 { 
+define float @fmul_poison_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op0_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -341,7 +349,8 @@ define float @fmul_poison_op0_maytrap(float %x) #0 {
 
 define float @fmul_poison_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op0_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -349,7 +358,8 @@ define float @fmul_poison_op0_upward(float %x) #0 {
 
 define float @fmul_poison_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op0_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -357,8 +367,7 @@ define float @fmul_poison_op0_defaultfp(float %x) #0 {
 
 define float @fmul_undef_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -366,7 +375,7 @@ define float @fmul_undef_op1_strict(float %x) #0 {
 
 define float @fmul_undef_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -375,7 +384,7 @@ define float @fmul_undef_op1_maytrap(float %x) #0 {
 
 define float @fmul_undef_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[X:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -384,7 +393,8 @@ define float @fmul_undef_op1_upward(float %x) #0 {
 
 define float @fmul_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_undef_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -392,7 +402,6 @@ define float @fmul_undef_op1_defaultfp(float %x) #0 {
 
 define float @fmul_poison_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[X:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -401,7 +410,8 @@ define float @fmul_poison_op1_strict(float %x) #0 {
 
 define float @fmul_poison_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op1_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -409,7 +419,8 @@ define float @fmul_poison_op1_maytrap(float %x) #0 {
 
 define float @fmul_poison_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op1_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -417,7 +428,8 @@ define float @fmul_poison_op1_upward(float %x) #0 {
 
 define float @fmul_poison_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_poison_op1_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -429,8 +441,7 @@ define float @fmul_poison_op1_defaultfp(float %x) #0 {
 
 define float @fdiv_undef_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -438,7 +449,7 @@ define float @fdiv_undef_op0_strict(float %x) #0 {
 
 define float @fdiv_undef_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -447,7 +458,7 @@ define float @fdiv_undef_op0_maytrap(float %x) #0 {
 
 define float @fdiv_undef_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float undef, float [[X:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -456,7 +467,8 @@ define float @fdiv_undef_op0_upward(float %x) #0 {
 
 define float @fdiv_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -464,7 +476,6 @@ define float @fdiv_undef_op0_defaultfp(float %x) #0 {
 
 define float @fdiv_poison_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float poison, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -473,7 +484,8 @@ define float @fdiv_poison_op0_strict(float %x) #0 {
 
 define float @fdiv_poison_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op0_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -481,7 +493,8 @@ define float @fdiv_poison_op0_maytrap(float %x) #0 {
 
 define float @fdiv_poison_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op0_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -489,7 +502,8 @@ define float @fdiv_poison_op0_upward(float %x) #0 {
 
 define float @fdiv_poison_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op0_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -497,8 +511,7 @@ define float @fdiv_poison_op0_defaultfp(float %x) #0 {
 
 define float @fdiv_undef_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -506,7 +519,7 @@ define float @fdiv_undef_op1_strict(float %x) #0 {
 
 define float @fdiv_undef_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -515,7 +528,7 @@ define float @fdiv_undef_op1_maytrap(float %x) #0 {
 
 define float @fdiv_undef_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[X:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -524,7 +537,8 @@ define float @fdiv_undef_op1_upward(float %x) #0 {
 
 define float @fdiv_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_undef_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -532,7 +546,6 @@ define float @fdiv_undef_op1_defaultfp(float %x) #0 {
 
 define float @fdiv_poison_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[X:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -541,7 +554,8 @@ define float @fdiv_poison_op1_strict(float %x) #0 {
 
 define float @fdiv_poison_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op1_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -549,7 +563,8 @@ define float @fdiv_poison_op1_maytrap(float %x) #0 {
 
 define float @fdiv_poison_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op1_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -557,7 +572,8 @@ define float @fdiv_poison_op1_upward(float %x) #0 {
 
 define float @fdiv_poison_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_poison_op1_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -569,8 +585,7 @@ define float @fdiv_poison_op1_defaultfp(float %x) #0 {
 
 define float @frem_undef_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -578,7 +593,7 @@ define float @frem_undef_op0_strict(float %x) #0 {
 
 define float @frem_undef_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float undef, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -587,7 +602,7 @@ define float @frem_undef_op0_maytrap(float %x) #0 {
 
 define float @frem_undef_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float undef, float [[X:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -596,7 +611,8 @@ define float @frem_undef_op0_upward(float %x) #0 {
 
 define float @frem_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op0_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -604,7 +620,6 @@ define float @frem_undef_op0_defaultfp(float %x) #0 {
 
 define float @frem_poison_op0_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float poison, float [[X:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -613,7 +628,8 @@ define float @frem_poison_op0_strict(float %x) #0 {
 
 define float @frem_poison_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op0_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -621,7 +637,8 @@ define float @frem_poison_op0_maytrap(float %x) #0 {
 
 define float @frem_poison_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op0_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -629,7 +646,8 @@ define float @frem_poison_op0_upward(float %x) #0 {
 
 define float @frem_poison_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op0_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -637,8 +655,7 @@ define float @frem_poison_op0_defaultfp(float %x) #0 {
 
 define float @frem_undef_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -646,7 +663,7 @@ define float @frem_undef_op1_strict(float %x) #0 {
 
 define float @frem_undef_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float [[X:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -655,7 +672,7 @@ define float @frem_undef_op1_maytrap(float %x) #0 {
 
 define float @frem_undef_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float [[X:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -664,7 +681,8 @@ define float @frem_undef_op1_upward(float %x) #0 {
 
 define float @frem_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_undef_op1_defaultfp(
-; CHECK-NEXT:    ret float 0x7FF8000000000000
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -672,7 +690,6 @@ define float @frem_undef_op1_defaultfp(float %x) #0 {
 
 define float @frem_poison_op1_strict(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.frem.f32(float [[X:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -681,7 +698,8 @@ define float @frem_poison_op1_strict(float %x) #0 {
 
 define float @frem_poison_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op1_maytrap(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -689,7 +707,8 @@ define float @frem_poison_op1_maytrap(float %x) #0 {
 
 define float @frem_poison_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op1_upward(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -697,7 +716,8 @@ define float @frem_poison_op1_upward(float %x) #0 {
 
 define float @frem_poison_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_poison_op1_defaultfp(
-; CHECK-NEXT:    ret float poison
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[R1]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -709,8 +729,8 @@ define float @frem_poison_op1_defaultfp(float %x) #0 {
 
 define float @fma_undef_op0_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]])
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -718,8 +738,8 @@ define float @fma_undef_op0_strict(float %x, float %y) #0 {
 
 define float @fma_undef_op0_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -727,8 +747,8 @@ define float @fma_undef_op0_maytrap(float %x, float %y) #0 {
 
 define float @fma_undef_op0_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -736,6 +756,7 @@ define float @fma_undef_op0_upward(float %x, float %y) #0 {
 
 define float @fma_undef_op0_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op0_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -744,7 +765,7 @@ define float @fma_undef_op0_defaultfp(float %x, float %y) #0 {
 
 define float @fma_poison_op0_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op0_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]])
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -753,6 +774,7 @@ define float @fma_poison_op0_strict(float %x, float %y) #0 {
 
 define float @fma_poison_op0_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op0_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -761,6 +783,7 @@ define float @fma_poison_op0_maytrap(float %x, float %y) #0 {
 
 define float @fma_poison_op0_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op0_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -769,6 +792,7 @@ define float @fma_poison_op0_upward(float %x, float %y) #0 {
 
 define float @fma_poison_op0_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op0_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -777,8 +801,8 @@ define float @fma_poison_op0_defaultfp(float %x, float %y) #0 {
 
 define float @fma_undef_op1_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]])
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -786,8 +810,8 @@ define float @fma_undef_op1_strict(float %x, float %y) #0 {
 
 define float @fma_undef_op1_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -795,8 +819,8 @@ define float @fma_undef_op1_maytrap(float %x, float %y) #0 {
 
 define float @fma_undef_op1_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -804,6 +828,7 @@ define float @fma_undef_op1_upward(float %x, float %y) #0 {
 
 define float @fma_undef_op1_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op1_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -812,7 +837,7 @@ define float @fma_undef_op1_defaultfp(float %x, float %y) #0 {
 
 define float @fma_poison_op1_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op1_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]])
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -821,6 +846,7 @@ define float @fma_poison_op1_strict(float %x, float %y) #0 {
 
 define float @fma_poison_op1_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op1_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -829,6 +855,7 @@ define float @fma_poison_op1_maytrap(float %x, float %y) #0 {
 
 define float @fma_poison_op1_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op1_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -837,6 +864,7 @@ define float @fma_poison_op1_upward(float %x, float %y) #0 {
 
 define float @fma_poison_op1_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op1_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -845,8 +873,8 @@ define float @fma_poison_op1_defaultfp(float %x, float %y) #0 {
 
 define float @fma_undef_op2_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op2_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef)
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
   ret float %r
@@ -854,8 +882,8 @@ define float @fma_undef_op2_strict(float %x, float %y) #0 {
 
 define float @fma_undef_op2_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op2_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -863,8 +891,8 @@ define float @fma_undef_op2_maytrap(float %x, float %y) #0 {
 
 define float @fma_undef_op2_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op2_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -872,6 +900,7 @@ define float @fma_undef_op2_upward(float %x, float %y) #0 {
 
 define float @fma_undef_op2_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_undef_op2_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -880,7 +909,7 @@ define float @fma_undef_op2_defaultfp(float %x, float %y) #0 {
 
 define float @fma_poison_op2_strict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op2_strict(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison)
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -889,6 +918,7 @@ define float @fma_poison_op2_strict(float %x, float %y) #0 {
 
 define float @fma_poison_op2_maytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op2_maytrap(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -897,6 +927,7 @@ define float @fma_poison_op2_maytrap(float %x, float %y) #0 {
 
 define float @fma_poison_op2_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op2_upward(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -905,6 +936,7 @@ define float @fma_poison_op2_upward(float %x, float %y) #0 {
 
 define float @fma_poison_op2_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_poison_op2_defaultfp(
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Transforms/InstSimplify/ldexp.ll b/llvm/test/Transforms/InstSimplify/ldexp.ll
index d39f6a1e49673..ad67cbbbb49d7 100644
--- a/llvm/test/Transforms/InstSimplify/ldexp.ll
+++ b/llvm/test/Transforms/InstSimplify/ldexp.ll
@@ -142,14 +142,15 @@ define void @ldexp_f32_val_nan(i32 %y) {
 
 define void @ldexp_f32_val_nan_strictfp_maytrap(i32 %y) #0 {
 ; CHECK-LABEL: @ldexp_f32_val_nan_strictfp_maytrap(
-; CHECK-NEXT:    [[PLUS_QNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0001000000000, i32 [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    store volatile float [[PLUS_QNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NEG_QNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF0000100000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NEG_QNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[PLUS_SNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0000020000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[PLUS_SNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NEG_SNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NEG_SNAN]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[PLUS_QNAN1:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0001000000000, i32 [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0x7FF8001000000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NEG_QNAN2:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF0000100000000, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0xFFF8000100000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[PLUS_SNAN3:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NEG_SNAN4:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0xFFFFFFFFE0000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -174,15 +175,15 @@ define void @ldexp_f32_val_nan_strictfp_maytrap(i32 %y) #0 {
 
 define void @ldexp_f32_val_nan_strictfp_strict(i32 %y) #0 {
 ; CHECK-LABEL: @ldexp_f32_val_nan_strictfp_strict(
-; CHECK-NEXT:    [[PLUS_QNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0001000000000, i32 [[Y:%.*]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[PLUS_QNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NEG_QNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF0000100000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NEG_QNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[PLUS_SNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0000020000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[PLUS_SNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NEG_SNAN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NEG_SNAN]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNDEF:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float undef, i32 [[Y]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[PLUS_QNAN1:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0001000000000, i32 [[Y:%.*]])
+; CHECK-NEXT:    store volatile float 0x7FF8001000000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NEG_QNAN2:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF0000100000000, i32 [[Y]])
+; CHECK-NEXT:    store volatile float 0xFFF8000100000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[PLUS_SNAN3:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 [[Y]])
+; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NEG_SNAN4:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]])
+; CHECK-NEXT:    store volatile float 0xFFFFFFFFE0000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y]])
 ; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -241,12 +242,17 @@ define void @ldexp_f32_0() {
 
 define void @ldexp_f32_undef_strictfp(float %x, i32 %y) #0 {
 ; CHECK-LABEL: @ldexp_f32_undef_strictfp(
-; CHECK-NEXT:    [[UNDEF_EXP:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float [[X:%.*]], i32 undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[UNDEF_EXP1:%.*]] = call float @llvm.ldexp.f32.i32(float [[UNDEF_EXP:%.*]], i32 undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float [[UNDEF_EXP]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    store volatile float [[X]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[POISON_EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float [[UNDEF_EXP]], i32 poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float [[UNDEF_EXP]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNDEF_VAL3:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[POISON_VAL4:%.*]] = call float @llvm.ldexp.f32.i32(float poison, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float poison, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[POISON_UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float poison, i32 undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float poison, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNDEF_POISON6:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 poison) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float undef, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -268,17 +274,20 @@ define void @ldexp_f32_undef_strictfp(float %x, i32 %y) #0 {
 ; Should be able to ignore strictfp in this case
 define void @ldexp_f32_0_strictfp(float %x) #0 {
 ; CHECK-LABEL: @ldexp_f32_0_strictfp(
+; CHECK-NEXT:    [[ZERO1:%.*]] = call float @llvm.ldexp.f32.i32(float 0.000000e+00, i32 0) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0.000000e+00, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NEG_ZERO2:%.*]] = call float @llvm.ldexp.f32.i32(float -0.000000e+00, i32 0) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float -0.000000e+00, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[ONE3:%.*]] = call float @llvm.ldexp.f32.i32(float 0.000000e+00, i32 1) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0.000000e+00, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNKNOWN_ZERO:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float [[X:%.*]], i32 0, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[UNKNOWN_ZERO]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNKNOWN_UNDEF:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float [[X]], i32 undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[UNKNOWN_UNDEF]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[DENORMAL_0:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 0, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[UNKNOWN_ZERO4:%.*]] = call float @llvm.ldexp.f32.i32(float [[DENORMAL_0:%.*]], i32 0) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float [[DENORMAL_0]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[DENORMAL_1:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 1, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[DENORMAL_1]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNKNOWN_UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float [[DENORMAL_0]], i32 undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float [[DENORMAL_0]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[DENORMAL_06:%.*]] = call float @llvm.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 0) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0x380FFFFFC0000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[DENORMAL_17:%.*]] = call float @llvm.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 1) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0x381FFFFFC0000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
   %zero = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0.0, i32 0, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
@@ -465,16 +474,16 @@ define void @ldexp_ppcf128() {
 
 define void @constant_fold_ldexp_f32_val_strictfp(i32 %y) #0 {
 ; CHECK-LABEL: @constant_fold_ldexp_f32_val_strictfp(
-; CHECK-NEXT:    [[SNAN_MAY_TRAP:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0000020000000, i32 3, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[SNAN_MAY_TRAP]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[SNAN_MAY_NOT_TRAP:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0000020000000, i32 3, metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[SNAN_MAY_NOT_TRAP]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNKNOWN_ROUNDING:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 2.500000e+00, i32 42, metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[UNKNOWN_ROUNDING]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NORMAL:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 2.500000e+00, i32 42, metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NORMAL]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NORMAL_DOWN:%.*]] = call float @llvm.experimental.constrained.ldexp.f32.i32(float 2.500000e+00, i32 42, metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    store volatile float [[NORMAL_DOWN]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[SNAN_MAY_TRAP1:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 3) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[SNAN_MAY_NOT_TRAP2:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 3) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[UNKNOWN_ROUNDING3:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NORMAL4:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[NORMAL_DOWN5:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
   %snan.may.trap = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7FF0000020000000, i32 3, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll b/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
index d75c00e04c4eb..d471cfad4bf96 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
@@ -11,7 +11,8 @@
 
 define float @fadd_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -19,7 +20,8 @@ define float @fadd_x_n0_defaultenv(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_defaultenv(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -27,7 +29,7 @@ define <2 x float> @fadd_vec_x_n0_defaultenv(<2 x float> %a) #0 {
 
 define float @fadd_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -36,7 +38,7 @@ define float @fadd_x_n0_ebmaytrap(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -45,7 +47,7 @@ define <2 x float> @fadd_vec_x_n0_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fadd_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -54,7 +56,7 @@ define float @fadd_x_n0_ebstrict(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -63,7 +65,7 @@ define <2 x float> @fadd_vec_x_n0_ebstrict(<2 x float> %a) #0 {
 
 define float @fadd_x_n0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_neginf(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -72,7 +74,7 @@ define float @fadd_x_n0_neginf(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_neginf(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_neginf(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -81,7 +83,7 @@ define <2 x float> @fadd_vec_x_n0_neginf(<2 x float> %a) #0 {
 
 define float @fadd_x_n0_dynamic(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_dynamic(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -90,7 +92,7 @@ define float @fadd_x_n0_dynamic(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_dynamic(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_dynamic(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -101,7 +103,8 @@ define <2 x float> @fadd_vec_x_n0_dynamic(<2 x float> %a) #0 {
 ; Test one of the remaining rounding modes and the rest will be fine.
 define float @fadd_x_n0_towardzero(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_towardzero(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -111,7 +114,8 @@ define float @fadd_x_n0_towardzero(float %a) #0 {
 ; Test one of the remaining rounding modes and the rest will be fine.
 define <2 x float> @fadd_vec_x_n0_towardzero(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_towardzero(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -119,7 +123,8 @@ define <2 x float> @fadd_vec_x_n0_towardzero(<2 x float> %a) #0 {
 
 define float @fadd_nnan_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fadd_nnan_x_n0_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %ret
@@ -127,7 +132,8 @@ define float @fadd_nnan_x_n0_ebmaytrap(float %a) #0 {
 
 define <2 x float> @fadd_vec_nnan_x_n0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_nnan_x_n0_ebmaytrap(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %ret
@@ -135,7 +141,7 @@ define <2 x float> @fadd_vec_nnan_x_n0_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fadd_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fadd_nnan_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -144,7 +150,7 @@ define float @fadd_nnan_x_n0_ebstrict(float %a) #0 {
 
 define <2 x float> @fadd_vec_nnan_x_n0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_nnan_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -154,7 +160,7 @@ define <2 x float> @fadd_vec_nnan_x_n0_ebstrict(<2 x float> %a) #0 {
 ; Test with a fast math flag set but that flag is not "nnan".
 define float @fadd_ninf_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fadd_ninf_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call ninf float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call ninf float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call ninf float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -164,7 +170,7 @@ define float @fadd_ninf_x_n0_ebstrict(float %a) #0 {
 ; Test with a fast math flag set but that flag is not "nnan".
 define <2 x float> @fadd_vec_ninf_x_n0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_ninf_x_n0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call ninf <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00), metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call ninf <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call ninf <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -173,7 +179,8 @@ define <2 x float> @fadd_vec_ninf_x_n0_ebstrict(<2 x float> %a) #0 {
 
 define float @fadd_n0_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fadd_n0_x_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float -0.000000e+00, float [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -181,7 +188,8 @@ define float @fadd_n0_x_defaultenv(float %a) #0 {
 
 define <2 x float> @fadd_vec_n0_x_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_n0_x_defaultenv(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -190,7 +198,7 @@ define <2 x float> @fadd_vec_n0_x_defaultenv(<2 x float> %a) #0 {
 ; TODO: Canonicalize the order of the arguments. Then this will fire.
 define float @fadd_n0_x_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fadd_n0_x_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fadd.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -200,7 +208,7 @@ define float @fadd_n0_x_ebmaytrap(float %a) #0 {
 ; TODO: Canonicalize the order of the arguments. Then this will fire.
 define <2 x float> @fadd_vec_n0_x_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_n0_x_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[RET:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret <2 x float> [[RET]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -213,7 +221,8 @@ define <2 x float> @fadd_vec_n0_x_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -221,7 +230,8 @@ define float @fold_fadd_nsz_x_0_defaultenv(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_defaultenv(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -229,7 +239,8 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_defaultenv(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_neginf(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -237,7 +248,8 @@ define float @fold_fadd_nsz_x_0_neginf(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_neginf(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_neginf(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -245,7 +257,7 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_neginf(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_ebmaytrap(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.fadd.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -254,7 +266,7 @@ define float @fold_fadd_nsz_x_0_ebmaytrap(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_ebmaytrap(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret <2 x float> [[ADD]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -263,7 +275,8 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fold_fadd_nnan_nsz_x_0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nnan_nsz_x_0_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nnan nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %add
@@ -271,7 +284,8 @@ define float @fold_fadd_nnan_nsz_x_0_ebmaytrap(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nnan_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nnan_nsz_x_0_ebmaytrap(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %add = call nnan nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
@@ -279,7 +293,7 @@ define <2 x float> @fold_fadd_vec_nnan_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.fadd.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -288,7 +302,7 @@ define float @fold_fadd_nsz_x_0_ebstrict(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret <2 x float> [[ADD]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -297,7 +311,7 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_ebstrict(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_nnan_x_0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_nnan_x_0_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call nnan nsz float @llvm.experimental.constrained.fadd.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nsz nnan float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -306,7 +320,7 @@ define float @fold_fadd_nsz_nnan_x_0_ebstrict(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_nnan_x_0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_nnan_x_0_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call nnan nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[A:%.*]], <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %add = call nsz nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -315,7 +329,8 @@ define <2 x float> @fold_fadd_vec_nsz_nnan_x_0_ebstrict(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_0_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_0_x_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float 0.000000e+00, float [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -323,7 +338,8 @@ define float @fold_fadd_nsz_0_x_defaultenv(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_0_x_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_0_x_defaultenv(
-; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[A]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -332,7 +348,7 @@ define <2 x float> @fold_fadd_vec_nsz_0_x_defaultenv(<2 x float> %a) #0 {
 ; TODO: Canonicalize the order of the arguments. Then this will fire.
 define float @fold_fadd_nsz_0_x_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_0_x_ebmaytrap(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.experimental.constrained.fadd.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz float @llvm.fadd.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -342,7 +358,7 @@ define float @fold_fadd_nsz_0_x_ebmaytrap(float %a) #0 {
 ; TODO: Canonicalize the order of the arguments. Then this will fire.
 define <2 x float> @fold_fadd_vec_nsz_0_x_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_0_x_ebmaytrap(
-; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret <2 x float> [[ADD]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -355,6 +371,7 @@ define <2 x float> @fold_fadd_vec_nsz_0_x_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fold_fadd_qnan_qnan_ebmaytrap() #0 {
 ; CHECK-LABEL: @fold_fadd_qnan_qnan_ebmaytrap(
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff8000000000000, float 0x7ff8000000000000, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -363,7 +380,7 @@ define float @fold_fadd_qnan_qnan_ebmaytrap() #0 {
 
 define float @fold_fadd_qnan_qnan_ebstrict() #0 {
 ; CHECK-LABEL: @fold_fadd_qnan_qnan_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float 0x7FF8000000000000) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff8000000000000, float 0x7ff8000000000000, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -374,7 +391,8 @@ define float @fold_fadd_qnan_qnan_ebstrict() #0 {
 
 define float @fold_fadd_snan_variable_ebignore(float %x) #0 {
 ; CHECK-LABEL: @fold_fadd_snan_variable_ebignore(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[ADD1]]
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -384,7 +402,8 @@ define float @fold_fadd_snan_variable_ebignore(float %x) #0 {
 
 define float @fold_fadd_snan_variable_ebmaytrap(float %x) #0 {
 ; CHECK-LABEL: @fold_fadd_snan_variable_ebmaytrap(
-; CHECK-NEXT:    ret float 0x7FFC000000000000
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[ADD1]]
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %add
@@ -394,7 +413,8 @@ define float @fold_fadd_snan_variable_ebmaytrap(float %x) #0 {
 
 define <2 x float> @fold_fadd_vec_snan_variable_ebignore(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_snan_variable_ebignore(
-; CHECK-NEXT:    ret <2 x float> <float 0x7FFC000000000000, float 0xFFFC000000000000>
+; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0x7FF4000000000000, float 0xFFF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[ADD1]]
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0x7ff4000000000000, float 0xfff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -404,7 +424,8 @@ define <2 x float> @fold_fadd_vec_snan_variable_ebignore(<2 x float> %x) #0 {
 
 define <2 x float> @fold_fadd_vec_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_snan_variable_ebmaytrap(
-; CHECK-NEXT:    ret <2 x float> <float 0xFFFC000000000000, float 0x7FFC000000000000>
+; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0xFFF4000000000000, float 0x7FF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret <2 x float> [[ADD1]]
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0xfff4000000000000, float 0x7ff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
@@ -414,7 +435,8 @@ define <2 x float> @fold_fadd_vec_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 
 define <2 x float> @fold_fadd_vec_partial_snan_variable_ebignore(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_partial_snan_variable_ebignore(
-; CHECK-NEXT:    ret <2 x float> <float 0x7FFC000000000000, float 0xFFFF000000000000>
+; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0x7FF4000000000000, float 0xFFFF000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret <2 x float> [[ADD1]]
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0x7ff4000000000000, float 0xffff000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -424,7 +446,8 @@ define <2 x float> @fold_fadd_vec_partial_snan_variable_ebignore(<2 x float> %x)
 
 define <2 x float> @fold_fadd_vec_partial_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_partial_snan_variable_ebmaytrap(
-; CHECK-NEXT:    ret <2 x float> <float 0xFFF8000000000000, float 0x7FFC000000000000>
+; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0xFFF8000000000000, float 0x7FF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret <2 x float> [[ADD1]]
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0xfff8000000000000, float 0x7ff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
@@ -432,7 +455,7 @@ define <2 x float> @fold_fadd_vec_partial_snan_variable_ebmaytrap(<2 x float> %x
 
 define float @fold_fadd_snan_variable_ebstrict(float %x) #0 {
 ; CHECK-LABEL: @fold_fadd_snan_variable_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -441,6 +464,7 @@ define float @fold_fadd_snan_variable_ebstrict(float %x) #0 {
 
 define float @fold_fadd_snan_qnan_ebmaytrap() #0 {
 ; CHECK-LABEL: @fold_fadd_snan_qnan_ebmaytrap(
+; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FFC000000000000
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float 0x7ff8000000000000, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -449,7 +473,7 @@ define float @fold_fadd_snan_qnan_ebmaytrap() #0 {
 
 define float @fold_fadd_snan_qnan_ebstrict() #0 {
 ; CHECK-LABEL: @fold_fadd_snan_qnan_ebstrict(
-; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF4000000000000, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float 0x7FF8000000000000) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[ADD]]
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float 0x7ff8000000000000, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll b/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
index b55519e8374b6..73e205361e938 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
@@ -11,7 +11,8 @@
 
 define float @fsub_x_p0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
@@ -20,7 +21,7 @@ define float @fsub_x_p0_defaultenv(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_x_p0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -29,7 +30,8 @@ define float @fsub_x_p0_ebmaytrap(float %a) #0 {
 
 define float @fsub_nnan_x_p0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_p0_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
   ret float %ret
@@ -38,7 +40,7 @@ define float @fsub_nnan_x_p0_ebmaytrap(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_x_p0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -48,7 +50,7 @@ define float @fsub_x_p0_ebstrict(float %a) #0 {
 ; The instruction is expected to remain, but the result isn't used.
 define float @fsub_nnan_x_p0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_p0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -58,7 +60,7 @@ define float @fsub_nnan_x_p0_ebstrict(float %a) #0 {
 ; Test with a fast math flag set but that flag is not "nnan".
 define float @fsub_ninf_x_p0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_ninf_x_p0_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call ninf float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call ninf float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call ninf float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -68,7 +70,7 @@ define float @fsub_ninf_x_p0_ebstrict(float %a) #0 {
 ; Round to -inf and if x is zero then the result is -0.0: must not fire
 define float @fsub_x_p0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_neginf(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.downward", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.downward", metadata !"fpexcept.ignore")
@@ -79,7 +81,7 @@ define float @fsub_x_p0_neginf(float %a) #0 {
 ; Round to -inf and if x is zero then the result is -0.0: must not fire
 define float @fsub_x_p0_dynamic(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_dynamic(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float 0.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -89,7 +91,8 @@ define float @fsub_x_p0_dynamic(float %a) #0 {
 ; With nsz we don't have to worry about -0.0 so the transform is valid.
 define float @fsub_nsz_x_p0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fsub_nsz_x_p0_neginf(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.downward", metadata !"fpexcept.ignore")
   ret float %ret
@@ -98,7 +101,8 @@ define float @fsub_nsz_x_p0_neginf(float %a) #0 {
 ; With nsz we don't have to worry about -0.0 so the transform is valid.
 define float @fsub_nsz_x_p0_dynamic(float %a) #0 {
 ; CHECK-LABEL: @fsub_nsz_x_p0_dynamic(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.dynamic", metadata !"fpexcept.ignore")
   ret float %ret
@@ -111,7 +115,8 @@ define float @fsub_nsz_x_p0_dynamic(float %a) #0 {
 
 define float @fold_fsub_nsz_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_x_n0_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %sub = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %sub
@@ -120,7 +125,7 @@ define float @fold_fsub_nsz_x_n0_defaultenv(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fold_fsub_nsz_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_x_n0_ebmaytrap(
-; CHECK-NEXT:    [[SUB:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[SUB:%.*]] = call nsz float @llvm.fsub.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
   %sub = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -129,7 +134,8 @@ define float @fold_fsub_nsz_x_n0_ebmaytrap(float %a) #0 {
 
 define float @fold_fsub_nnan_nsz_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nnan_nsz_x_n0_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %sub = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
   ret float %sub
@@ -138,7 +144,7 @@ define float @fold_fsub_nnan_nsz_x_n0_ebmaytrap(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fold_fsub_nsz_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_x_n0_ebstrict(
-; CHECK-NEXT:    [[SUB:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[SUB:%.*]] = call nsz float @llvm.fsub.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
   %sub = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -148,7 +154,7 @@ define float @fold_fsub_nsz_x_n0_ebstrict(float %a) #0 {
 ; The instruction is expected to remain, but the result isn't used.
 define float @fold_fsub_nsz_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_nnan_x_n0_ebstrict(
-; CHECK-NEXT:    [[SUB:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[A]]
 ;
   %sub = call nsz nnan float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -162,7 +168,8 @@ define float @fold_fsub_nsz_nnan_x_n0_ebstrict(float %a) #0 {
 
 define float @fold_fsub_fabs_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_x_n0_defaultenv(
-; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[ABSA]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -174,7 +181,7 @@ define float @fold_fsub_fabs_x_n0_defaultenv(float %a) #0 {
 define float @fold_fsub_fabs_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_x_n0_ebmaytrap(
 ; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
-; CHECK-NEXT:    [[SUB:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[ABSA]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[ABSA]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -184,7 +191,8 @@ define float @fold_fsub_fabs_x_n0_ebmaytrap(float %a) #0 {
 
 define float @fold_fsub_fabs_nnan_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_nnan_x_n0_ebmaytrap(
-; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ABSA:%.*]] = call nnan float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[ABSA]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -196,7 +204,7 @@ define float @fold_fsub_fabs_nnan_x_n0_ebmaytrap(float %a) #0 {
 define float @fold_fsub_fabs_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_x_n0_ebstrict(
 ; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
-; CHECK-NEXT:    [[SUB:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[ABSA]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[ABSA]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[SUB]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -207,8 +215,8 @@ define float @fold_fsub_fabs_x_n0_ebstrict(float %a) #0 {
 ; The instruction is expected to remain, but the result isn't used.
 define float @fold_fsub_fabs_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_nnan_x_n0_ebstrict(
-; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
-; CHECK-NEXT:    [[SUB:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[ABSA]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
+; CHECK-NEXT:    [[ABSA:%.*]] = call nnan float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[ABSA]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -218,7 +226,8 @@ define float @fold_fsub_fabs_nnan_x_n0_ebstrict(float %a) #0 {
 
 define float @fold_fsub_sitofp_x_n0_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @fold_fsub_sitofp_x_n0_defaultenv(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[FPA2:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.fsub.f32(float [[FPA2]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[FPA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -232,7 +241,9 @@ define float @fold_fsub_sitofp_x_n0_defaultenv(i32 %a) #0 {
 
 define float @fsub_fneg_n0_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_n0_fnX_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -243,7 +254,7 @@ define float @fsub_fneg_n0_fnX_defaultenv(float %a) #0 {
 define float @fsub_fneg_n0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_n0_fnX_ebmaytrap(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = fneg float %a
@@ -253,7 +264,9 @@ define float @fsub_fneg_n0_fnX_ebmaytrap(float %a) #0 {
 
 define float @fsub_fneg_nnan_n0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_n0_fnX_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -264,7 +277,7 @@ define float @fsub_fneg_nnan_n0_fnX_ebmaytrap(float %a) #0 {
 define float @fsub_fneg_n0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_n0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = fneg float %a
@@ -276,8 +289,8 @@ define float @fsub_fneg_n0_fnX_ebstrict(float %a) #0 {
 define float @fsub_fneg_nnan_n0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_n0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    [[RET1:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    ret float [[RET1]]
 ;
   %nega = fneg float %a
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -291,8 +304,8 @@ define float @fsub_fneg_nnan_n0_fnX_ebstrict(float %a) #0 {
 ; TODO: This won't fire without m_FNeg() knowing the constrained intrinsics.
 define float @fsub_fsub_n0_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_n0_fnX_defaultenv(
-; CHECK-NEXT:    [[NEGA:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -303,8 +316,8 @@ define float @fsub_fsub_n0_fnX_defaultenv(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_fsub_n0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_n0_fnX_ebmaytrap(
-; CHECK-NEXT:    [[NEGA:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -315,8 +328,8 @@ define float @fsub_fsub_n0_fnX_ebmaytrap(float %a) #0 {
 ; TODO: This won't fire without m_FNeg() knowing the constrained intrinsics.
 define float @fsub_fsub_nnan_n0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nnan_n0_fnX_ebmaytrap(
-; CHECK-NEXT:    [[NEGA:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -327,8 +340,8 @@ define float @fsub_fsub_nnan_n0_fnX_ebmaytrap(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_fsub_n0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_n0_fnX_ebstrict(
-; CHECK-NEXT:    [[NEGA:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -339,8 +352,8 @@ define float @fsub_fsub_n0_fnX_ebstrict(float %a) #0 {
 ; TODO: This won't fire without m_FNeg() knowing the constrained intrinsics.
 define float @fsub_fsub_nnan_n0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nnan_n0_fnX_ebstrict(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -354,7 +367,9 @@ define float @fsub_fsub_nnan_n0_fnX_ebstrict(float %a) #0 {
 
 define float @fsub_fneg_nsz_p0_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_p0_fnX_defaultenv(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -365,7 +380,7 @@ define float @fsub_fneg_nsz_p0_fnX_defaultenv(float %a) #0 {
 define float @fsub_fneg_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_p0_fnX_ebmaytrap(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = fneg float %a
@@ -375,7 +390,9 @@ define float @fsub_fneg_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 
 define float @fsub_fneg_nsz_nnan_p0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_nnan_p0_fnX_ebmaytrap(
-; CHECK-NEXT:    ret float [[A:%.*]]
+; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -386,7 +403,7 @@ define float @fsub_fneg_nsz_nnan_p0_fnX_ebmaytrap(float %a) #0 {
 define float @fsub_fneg_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_p0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = fneg float %a
@@ -398,8 +415,8 @@ define float @fsub_fneg_nsz_p0_fnX_ebstrict(float %a) #0 {
 define float @fsub_fneg_nnan_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_nsz_p0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
-; CHECK-NEXT:    [[RET:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    [[RET1:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    ret float [[RET1]]
 ;
   %nega = fneg float %a
   %ret = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -413,8 +430,8 @@ define float @fsub_fneg_nnan_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; TODO: Need constrained intrinsic support in m_FNeg() and m_FSub to fire.
 define float @fsub_fsub_p0_nsz_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_p0_nsz_fnX_defaultenv(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -425,8 +442,8 @@ define float @fsub_fsub_p0_nsz_fnX_defaultenv(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_fsub_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nsz_p0_fnX_ebmaytrap(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -437,8 +454,8 @@ define float @fsub_fsub_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 ; TODO: Need constrained intrinsic support in m_FNeg() and m_FSub to fire.
 define float @fsub_fsub_nnan_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nnan_nsz_p0_fnX_ebmaytrap(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -449,8 +466,8 @@ define float @fsub_fsub_nnan_nsz_p0_fnX_ebmaytrap(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_fsub_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nsz_p0_fnX_ebstrict(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -461,8 +478,8 @@ define float @fsub_fsub_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; TODO: Need constrained intrinsic support in m_FNeg() and m_FSub to fire.
 define float @fsub_fsub_nnan_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fsub_nnan_nsz_p0_fnX_ebstrict(
-; CHECK-NEXT:    [[NEGA:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.000000e+00, float [[NEGA]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[NEGA1:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %nega = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -477,7 +494,7 @@ define float @fsub_fsub_nnan_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_x_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_x_defaultenv(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -486,7 +503,8 @@ define float @fsub_x_x_defaultenv(float %a) #0 {
 
 define float @fsub_nnan_x_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_x_defaultenv(
-; CHECK-NEXT:    ret float 0.000000e+00
+; CHECK-NEXT:    [[RET1:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[RET1]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
@@ -495,7 +513,7 @@ define float @fsub_nnan_x_x_defaultenv(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_x_x_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_x_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -505,7 +523,7 @@ define float @fsub_x_x_ebmaytrap(float %a) #0 {
 ; TODO: This will fold if we allow non-default floating point environments.
 define float @fsub_nnan_x_x_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_x_ebmaytrap(
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -515,7 +533,7 @@ define float @fsub_nnan_x_x_ebmaytrap(float %a) #0 {
 ; Missing nnan: must not fire.
 define float @fsub_x_x_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_x_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -526,7 +544,7 @@ define float @fsub_x_x_ebstrict(float %a) #0 {
 ; The instruction is expected to remain, but the result isn't used.
 define float @fsub_nnan_x_x_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_x_ebstrict(
-; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[A:%.*]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[RET:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -540,8 +558,8 @@ define float @fsub_nnan_x_x_ebstrict(float %a) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fsub_fsub_y_x_x_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_y_x_x_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -552,8 +570,8 @@ define float @fsub_fsub_y_x_x_defaultenv(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fsub_fsub_fmf_y_x_x_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_fmf_y_x_x_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -567,8 +585,8 @@ define float @fsub_fsub_fmf_y_x_x_defaultenv(float %x, float %y) #0 {
 ; "fpexcept.ignore" instruction. This must not fire.
 define float @fsub_fsub_fmf_y_x_x_ebmaytrap_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_fmf_y_x_x_ebmaytrap_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -579,8 +597,8 @@ define float @fsub_fsub_fmf_y_x_x_ebmaytrap_defaultenv(float %x, float %y) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fsub_fsub_y_x_x_ebmaytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_y_x_x_ebmaytrap(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -591,8 +609,8 @@ define float @fsub_fsub_y_x_x_ebmaytrap(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fsub_fsub_fmf_y_x_x_ebmaytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_fmf_y_x_x_ebmaytrap(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -603,8 +621,8 @@ define float @fsub_fsub_fmf_y_x_x_ebmaytrap(float %x, float %y) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fsub_fsub_y_x_x_ebstrict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_y_x_x_ebstrict(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -615,8 +633,8 @@ define float @fsub_fsub_y_x_x_ebstrict(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fsub_fsub_fmf_y_x_x_ebstrict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fsub_fsub_fmf_y_x_x_ebstrict(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[Y:%.*]], float [[X:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[Y]], float [[INNER]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fsub.f32(float [[Y:%.*]], float [[X:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[Y]], float [[INNER1]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fsub.f32(float %y, float %x, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -632,8 +650,8 @@ define float @fsub_fsub_fmf_y_x_x_ebstrict(float %x, float %y) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fadd_fsub_x_y_y_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_x_y_y_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -644,8 +662,8 @@ define float @fadd_fsub_x_y_y_defaultenv(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fadd_fsub_fmf_x_y_y_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_fmf_x_y_y_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -657,8 +675,8 @@ define float @fadd_fsub_fmf_x_y_y_defaultenv(float %x, float %y) #0 {
 ; "fpexcept.ignore" instruction. This must not fire.
 define float @fadd_fsub_fmf_x_y_y_ebmaytrap_defaultenv(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_fmf_x_y_y_ebmaytrap_defaultenv(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.ignore")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -669,8 +687,8 @@ define float @fadd_fsub_fmf_x_y_y_ebmaytrap_defaultenv(float %x, float %y) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fadd_fsub_x_y_y_ebmaytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_x_y_y_ebmaytrap(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -681,8 +699,8 @@ define float @fadd_fsub_x_y_y_ebmaytrap(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fadd_fsub_fmf_x_y_y_ebmaytrap(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_fmf_x_y_y_ebmaytrap(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.maytrap")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -693,8 +711,8 @@ define float @fadd_fsub_fmf_x_y_y_ebmaytrap(float %x, float %y) #0 {
 ; Missing nsz and reassoc: must not fire
 define float @fadd_fsub_x_y_y_ebstrict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_x_y_y_ebstrict(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -705,8 +723,8 @@ define float @fadd_fsub_x_y_y_ebstrict(float %x, float %y) #0 {
 ; TODO: Need constrained intrinsic support in m_c_FAdd() and m_FSub to fire.
 define float @fadd_fsub_fmf_x_y_y_ebstrict(float %x, float %y) #0 {
 ; CHECK-LABEL: @fadd_fsub_fmf_x_y_y_ebstrict(
-; CHECK-NEXT:    [[INNER:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[X:%.*]], float [[Y:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.experimental.constrained.fsub.f32(float [[INNER]], float [[Y]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+; CHECK-NEXT:    [[INNER1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[RET:%.*]] = call reassoc nsz float @llvm.fsub.f32(float [[INNER1]], float [[Y]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[RET]]
 ;
   %inner = call float @llvm.experimental.constrained.fadd.f32(float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.strict")
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll b/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
index a1294561a3d9e..b40973afaeabd 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
@@ -7,8 +7,9 @@
 
 define float @nonneg_u_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_defaultenv(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -19,8 +20,9 @@ define float @nonneg_u_defaultenv(i32 %a) #0 {
 
 define float @nonneg_s_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_defaultenv(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -31,8 +33,9 @@ define float @nonneg_s_defaultenv(i32 %a) #0 {
 
 define float @nonneg_u_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_maytrap(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -43,8 +46,9 @@ define float @nonneg_u_maytrap(i32 %a) #0 {
 
 define float @nonneg_s_maytrap(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_maytrap(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -56,9 +60,9 @@ define float @nonneg_s_maytrap(i32 %a) #0 {
 ; NOTE: The fsub instruction is expected to remain, but the result isn't used.
 define float @nonneg_u_ebstrict(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_ebstrict(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[SUB:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[SQRA]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -70,9 +74,9 @@ define float @nonneg_u_ebstrict(i32 %a) #0 {
 ; NOTE: The fsub instruction is expected to remain, but the result isn't used.
 define float @nonneg_s_ebstrict(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_ebstrict(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
-; CHECK-NEXT:    [[SUB:%.*]] = call nnan float @llvm.experimental.constrained.fsub.f32(float [[SQRA]], float -0.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -86,8 +90,9 @@ define float @nonneg_s_ebstrict(i32 %a) #0 {
 
 define float @nonneg_u_downward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_downward(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -98,8 +103,9 @@ define float @nonneg_u_downward(i32 %a) #0 {
 
 define float @nonneg_s_downward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_downward(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.downward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -110,8 +116,9 @@ define float @nonneg_s_downward(i32 %a) #0 {
 
 define float @nonneg_u_upward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_upward(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.upward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.upward", metadata !"fpexcept.ignore") #0
@@ -122,8 +129,9 @@ define float @nonneg_u_upward(i32 %a) #0 {
 
 define float @nonneg_s_upward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_upward(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.upward", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.upward", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.upward", metadata !"fpexcept.ignore") #0
@@ -134,8 +142,9 @@ define float @nonneg_s_upward(i32 %a) #0 {
 
 define float @nonneg_u_towardzero(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_towardzero(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.towardzero", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.towardzero", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
@@ -146,8 +155,9 @@ define float @nonneg_u_towardzero(i32 %a) #0 {
 
 define float @nonneg_s_towardzero(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_towardzero(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.towardzero", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.towardzero", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
@@ -158,8 +168,9 @@ define float @nonneg_s_towardzero(i32 %a) #0 {
 
 define float @nonneg_u_tonearestaway(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_tonearestaway(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
@@ -170,8 +181,9 @@ define float @nonneg_u_tonearestaway(i32 %a) #0 {
 
 define float @nonneg_s_tonearestaway(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_tonearestaway(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
@@ -182,8 +194,9 @@ define float @nonneg_s_tonearestaway(i32 %a) #0 {
 
 define float @nonneg_u_dynamic(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_dynamic(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -194,8 +207,9 @@ define float @nonneg_u_dynamic(i32 %a) #0 {
 
 define float @nonneg_s_dynamic(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_dynamic(
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A:%.*]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.experimental.constrained.sqrt.f32(float [[FPA]], metadata !"round.dynamic", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[SQRA]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/Transforms/MergeFunc/merge-fp-intrinsics.ll b/llvm/test/Transforms/MergeFunc/merge-fp-intrinsics.ll
index 2a19ed8b8cca4..889ac673a0a65 100644
--- a/llvm/test/Transforms/MergeFunc/merge-fp-intrinsics.ll
+++ b/llvm/test/Transforms/MergeFunc/merge-fp-intrinsics.ll
@@ -7,9 +7,9 @@ declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, me
 define float @func1(float %a, float %b) {
 ; CHECK-LABEL: define float @func1
 ; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) {
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    [[RESULT_2:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[A]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret float [[RESULT]]
+; CHECK-NEXT:    [[RESULT2:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT:    [[RESULT_21:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[RESULT2]]
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
   %result_2 = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
diff --git a/llvm/test/Transforms/SCCP/strictfp-phis-fcmp.ll b/llvm/test/Transforms/SCCP/strictfp-phis-fcmp.ll
index a6c023a25608b..1a0cbe8dea9d5 100644
--- a/llvm/test/Transforms/SCCP/strictfp-phis-fcmp.ll
+++ b/llvm/test/Transforms/SCCP/strictfp-phis-fcmp.ll
@@ -8,6 +8,7 @@ define i1 @float.1.defaultenv(i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 true
 ;
 
@@ -31,6 +32,7 @@ define i1 @float.1.maytrap(i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 true
 ;
 
@@ -54,7 +56,7 @@ define i1 @float.1.strict(i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    ret i1 true
 ;
 
@@ -79,7 +81,7 @@ define i1 @float.2.defaultenv(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -103,7 +105,7 @@ define i1 @float.2.maytrap(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -127,7 +129,7 @@ define i1 @float.2.strict(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -151,7 +153,7 @@ define i1 @float.3.defaultenv(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -175,7 +177,7 @@ define i1 @float.3.maytrap(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -199,7 +201,7 @@ define i1 @float.3.strict(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmp.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -222,6 +224,7 @@ define i1 @float.4_unreachable.defaultenv(float %f, i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 
@@ -247,6 +250,7 @@ define i1 @float.4_unreachable.maytrap(float %f, i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 
@@ -273,7 +277,7 @@ define i1 @float.4_unreachable.strict(float %f, i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmp.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT:    ret i1 false
 ;
 
diff --git a/llvm/test/Transforms/SCCP/strictfp-phis-fcmps.ll b/llvm/test/Transforms/SCCP/strictfp-phis-fcmps.ll
index 213293a785938..6f0b8dc872625 100644
--- a/llvm/test/Transforms/SCCP/strictfp-phis-fcmps.ll
+++ b/llvm/test/Transforms/SCCP/strictfp-phis-fcmps.ll
@@ -54,7 +54,7 @@ define i1 @float.1.strict(i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmps.f32(float 1.000000e+00, float 1.000000e+00, metadata !"ueq")
 ; CHECK-NEXT:    ret i1 true
 ;
 
@@ -79,7 +79,7 @@ define i1 @float.2.defaultenv(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -103,7 +103,7 @@ define i1 @float.2.maytrap(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -127,7 +127,7 @@ define i1 @float.2.strict(i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ 2.000000e+00, [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq")
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -151,7 +151,7 @@ define i1 @float.3.defaultenv(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.ignore") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -175,7 +175,7 @@ define i1 @float.3.maytrap(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.maytrap") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq") [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -199,7 +199,7 @@ define i1 @float.3.strict(float %f, i1 %cmp) #0 {
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[P:%.*]] = phi float [ 1.000000e+00, [[ENTRY:%.*]] ], [ [[F:%.*]], [[IF_TRUE]] ]
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.fcmps.f32(float [[P]], float 1.000000e+00, metadata !"ueq")
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
 
@@ -273,7 +273,7 @@ define i1 @float.4_unreachable.strict(float %f, i1 %cmp) #0 {
 ; CHECK:       if.true:
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[C:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une", metadata !"fpexcept.strict") #[[ATTR0]]
+; CHECK-NEXT:    [[C1:%.*]] = call i1 @llvm.fcmps.f32(float 1.000000e+00, float 1.000000e+00, metadata !"une")
 ; CHECK-NEXT:    ret i1 false
 ;
 
diff --git a/llvm/test/Transforms/Util/libcalls-shrinkwrap-double.ll b/llvm/test/Transforms/Util/libcalls-shrinkwrap-double.ll
index 4ac216f85c74c..2770cfc084a11 100644
--- a/llvm/test/Transforms/Util/libcalls-shrinkwrap-double.ll
+++ b/llvm/test/Transforms/Util/libcalls-shrinkwrap-double.ll
@@ -1,110 +1,121 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt < %s -passes=libcalls-shrinkwrap -S | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define void @test_range_error(double %value) {
+; CHECK-LABEL: define void @test_range_error(
+; CHECK-SAME: double [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt double [[VALUE]], -7.100000e+02
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt double [[VALUE]], 7.100000e+02
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call double @cosh(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt double [[VALUE]], -7.450000e+02
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt double [[VALUE]], 7.090000e+02
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call double @exp(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt double [[VALUE]], -1.074000e+03
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt double [[VALUE]], 1.023000e+03
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call double @exp2(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt double [[VALUE]], -7.100000e+02
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt double [[VALUE]], 7.100000e+02
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call double @sinh(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt double [[VALUE]], 7.090000e+02
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call double @expm1(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call double @cosh(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt double %value, -7.100000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt double %value, 7.100000e+02
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call double @cosh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call double @exp(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt double %value, -7.450000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt double %value, 7.090000e+02
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call double @exp(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call double @exp2(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt double %value, -1.074000e+03
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt double %value, 1.023000e+03
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call double @exp2(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call double @sinh(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt double %value, -7.100000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt double %value, 7.100000e+02
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call double @sinh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call double @expm1(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ogt double %value, 7.090000e+02
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call double @expm1(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_range_error_strictfp(double %value) strictfp {
+; CHECK-LABEL: define void @test_range_error_strictfp(
+; CHECK-SAME: double [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt double [[VALUE]], -7.100000e+02
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt double [[VALUE]], 7.100000e+02
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call double @cosh(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt double [[VALUE]], -7.450000e+02
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt double [[VALUE]], 7.090000e+02
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call double @exp(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt double [[VALUE]], -1.074000e+03
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt double [[VALUE]], 1.023000e+03
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call double @exp2(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt double [[VALUE]], -7.100000e+02
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt double [[VALUE]], 7.100000e+02
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call double @sinh(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt double [[VALUE]], 7.090000e+02
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call double @expm1(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call double @cosh(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -7.100000e+02, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 7.100000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call double @cosh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call double @exp(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -7.450000e+02, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 7.090000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call double @exp(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call double @exp2(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -1.074000e+03, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 1.023000e+03, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call double @exp2(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call double @sinh(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -7.100000e+02, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 7.100000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call double @sinh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call double @expm1(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 7.090000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call double @expm1(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
@@ -116,223 +127,233 @@ declare double @sinh(double)
 declare double @expm1(double)
 
 define void @test_domain_error(double %value) {
+; CHECK-LABEL: define void @test_domain_error(
+; CHECK-SAME: double [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call double @acos(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call double @asin(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq double [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq double [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call double @cos(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq double [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq double [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call double @sin(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call double @acosh(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call double @sqrt(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call double @atanh(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call double @log(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call double @log10(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call double @log2(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call double @logb(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call double @log1p(double [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_00 = call double @acos(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt double %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call double @acos(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call double @asin(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt double %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call double @asin(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call double @cos(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq double %value, 0xFFF0000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq double %value, 0x7FF0000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call double @cos(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call double @sin(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq double %value, 0xFFF0000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq double %value, 0x7FF0000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call double @sin(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call double @acosh(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt double %value, 1.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call double @acosh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call double @sqrt(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt double %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call double @sqrt(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call double @atanh(double %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oge double %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole double %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call double @atanh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call double @log(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole double %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call double @log(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call double @log10(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole double %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call double @log10(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call double @log2(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole double %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call double @log2(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call double @logb(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole double %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call double @logb(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call double @log1p(double %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole double %value, -1.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call double @log1p(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_domain_error_strictfp(double %value) strictfp {
+; CHECK-LABEL: define void @test_domain_error_strictfp(
+; CHECK-SAME: double [[VALUE:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call double @acos(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call double @asin(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq double [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq double [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call double @cos(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq double [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq double [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call double @sin(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call double @acosh(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call double @sqrt(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge double [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call double @atanh(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call double @log(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call double @log10(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call double @log2(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole double [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call double @logb(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole double [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call double @log1p(double [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_00 = call double @acos(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 1.000000e+00, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call double @acos(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call double @asin(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 1.000000e+00, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call double @asin(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call double @cos(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0xFFF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call double @cos(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call double @sin(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0xFFF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call double @sin(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call double @acosh(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call double @acosh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call double @sqrt(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call double @sqrt(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call double @atanh(double %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 1.000000e+00, metadata !"oge", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -1.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call double @atanh(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call double @log(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call double @log(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call double @log10(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call double @log10(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call double @log2(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call double @log2(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call double @logb(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call double @logb(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call double @log1p(double %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %value, double -1.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call double @log1p(double %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
@@ -351,101 +372,121 @@ declare double @logb(double)
 declare double @log1p(double)
 
 define void @test_pow(i32 %int_val, double %exp) {
+; CHECK-LABEL: define void @test_pow(
+; CHECK-SAME: i32 [[INT_VAL:%.*]], double [[EXP:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt double [[EXP]], 1.270000e+02
+; CHECK-NEXT:    br i1 [[TMP1]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL:%.*]] = call double @pow(double 2.500000e+00, double [[EXP]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[CONV:%.*]] = sitofp i32 [[INT_VAL]] to double
+; CHECK-NEXT:    [[TMP2:%.*]] = fcmp ogt double [[EXP]], 3.200000e+01
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ole double [[CONV]], 0.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = or i1 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    br i1 [[TMP4]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call double @pow(double [[CONV]], double [[EXP]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[CONV2:%.*]] = trunc i32 [[INT_VAL]] to i8
+; CHECK-NEXT:    [[CONV3:%.*]] = uitofp i8 [[CONV2]] to double
+; CHECK-NEXT:    [[TMP5:%.*]] = fcmp ogt double [[EXP]], 1.280000e+02
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp ole double [[CONV3]], 0.000000e+00
+; CHECK-NEXT:    [[TMP7:%.*]] = or i1 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    br i1 [[TMP7]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL4:%.*]] = call double @pow(double [[CONV3]], double [[EXP]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[CONV5:%.*]] = trunc i32 [[INT_VAL]] to i16
+; CHECK-NEXT:    [[CONV6:%.*]] = uitofp i16 [[CONV5]] to double
+; CHECK-NEXT:    [[TMP8:%.*]] = fcmp ogt double [[EXP]], 6.400000e+01
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp ole double [[CONV6]], 0.000000e+00
+; CHECK-NEXT:    [[TMP10:%.*]] = or i1 [[TMP9]], [[TMP8]]
+; CHECK-NEXT:    br i1 [[TMP10]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL7:%.*]] = call double @pow(double [[CONV6]], double [[EXP]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    ret void
+;
   %call = call double @pow(double 2.500000e+00, double %exp)
-; CHECK: [[COND:%[0-9]+]] = fcmp ogt double %exp, 1.270000e+02
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call = call double @pow(double 2.500000e+00, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %conv = sitofp i32 %int_val to double
   %call1 = call double @pow(double %conv, double %exp)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double %exp, 3.200000e+01
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole double %conv, 0.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call1 = call double @pow(double %conv, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %conv2 = trunc i32 %int_val to i8
   %conv3 = uitofp i8 %conv2 to double
   %call4 = call double @pow(double %conv3, double %exp)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double %exp, 1.280000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole double %conv3, 0.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call4 = call double @pow(double %conv3, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
 
   %conv5 = trunc i32 %int_val to i16
   %conv6 = uitofp i16 %conv5 to double
   %call7 = call double @pow(double %conv6, double %exp)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double %exp, 6.400000e+01
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole double %conv6, 0.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call7 = call double @pow(double %conv6, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_pow_strictfp(i32 %int_val, double %exp) strictfp {
+; CHECK-LABEL: define void @test_pow_strictfp(
+; CHECK-SAME: i32 [[INT_VAL:%.*]], double [[EXP:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt double [[EXP]], 1.270000e+02
+; CHECK-NEXT:    br i1 [[TMP1]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL:%.*]] = call double @pow(double 2.500000e+00, double [[EXP]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[CONV:%.*]] = sitofp i32 [[INT_VAL]] to double
+; CHECK-NEXT:    [[TMP2:%.*]] = fcmp ogt double [[EXP]], 3.200000e+01
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ole double [[CONV]], 0.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = or i1 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    br i1 [[TMP4]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call double @pow(double [[CONV]], double [[EXP]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[CONV2:%.*]] = trunc i32 [[INT_VAL]] to i8
+; CHECK-NEXT:    [[CONV3:%.*]] = uitofp i8 [[CONV2]] to double
+; CHECK-NEXT:    [[TMP5:%.*]] = fcmp ogt double [[EXP]], 1.280000e+02
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp ole double [[CONV3]], 0.000000e+00
+; CHECK-NEXT:    [[TMP7:%.*]] = or i1 [[TMP6]], [[TMP5]]
+; CHECK-NEXT:    br i1 [[TMP7]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL4:%.*]] = call double @pow(double [[CONV3]], double [[EXP]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[CONV5:%.*]] = trunc i32 [[INT_VAL]] to i16
+; CHECK-NEXT:    [[CONV6:%.*]] = uitofp i16 [[CONV5]] to double
+; CHECK-NEXT:    [[TMP8:%.*]] = fcmp ogt double [[EXP]], 6.400000e+01
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp ole double [[CONV6]], 0.000000e+00
+; CHECK-NEXT:    [[TMP10:%.*]] = or i1 [[TMP9]], [[TMP8]]
+; CHECK-NEXT:    br i1 [[TMP10]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL7:%.*]] = call double @pow(double [[CONV6]], double [[EXP]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    ret void
+;
   %call = call double @pow(double 2.500000e+00, double %exp) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %exp, double 1.270000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call = call double @pow(double 2.500000e+00, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %conv = sitofp i32 %int_val to double
   %call1 = call double @pow(double %conv, double %exp) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %exp, double 3.200000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %conv, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call1 = call double @pow(double %conv, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %conv2 = trunc i32 %int_val to i8
   %conv3 = uitofp i8 %conv2 to double
   %call4 = call double @pow(double %conv3, double %exp) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %exp, double 1.280000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %conv3, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call4 = call double @pow(double %conv3, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
 
   %conv5 = trunc i32 %int_val to i16
   %conv6 = uitofp i16 %conv5 to double
   %call7 = call double @pow(double %conv6, double %exp) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %exp, double 6.400000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double %conv6, double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call7 = call double @pow(double %conv6, double %exp)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 declare double @pow(double, double)
 
-; CHECK: ![[BRANCH_WEIGHT]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
diff --git a/llvm/test/Transforms/Util/libcalls-shrinkwrap-float.ll b/llvm/test/Transforms/Util/libcalls-shrinkwrap-float.ll
index f4dc79759d17e..48d3d4580881f 100644
--- a/llvm/test/Transforms/Util/libcalls-shrinkwrap-float.ll
+++ b/llvm/test/Transforms/Util/libcalls-shrinkwrap-float.ll
@@ -1,110 +1,121 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt < %s -passes=libcalls-shrinkwrap -S | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define void @test_range_error(float %value) {
+; CHECK-LABEL: define void @test_range_error(
+; CHECK-SAME: float [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt float [[VALUE]], -8.900000e+01
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt float [[VALUE]], 8.900000e+01
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call float @coshf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt float [[VALUE]], -1.030000e+02
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt float [[VALUE]], 8.800000e+01
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call float @expf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt float [[VALUE]], -1.490000e+02
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt float [[VALUE]], 1.270000e+02
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call float @exp2f(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt float [[VALUE]], -8.900000e+01
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt float [[VALUE]], 8.900000e+01
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call float @sinhf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt float [[VALUE]], 8.800000e+01
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call float @expm1f(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call float @coshf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt float %value, -8.900000e+01
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt float %value, 8.900000e+01
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call float @coshf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call float @expf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt float %value, -1.030000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt float %value, 8.800000e+01
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call float @expf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call float @exp2f(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt float %value, -1.490000e+02
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt float %value, 1.270000e+02
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call float @exp2f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call float @sinhf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt float %value, -8.900000e+01
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt float %value, 8.900000e+01
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call float @sinhf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call float @expm1f(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ogt float %value, 8.800000e+01
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call float @expm1f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_range_error_strictfp(float %value) strictfp {
+; CHECK-LABEL: define void @test_range_error_strictfp(
+; CHECK-SAME: float [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt float [[VALUE]], -8.900000e+01
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt float [[VALUE]], 8.900000e+01
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call float @coshf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt float [[VALUE]], -1.030000e+02
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt float [[VALUE]], 8.800000e+01
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call float @expf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt float [[VALUE]], -1.490000e+02
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt float [[VALUE]], 1.270000e+02
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call float @exp2f(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt float [[VALUE]], -8.900000e+01
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt float [[VALUE]], 8.900000e+01
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call float @sinhf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt float [[VALUE]], 8.800000e+01
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call float @expm1f(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call float @coshf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE:%.*]], float -8.900000e+01, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 8.900000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call float @coshf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call float @expf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.030000e+02, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 8.800000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call float @expf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call float @exp2f(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.490000e+02, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 1.270000e+02, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call float @exp2f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call float @sinhf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -8.900000e+01, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 8.900000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call float @sinhf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call float @expm1f(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 8.800000e+01, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call float @expm1f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
@@ -116,224 +127,234 @@ declare float @sinhf(float)
 declare float @expm1f(float)
 
 define void @test_domain_error(float %value) {
+; CHECK-LABEL: define void @test_domain_error(
+; CHECK-SAME: float [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call float @acosf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call float @asinf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq float [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq float [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call float @cosf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq float [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq float [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call float @sinf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call float @acoshf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call float @sqrtf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call float @atanhf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call float @logf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call float @log10f(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call float @log2f(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call float @logbf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call float @log1pf(float [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
 
   %call_00 = call float @acosf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt float %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt float %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call float @acosf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call float @asinf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt float %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt float %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call float @asinf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call float @cosf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq float %value, 0xFFF0000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq float %value, 0x7FF0000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call float @cosf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call float @sinf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq float %value, 0xFFF0000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq float %value, 0x7FF0000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call float @sinf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call float @acoshf(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt float %value, 1.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call float @acoshf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call float @sqrtf(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt float %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call float @sqrtf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call float @atanhf(float %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oge float %value, 1.000000e+00
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole float %value, -1.000000e+00
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call float @atanhf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call float @logf(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole float %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call float @logf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call float @log10f(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole float %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call float @log10f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call float @log2f(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole float %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call float @log2f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call float @logbf(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole float %value, 0.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call float @logbf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call float @log1pf(float %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole float %value, -1.000000e+00
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call float @log1pf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
   ret void
 }
 
 define void @test_domain_error_strictfp(float %value) strictfp {
+; CHECK-LABEL: define void @test_domain_error_strictfp(
+; CHECK-SAME: float [[VALUE:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call float @acosf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call float @asinf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq float [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq float [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call float @cosf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq float [[VALUE]], 0xFFF0000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq float [[VALUE]], 0x7FF0000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call float @sinf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call float @acoshf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call float @sqrtf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge float [[VALUE]], 1.000000e+00
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call float @atanhf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call float @logf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call float @log10f(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call float @log2f(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole float [[VALUE]], 0.000000e+00
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call float @logbf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole float [[VALUE]], -1.000000e+00
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call float @log1pf(float [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
 
   %call_00 = call float @acosf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE:%.*]], float 1.000000e+00, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call float @acosf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call float @asinf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 1.000000e+00, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call float @asinf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call float @cosf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0xFFF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call float @cosf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call float @sinf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0xFFF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call float @sinf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call float @acoshf(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call float @acoshf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call float @sqrtf(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call float @sqrtf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call float @atanhf(float %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 1.000000e+00, metadata !"oge", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call float @atanhf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call float @logf(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call float @logf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call float @log10f(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call float @log10f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call float @log2f(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call float @log2f(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call float @logbf(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call float @logbf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call float @log1pf(float %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[VALUE]], float -1.000000e+00, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call float @log1pf(float %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
   ret void
 }
 
@@ -350,4 +371,6 @@ declare float @log2f(float)
 declare float @logbf(float)
 declare float @log1pf(float)
 
-; CHECK: ![[BRANCH_WEIGHT]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
diff --git a/llvm/test/Transforms/Util/libcalls-shrinkwrap-long-double.ll b/llvm/test/Transforms/Util/libcalls-shrinkwrap-long-double.ll
index c2b981c81c75d..4024025eaeee0 100644
--- a/llvm/test/Transforms/Util/libcalls-shrinkwrap-long-double.ll
+++ b/llvm/test/Transforms/Util/libcalls-shrinkwrap-long-double.ll
@@ -1,110 +1,121 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt < %s -passes=libcalls-shrinkwrap -S | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define void @test_range_error(x86_fp80 %value) {
+; CHECK-LABEL: define void @test_range_error(
+; CHECK-SAME: x86_fp80 [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB174000000000000
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB174000000000000
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call x86_fp80 @coshl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB21C000000000000
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB170000000000000
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call x86_fp80 @expl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00D807A000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB1DC000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call x86_fp80 @exp2l(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB174000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB174000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call x86_fp80 @sinhl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB170000000000000
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call x86_fp80 @expm1l(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call x86_fp80 @coshl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKC00CB174000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK400CB174000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call x86_fp80 @coshl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call x86_fp80 @expl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKC00CB21C000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK400CB170000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call x86_fp80 @expl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call x86_fp80 @exp2l(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKC00D807A000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK400CB1DC000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call x86_fp80 @exp2l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call x86_fp80 @sinhl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKC00CB174000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK400CB174000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call x86_fp80 @sinhl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call x86_fp80 @expm1l(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK400CB170000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call x86_fp80 @expm1l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_range_error_strictfp(x86_fp80 %value) strictfp {
+; CHECK-LABEL: define void @test_range_error_strictfp(
+; CHECK-SAME: x86_fp80 [[VALUE:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB174000000000000
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB174000000000000
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_0:%.*]] = call x86_fp80 @coshl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB21C000000000000
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB170000000000000
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_1:%.*]] = call x86_fp80 @expl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00D807A000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB1DC000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_3:%.*]] = call x86_fp80 @exp2l(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKC00CB174000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB174000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_4:%.*]] = call x86_fp80 @sinhl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK400CB170000000000000
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_5:%.*]] = call x86_fp80 @expm1l(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_0 = call x86_fp80 @coshl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE:%.*]], x86_fp80 0xKC00CB174000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK400CB174000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT:[0-9]+]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_0 = call x86_fp80 @coshl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_1 = call x86_fp80 @expl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKC00CB21C000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK400CB170000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_1 = call x86_fp80 @expl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_3 = call x86_fp80 @exp2l(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKC00D807A000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK400CB1DC000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_3 = call x86_fp80 @exp2l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_4 = call x86_fp80 @sinhl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKC00CB174000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK400CB174000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_4 = call x86_fp80 @sinhl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_5 = call x86_fp80 @expm1l(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK400CB170000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_5 = call x86_fp80 @expm1l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
@@ -117,223 +128,233 @@ declare x86_fp80 @sinhl(x86_fp80)
 declare x86_fp80 @expm1l(x86_fp80)
 
 define void @test_domain_error(x86_fp80 %value) {
+; CHECK-LABEL: define void @test_domain_error(
+; CHECK-SAME: x86_fp80 [[VALUE:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call x86_fp80 @acosl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call x86_fp80 @asinl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xKFFFF8000000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xK7FFF8000000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call x86_fp80 @cosl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xKFFFF8000000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xK7FFF8000000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call x86_fp80 @sinl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call x86_fp80 @acoshl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call x86_fp80 @sqrtl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call x86_fp80 @atanhl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call x86_fp80 @logl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call x86_fp80 @log10l(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call x86_fp80 @log2l(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call x86_fp80 @logbl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call x86_fp80 @log1pl(x86_fp80 [[VALUE]])
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_00 = call x86_fp80 @acosl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK3FFF8000000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKBFFF8000000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call x86_fp80 @acosl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call x86_fp80 @asinl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt x86_fp80 %value, 0xK3FFF8000000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xKBFFF8000000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call x86_fp80 @asinl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call x86_fp80 @cosl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq x86_fp80 %value, 0xKFFFF8000000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq x86_fp80 %value, 0xK7FFF8000000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call x86_fp80 @cosl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call x86_fp80 @sinl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oeq x86_fp80 %value, 0xKFFFF8000000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp oeq x86_fp80 %value, 0xK7FFF8000000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call x86_fp80 @sinl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call x86_fp80 @acoshl(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xK3FFF8000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call x86_fp80 @acoshl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call x86_fp80 @sqrtl(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp olt x86_fp80 %value, 0xK00000000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call x86_fp80 @sqrtl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call x86_fp80 @atanhl(x86_fp80 %value)
-; CHECK: [[COND1:%[0-9]+]] = fcmp oge x86_fp80 %value, 0xK3FFF8000000000000000
-; CHECK: [[COND2:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xKBFFF8000000000000000
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call x86_fp80 @atanhl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call x86_fp80 @logl(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xK00000000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call x86_fp80 @logl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call x86_fp80 @log10l(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xK00000000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call x86_fp80 @log10l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call x86_fp80 @log2l(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xK00000000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call x86_fp80 @log2l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call x86_fp80 @logbl(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xK00000000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call x86_fp80 @logbl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call x86_fp80 @log1pl(x86_fp80 %value)
-; CHECK: [[COND:%[0-9]+]] = fcmp ole x86_fp80 %value, 0xKBFFF8000000000000000
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call x86_fp80 @log1pl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
 
 define void @test_domain_error_strictfp(x86_fp80 %value) strictfp {
+; CHECK-LABEL: define void @test_domain_error_strictfp(
+; CHECK-SAME: x86_fp80 [[VALUE:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP1:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP2:%.*]] = or i1 [[TMP1]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[CALL_00:%.*]] = call x86_fp80 @acosl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = fcmp ogt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP5:%.*]] = or i1 [[TMP4]], [[TMP3]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[CDCE_CALL1:.*]], label %[[CDCE_END2:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL1]]:
+; CHECK-NEXT:    [[CALL_01:%.*]] = call x86_fp80 @asinl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END2]]
+; CHECK:       [[CDCE_END2]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xKFFFF8000000000000000
+; CHECK-NEXT:    [[TMP7:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xK7FFF8000000000000000
+; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    br i1 [[TMP8]], label %[[CDCE_CALL3:.*]], label %[[CDCE_END4:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL3]]:
+; CHECK-NEXT:    [[CALL_02:%.*]] = call x86_fp80 @cosl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END4]]
+; CHECK:       [[CDCE_END4]]:
+; CHECK-NEXT:    [[TMP9:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xKFFFF8000000000000000
+; CHECK-NEXT:    [[TMP10:%.*]] = fcmp oeq x86_fp80 [[VALUE]], 0xK7FFF8000000000000000
+; CHECK-NEXT:    [[TMP11:%.*]] = or i1 [[TMP10]], [[TMP9]]
+; CHECK-NEXT:    br i1 [[TMP11]], label %[[CDCE_CALL5:.*]], label %[[CDCE_END6:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL5]]:
+; CHECK-NEXT:    [[CALL_03:%.*]] = call x86_fp80 @sinl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END6]]
+; CHECK:       [[CDCE_END6]]:
+; CHECK-NEXT:    [[TMP12:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[CDCE_CALL7:.*]], label %[[CDCE_END8:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL7]]:
+; CHECK-NEXT:    [[CALL_04:%.*]] = call x86_fp80 @acoshl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END8]]
+; CHECK:       [[CDCE_END8]]:
+; CHECK-NEXT:    [[TMP13:%.*]] = fcmp olt x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[CDCE_CALL9:.*]], label %[[CDCE_END10:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL9]]:
+; CHECK-NEXT:    [[CALL_05:%.*]] = call x86_fp80 @sqrtl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END10]]
+; CHECK:       [[CDCE_END10]]:
+; CHECK-NEXT:    [[TMP14:%.*]] = fcmp oge x86_fp80 [[VALUE]], 0xK3FFF8000000000000000
+; CHECK-NEXT:    [[TMP15:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    [[TMP16:%.*]] = or i1 [[TMP15]], [[TMP14]]
+; CHECK-NEXT:    br i1 [[TMP16]], label %[[CDCE_CALL11:.*]], label %[[CDCE_END12:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL11]]:
+; CHECK-NEXT:    [[CALL_06:%.*]] = call x86_fp80 @atanhl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END12]]
+; CHECK:       [[CDCE_END12]]:
+; CHECK-NEXT:    [[TMP17:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP17]], label %[[CDCE_CALL13:.*]], label %[[CDCE_END14:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL13]]:
+; CHECK-NEXT:    [[CALL_07:%.*]] = call x86_fp80 @logl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END14]]
+; CHECK:       [[CDCE_END14]]:
+; CHECK-NEXT:    [[TMP18:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP18]], label %[[CDCE_CALL15:.*]], label %[[CDCE_END16:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL15]]:
+; CHECK-NEXT:    [[CALL_08:%.*]] = call x86_fp80 @log10l(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END16]]
+; CHECK:       [[CDCE_END16]]:
+; CHECK-NEXT:    [[TMP19:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP19]], label %[[CDCE_CALL17:.*]], label %[[CDCE_END18:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL17]]:
+; CHECK-NEXT:    [[CALL_09:%.*]] = call x86_fp80 @log2l(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END18]]
+; CHECK:       [[CDCE_END18]]:
+; CHECK-NEXT:    [[TMP20:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xK00000000000000000000
+; CHECK-NEXT:    br i1 [[TMP20]], label %[[CDCE_CALL19:.*]], label %[[CDCE_END20:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL19]]:
+; CHECK-NEXT:    [[CALL_10:%.*]] = call x86_fp80 @logbl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END20]]
+; CHECK:       [[CDCE_END20]]:
+; CHECK-NEXT:    [[TMP21:%.*]] = fcmp ole x86_fp80 [[VALUE]], 0xKBFFF8000000000000000
+; CHECK-NEXT:    br i1 [[TMP21]], label %[[CDCE_CALL21:.*]], label %[[CDCE_END22:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL21]]:
+; CHECK-NEXT:    [[CALL_11:%.*]] = call x86_fp80 @log1pl(x86_fp80 [[VALUE]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END22]]
+; CHECK:       [[CDCE_END22]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   %call_00 = call x86_fp80 @acosl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE:%.*]], x86_fp80 0xK3FFF8000000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKBFFF8000000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_00 = call x86_fp80 @acosl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_01 = call x86_fp80 @asinl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK3FFF8000000000000000, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKBFFF8000000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_01 = call x86_fp80 @asinl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_02 = call x86_fp80 @cosl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKFFFF8000000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK7FFF8000000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_02 = call x86_fp80 @cosl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_03 = call x86_fp80 @sinl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKFFFF8000000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK7FFF8000000000000000, metadata !"oeq", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_03 = call x86_fp80 @sinl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_04 = call x86_fp80 @acoshl(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK3FFF8000000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_04 = call x86_fp80 @acoshl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_05 = call x86_fp80 @sqrtl(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK00000000000000000000, metadata !"olt", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_05 = call x86_fp80 @sqrtl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_06 = call x86_fp80 @atanhl(x86_fp80 %value) strictfp
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK3FFF8000000000000000, metadata !"oge", metadata !"fpexcept.strict")
-; CHECK: [[COND2:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKBFFF8000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: [[COND:%[0-9]+]] = or i1 [[COND2]], [[COND1]]
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_06 = call x86_fp80 @atanhl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_07 = call x86_fp80 @logl(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK00000000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_07 = call x86_fp80 @logl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_08 = call x86_fp80 @log10l(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK00000000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_08 = call x86_fp80 @log10l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_09 = call x86_fp80 @log2l(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK00000000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_09 = call x86_fp80 @log2l(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_10 = call x86_fp80 @logbl(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xK00000000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_10 = call x86_fp80 @logbl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   %call_11 = call x86_fp80 @log1pl(x86_fp80 %value) strictfp
-; CHECK: [[COND:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[VALUE]], x86_fp80 0xKBFFF8000000000000000, metadata !"ole", metadata !"fpexcept.strict")
-; CHECK: br i1 [[COND]], label %[[CALL_LABEL:cdce.call[0-9]*]], label %[[END_LABEL:cdce.end[0-9]*]], !prof ![[BRANCH_WEIGHT]]
-; CHECK: [[CALL_LABEL]]:
-; CHECK-NEXT: %call_11 = call x86_fp80 @log1pl(x86_fp80 %value)
-; CHECK-NEXT: br label %[[END_LABEL]]
-; CHECK: [[END_LABEL]]:
 
   ret void
 }
@@ -351,4 +372,6 @@ declare x86_fp80 @log2l(x86_fp80)
 declare x86_fp80 @logbl(x86_fp80)
 declare x86_fp80 @log1pl(x86_fp80)
 
-; CHECK: ![[BRANCH_WEIGHT]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
diff --git a/llvm/test/Transforms/Util/libcalls-shrinkwrap-strictfp.ll b/llvm/test/Transforms/Util/libcalls-shrinkwrap-strictfp.ll
index 61eee90f21053..0b683389533b9 100644
--- a/llvm/test/Transforms/Util/libcalls-shrinkwrap-strictfp.ll
+++ b/llvm/test/Transforms/Util/libcalls-shrinkwrap-strictfp.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt < %s -passes=libcalls-shrinkwrap -S | FileCheck %s
 
 ; #include <math.h>
@@ -17,13 +18,33 @@ target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define void @test_quiet_nan() {
+; CHECK-LABEL: define void @test_quiet_nan() {
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca double, align 8
+; CHECK-NEXT:    store volatile double 0x7FF8000000000000, ptr [[TMP1]], align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call i32 @feclearexcept(i32 noundef 61)
+; CHECK-NEXT:    [[TMP3:%.*]] = load volatile double, ptr [[TMP1]], align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt double [[TMP3]], 1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = fcmp olt double [[TMP3]], -1.000000e+00
+; CHECK-NEXT:    [[TMP6:%.*]] = or i1 [[TMP5]], [[TMP4]]
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[TMP7:%.*]] = call double @acos(double noundef [[TMP3]])
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP8:%.*]] = call i32 @fetestexcept(i32 noundef 61)
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp ne i32 [[TMP8]], 0
+; CHECK-NEXT:    br i1 [[TMP9]], label %[[ABORT:.*]], label %[[RET:.*]]
+; CHECK:       [[ABORT]]:
+; CHECK-NEXT:    call void @abort()
+; CHECK-NEXT:    unreachable
+; CHECK:       [[RET]]:
+; CHECK-NEXT:    ret void
+;
   %1 = alloca double, align 8
   store volatile double 0x7FF8000000000000, ptr %1, align 8
   %2 = tail call i32 @feclearexcept(i32 noundef 61)
   %3 = load volatile double, ptr %1, align 8
   %4 = call double @acos(double noundef %3)
-; CHECK: [[COND1:%[0-9]+]] = fcmp ogt double [[VALUE:%.*]], 1.000000e+00
-; CHECK: [[COND1:%[0-9]+]] = fcmp olt double [[VALUE]], -1.000000e+00
   %5 = call i32 @fetestexcept(i32 noundef 61)
   %6 = icmp ne i32 %5, 0
   br i1 %6, label %abort, label %ret
@@ -37,6 +58,29 @@ ret:
 }
 
 define void @test_quiet_nan_strictfp() strictfp {
+; CHECK-LABEL: define void @test_quiet_nan_strictfp(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca double, align 8
+; CHECK-NEXT:    store volatile double 0x7FF8000000000000, ptr [[TMP1]], align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call i32 @feclearexcept(i32 noundef 61) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP3:%.*]] = load volatile double, ptr [[TMP1]], align 8
+; CHECK-NEXT:    [[TMP4:%.*]] = fcmp ogt double [[TMP3]], 1.000000e+00
+; CHECK-NEXT:    [[TMP5:%.*]] = fcmp olt double [[TMP3]], -1.000000e+00
+; CHECK-NEXT:    [[TMP6:%.*]] = or i1 [[TMP5]], [[TMP4]]
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[CDCE_CALL:.*]], label %[[CDCE_END:.*]], !prof [[PROF0]]
+; CHECK:       [[CDCE_CALL]]:
+; CHECK-NEXT:    [[TMP7:%.*]] = call double @acos(double noundef [[TMP3]]) #[[ATTR0]]
+; CHECK-NEXT:    br label %[[CDCE_END]]
+; CHECK:       [[CDCE_END]]:
+; CHECK-NEXT:    [[TMP8:%.*]] = call i32 @fetestexcept(i32 noundef 61) #[[ATTR0]]
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp ne i32 [[TMP8]], 0
+; CHECK-NEXT:    br i1 [[TMP9]], label %[[ABORT:.*]], label %[[RET:.*]]
+; CHECK:       [[ABORT]]:
+; CHECK-NEXT:    call void @abort() #[[ATTR0]]
+; CHECK-NEXT:    unreachable
+; CHECK:       [[RET]]:
+; CHECK-NEXT:    ret void
+;
   %1 = alloca double, align 8
   store volatile double 0x7FF8000000000000, ptr %1, align 8
   %2 = tail call i32 @feclearexcept(i32 noundef 61) strictfp
@@ -44,8 +88,6 @@ define void @test_quiet_nan_strictfp() strictfp {
   %4 = call double @acos(double noundef %3) strictfp
 ; Generate constrained fcmp if function has strictfp attribute.
 ; That avoids raising fp exception with quiet nan input.
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[VALUE]], double 1.000000e+00, metadata !"ogt", metadata !"fpexcept.strict")
-; CHECK: [[COND1:%[0-9]+]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[VALUE]], double -1.000000e+00, metadata !"olt", metadata !"fpexcept.strict")
   %5 = call i32 @fetestexcept(i32 noundef 61) strictfp
   %6 = icmp ne i32 %5, 0
   br i1 %6, label %abort, label %ret
@@ -65,3 +107,6 @@ declare i32 @fetestexcept(i32 noundef)
 declare double @acos(double noundef)
 
 declare void @abort()
+;.
+; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 1048575}
+;.
diff --git a/llvm/test/Verifier/fp-intrinsics-pass.ll b/llvm/test/Verifier/fp-intrinsics-pass.ll
index 1cc2cb70be76f..e5cfd72fde21a 100644
--- a/llvm/test/Verifier/fp-intrinsics-pass.ll
+++ b/llvm/test/Verifier/fp-intrinsics-pass.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
 ; RUN: opt -passes=verify -S < %s 2>&1 | FileCheck %s
 
 declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
@@ -5,38 +6,35 @@ declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadat
 
 ; Test that the verifier accepts legal code, and that the correct attributes are
 ; attached to the FP intrinsic. The attributes are checked at the bottom.
-; CHECK: declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata) #[[ATTR:[0-9]+]]
-; CHECK: declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata) #[[ATTR]]
 ; Note: FP exceptions aren't usually caught through normal unwind mechanisms,
 ;       but we may want to revisit this for asynchronous exception handling.
 define double @f1(double %a, double %b) strictfp {
-; CHECK-LABEL: define double @f1
-; CHECK-SAME: (double [[A:%.*]], double [[B:%.*]]) #[[STRICTFP:[0-9]+]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FADD:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret double [[FADD]]
+; CHECK-LABEL: define double @f1(
+; CHECK-SAME: double [[A:%.*]], double [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[FADD1:%.*]] = fadd double [[A]], [[B]]
+; CHECK-NEXT:    ret double [[FADD1]]
+;
 entry:
   %fadd = call double @llvm.experimental.constrained.fadd.f64(
-                                               double %a, double %b,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict")
+  double %a, double %b,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict")
   ret double %fadd
 }
 
 define double @f1u(double %a) strictfp {
-; CHECK-LABEL: define double @f1u
-; CHECK-SAME: (double [[A:%.*]]) #[[STRICTFP]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[FSQRT:%.*]] = call double @llvm.experimental.constrained.sqrt.f64(double [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict")
-; CHECK-NEXT:    ret double [[FSQRT]]
+; CHECK-LABEL: define double @f1u(
+; CHECK-SAME: double [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[FSQRT1:%.*]] = call double @llvm.sqrt.f64(double [[A]])
+; CHECK-NEXT:    ret double [[FSQRT1]]
 ;
 entry:
   %fsqrt = call double @llvm.experimental.constrained.sqrt.f64(
-                                               double %a,
-                                               metadata !"round.dynamic",
-                                               metadata !"fpexcept.strict")
+  double %a,
+  metadata !"round.dynamic",
+  metadata !"fpexcept.strict")
   ret double %fsqrt
 }
 
-; CHECK: attributes #[[ATTR]] = { nocallback nofree nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite) }
-; CHECK: attributes #[[STRICTFP]] = { strictfp }
diff --git a/llvm/test/Verifier/fp-intrinsics.ll b/llvm/test/Verifier/fp-intrinsics.ll
index 4934843d5a2ed..63ebc9dcdfc92 100644
--- a/llvm/test/Verifier/fp-intrinsics.ll
+++ b/llvm/test/Verifier/fp-intrinsics.ll
@@ -1,54 +1,166 @@
 ; RUN: not opt -passes=verify -disable-output < %s 2>&1 | FileCheck %s
 
-declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
-declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
+; Test multiple fp.control bundles.
+; CHECK: Multiple "fp.control" operand bundles
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"rtz"), "fp.control"(metadata !"rtz") ]
+define double @f6(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"rtz"), "fp.control"(metadata !"rtz") ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that has more than one rounding mode specification.
+; CHECK-NEXT: Rounding mode is specified more that once
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"rtz", metadata !"rte") ]
+define double @f7(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"rtz", metadata !"rte") ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that has non-metadata operand.
+; CHECK-NEXT: Value of a "fp.control" bundle operand must be a metadata
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(i32 0) ]
+define double @f8(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(i32 0) ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that has non-string operand.
+; CHECK-NEXT: Value of a "fp.control" bundle operand must be a string
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata i64 3) ]
+define double @f9(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !{i64 3}) ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that specifies incorrect value.
+; CHECK-NEXT: Unrecognized value in "fp.control" bundle operand
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"qqq") ]
+define double @f10(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"qqq") ]
+  ret double %ftrunc
+}
+
+; Test multiple fp.except bundles.
+; CHECK-NEXT: Multiple "fp.except" operand bundles
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.except"(metadata !"strict"), "fp.except"(metadata !"strict") ]
+define double @f11(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.except"(metadata !"strict"), "fp.except"(metadata !"strict") ]
+  ret double %ftrunc
+}
+
+; Test fp.except bundle that has more than one operand.
+; CHECK-NEXT: Expected exactly one "fp.except" bundle operand
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.except"(metadata !"strict", metadata !"strict") ]
+define double @f12(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.except"(metadata !"strict", metadata !"strict") ]
+  ret double %ftrunc
+}
+
+; Test fp.except bundle that has non-metadata operand.
+; CHECK-NEXT: Value of a "fp.except" bundle operand must be a metadata
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.except"(i32 0) ]
+define double @f13(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.except"(i32 0) ]
+  ret double %ftrunc
+}
+
+; Test fp.except bundle that has non-string operand.
+; CHECK-NEXT: Value of a "fp.except" bundle operand must be a string
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.except"(metadata i64 3) ]
+define double @f14(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.except"(metadata !{i64 3}) ]
+  ret double %ftrunc
+}
+
+; Test fp.except bundle that specifies incorrect value.
+; CHECK-NEXT: Value of a "fp.except" bundle operand is not a correct exception behavior
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.except"(metadata !"qqq") ]
+define double @f15(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.except"(metadata !"qqq") ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that specifies two input denormal modes.
+; CHECK-NEXT: Input denormal mode is specified more that once
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.in=ieee", metadata !"denorm.in=ieee") ]
+define double @f16(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.in=ieee", metadata !"denorm.in=ieee") ]
+  ret double %ftrunc
+}
 
-; Test an illegal value for the rounding mode argument.
-; CHECK: invalid rounding mode argument
-; CHECK-NEXT:   %fadd = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynomic", metadata !"fpexcept.strict") #1
-define double @f2(double %a, double %b) #0 {
+; Test fp.control bundle that specifies two output denormal modes.
+; CHECK-NEXT: Output denormal mode is specified more that once
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.out=ieee", metadata !"denorm.out=ieee") ]
+define double @f17(double %a) #0 {
 entry:
-  %fadd = call double @llvm.experimental.constrained.fadd.f64(
-                                          double %a, double %b,
-                                          metadata !"round.dynomic",
-                                          metadata !"fpexcept.strict") #0
-  ret double %fadd
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.out=ieee", metadata !"denorm.out=ieee") ]
+  ret double %ftrunc
 }
 
-; Test an illegal value for the exception behavior argument.
-; CHECK-NEXT: invalid exception behavior argument
-; CHECK-NEXT:   %fadd = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.restrict") #1
-define double @f3(double %a, double %b) #0 {
+; Test fp.control bundle that specifies an invalid input denormal mode.
+; CHECK-NEXT: Invalid input denormal mode
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.in=qqq") ]
+define double @f18(double %a) #0 {
 entry:
-  %fadd = call double @llvm.experimental.constrained.fadd.f64(
-                                        double %a, double %b,
-                                        metadata !"round.dynamic",
-                                        metadata !"fpexcept.restrict") #0
-  ret double %fadd
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.in=qqq") ]
+  ret double %ftrunc
 }
 
-; Test an illegal value for the rounding mode argument.
-; CHECK-NEXT: invalid rounding mode argument
-; CHECK-NEXT:   %fadd = call double @llvm.experimental.constrained.sqrt.f64(double %a, metadata !"round.dynomic", metadata !"fpexcept.strict") #1
-define double @f4(double %a) #0 {
+; Test fp.control bundle that specifies an invalid output denormal mode.
+; CHECK-NEXT: Invalid output denormal mode
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.out=qqq") ]
+define double @f19(double %a) #0 {
 entry:
-  %fadd = call double @llvm.experimental.constrained.sqrt.f64(
-                                          double %a,
-                                          metadata !"round.dynomic",
-                                          metadata !"fpexcept.strict") #0
-  ret double %fadd
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.out=qqq") ]
+  ret double %ftrunc
 }
 
-; Test an illegal value for the exception behavior argument.
-; CHECK-NEXT: invalid exception behavior argument
-; CHECK-NEXT:   %fadd = call double @llvm.experimental.constrained.sqrt.f64(double %a, metadata !"round.dynamic", metadata !"fpexcept.restrict") #1
-define double @f5(double %a) #0 {
+; Test fp.control bundle that specifies two F32 input denormal modes.
+; CHECK-NEXT: F32 input denormal mode is specified more than once
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.f32.in=ieee", metadata !"denorm.f32.in=ieee") ]
+define double @f20(double %a) #0 {
 entry:
-  %fadd = call double @llvm.experimental.constrained.sqrt.f64(
-                                        double %a,
-                                        metadata !"round.dynamic",
-                                        metadata !"fpexcept.restrict") #0
-  ret double %fadd
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.f32.in=ieee", metadata !"denorm.f32.in=ieee") ]
+  ret double %ftrunc
 }
 
+; Test fp.control bundle that specifies two F32 output denormal modes.
+; CHECK-NEXT: F32 output denormal mode is specified more than once
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.f32.out=ieee", metadata !"denorm.f32.out=ieee") ]
+define double @f21(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.f32.out=ieee", metadata !"denorm.f32.out=ieee") ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that specifies an invalid F32 input denormal mode.
+; CHECK-NEXT: Invalid F32 input denormal mode
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.f32.in=qqq") ]
+define double @f22(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.f32.in=qqq") ]
+  ret double %ftrunc
+}
+
+; Test fp.control bundle that specifies an invalid F32 output denormal mode.
+; CHECK-NEXT: Invalid F32 output denormal mode
+; CHECK-NEXT:   %ftrunc = call double @llvm.trunc.f64(double %a) #{{[0-9]+}} [ "fp.control"(metadata !"denorm.f32.out=qqq") ]
+define double @f23(double %a) #0 {
+entry:
+  %ftrunc = call double @llvm.trunc.f64(double %a) #0 [ "fp.control"(metadata !"denorm.f32.out=qqq") ]
+  ret double %ftrunc
+}
+
+
 attributes #0 = { strictfp }
diff --git a/llvm/test/tools/llvm-reduce/inline-call-sites.ll b/llvm/test/tools/llvm-reduce/inline-call-sites.ll
index 34775d92461fa..9c2d101441c21 100644
--- a/llvm/test/tools/llvm-reduce/inline-call-sites.ll
+++ b/llvm/test/tools/llvm-reduce/inline-call-sites.ll
@@ -730,9 +730,9 @@ define float @nonstrictfp_callee(float %a) {
 }
 
 ; CHECK-LABEL: define float @strictfp_caller(
-; RESULT-NEXT: call float @llvm.experimental.constrained.fadd.f32(
-; RESULT-NEXT: call float @llvm.experimental.constrained.fadd.f32(
-; RESULT-NEXT: ret float %add
+; RESULT-NEXT: fadd float
+; RESULT-NEXT: fadd float
+; RESULT-NEXT: ret float
 define float @strictfp_caller(float %a) strictfp {
   %call = call float @nonstrictfp_callee(float %a) strictfp
   %add = call float @llvm.experimental.constrained.fadd.f32(float %call, float 2.0, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -740,7 +740,7 @@ define float @strictfp_caller(float %a) strictfp {
 }
 
 ; CHECK-LABEL: define float @strictfp_callee(
-; RESULT-NEXT: call float @llvm.experimental.constrained.fadd.f32(
+; RESULT-NEXT: fadd float
 ; RESULT-NEXT: ret float
 define float @strictfp_callee(float %a) strictfp {
   %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -750,7 +750,7 @@ define float @strictfp_callee(float %a) strictfp {
 ; FIXME: This should not inline. The inlined case should fail the
 ; verifier, but it does not.
 ; CHECK-LABEL: define float @nonstrictfp_caller(
-; RESULT-NEXT: call float @llvm.experimental.constrained.fadd.f32(
+; RESULT-NEXT: fadd float
 ; RESULT-NEXT: fadd float
 ; RESULT-NEXT: ret float
 define float @nonstrictfp_caller(float %a) {
diff --git a/llvm/unittests/Bitcode/BitReaderTest.cpp b/llvm/unittests/Bitcode/BitReaderTest.cpp
index 85f5f645e6454..209ac9db9479b 100644
--- a/llvm/unittests/Bitcode/BitReaderTest.cpp
+++ b/llvm/unittests/Bitcode/BitReaderTest.cpp
@@ -14,6 +14,7 @@
 #include "llvm/Bitcode/BitcodeWriter.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/InstrTypes.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Verifier.h"
@@ -172,19 +173,25 @@ TEST(BitReaderTest, MaterializeConstrainedFPStrictFP) {
   ASSERT_FALSE(Foo->materialize());
   EXPECT_FALSE(Foo->empty());
 
+  // After auto-upgrade, llvm.experimental.constrained.sqrt.f64 with
+  // round.tonearest + fpexcept.strict becomes llvm.sqrt.f64 with an
+  // fp.control { "rte" } bundle.  The strict FP semantics are now encoded
+  // in the operand bundle rather than the call-site strictfp attribute.
+  bool FoundSqrtCall = false;
   for (auto &BB : *Foo) {
-    auto It = BB.begin();
-    while (It != BB.end()) {
-      Instruction &I = *It;
-      ++It;
-
+    for (auto &I : BB) {
       if (auto *Call = dyn_cast<CallBase>(&I)) {
-        EXPECT_TRUE(Call->isStrictFP());
-        EXPECT_FALSE(Call->isNoBuiltin());
+        if (auto *II = dyn_cast<IntrinsicInst>(Call);
+            II && II->getIntrinsicID() == Intrinsic::sqrt) {
+          FoundSqrtCall = true;
+          EXPECT_TRUE(
+              Call->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+          EXPECT_FALSE(Call->isNoBuiltin());
+        }
       }
     }
   }
-
+  EXPECT_TRUE(FoundSqrtCall);
   EXPECT_FALSE(verifyModule(*M, &dbgs()));
 }
 
diff --git a/llvm/unittests/IR/IRBuilderTest.cpp b/llvm/unittests/IR/IRBuilderTest.cpp
index 8c4daf56bbfa4..eb9d4310a63bd 100644
--- a/llvm/unittests/IR/IRBuilderTest.cpp
+++ b/llvm/unittests/IR/IRBuilderTest.cpp
@@ -320,7 +320,6 @@ TEST_F(IRBuilderTest, ConstrainedFP) {
   Value *V;
   Value *VDouble;
   Value *VInt;
-  CallInst *Call;
   IntrinsicInst *II;
   GlobalVariable *GVDouble = new GlobalVariable(*M, Type::getDoubleTy(Ctx),
                             true, GlobalValue::ExternalLinkage, nullptr);
@@ -328,75 +327,62 @@ TEST_F(IRBuilderTest, ConstrainedFP) {
   V = Builder.CreateLoad(GV->getValueType(), GV);
   VDouble = Builder.CreateLoad(GVDouble->getValueType(), GVDouble);
 
-  // See if we get constrained intrinsics instead of non-constrained
-  // instructions.
+  // With default FP constraints (Dynamic rounding + ebStrict exception),
+  // plain IR instructions are emitted inside the strictfp function instead
+  // of FP intrinsics.  Intrinsics are only emitted when settings are
+  // non-default (see below).
   Builder.setIsFPConstrained(true);
   auto Parent = BB->getParent();
   Parent->addFnAttr(Attribute::StrictFP);
 
   V = Builder.CreateFAdd(V, V);
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fadd);
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_EQ(cast<BinaryOperator>(V)->getOpcode(), Instruction::FAdd);
 
   V = Builder.CreateFSub(V, V);
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fsub);
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_EQ(cast<BinaryOperator>(V)->getOpcode(), Instruction::FSub);
 
   V = Builder.CreateFMul(V, V);
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fmul);
-  
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_EQ(cast<BinaryOperator>(V)->getOpcode(), Instruction::FMul);
+
   V = Builder.CreateFDiv(V, V);
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fdiv);
-  
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_EQ(cast<BinaryOperator>(V)->getOpcode(), Instruction::FDiv);
+
   V = Builder.CreateFRem(V, V);
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_frem);
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_EQ(cast<BinaryOperator>(V)->getOpcode(), Instruction::FRem);
 
   V = Builder.CreateFMA(V, V, V);
   ASSERT_TRUE(isa<IntrinsicInst>(V));
   II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fma);
+  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::fma);
 
   VInt = Builder.CreateFPToUI(VDouble, Builder.getInt32Ty());
-  ASSERT_TRUE(isa<IntrinsicInst>(VInt));
-  II = cast<IntrinsicInst>(VInt);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fptoui);
+  ASSERT_FALSE(isa<IntrinsicInst>(VInt));
+  ASSERT_TRUE(isa<FPToUIInst>(VInt));
 
   VInt = Builder.CreateFPToSI(VDouble, Builder.getInt32Ty());
-  ASSERT_TRUE(isa<IntrinsicInst>(VInt));
-  II = cast<IntrinsicInst>(VInt);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fptosi);
+  ASSERT_FALSE(isa<IntrinsicInst>(VInt));
+  ASSERT_TRUE(isa<FPToSIInst>(VInt));
 
   VDouble = Builder.CreateUIToFP(VInt, Builder.getDoubleTy());
-  ASSERT_TRUE(isa<IntrinsicInst>(VDouble));
-  II = cast<IntrinsicInst>(VDouble);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_uitofp);
+  ASSERT_FALSE(isa<IntrinsicInst>(VDouble));
+  ASSERT_TRUE(isa<UIToFPInst>(VDouble));
 
   VDouble = Builder.CreateSIToFP(VInt, Builder.getDoubleTy());
-  ASSERT_TRUE(isa<IntrinsicInst>(VDouble));
-  II = cast<IntrinsicInst>(VDouble);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_sitofp);
+  ASSERT_FALSE(isa<IntrinsicInst>(VDouble));
+  ASSERT_TRUE(isa<SIToFPInst>(VDouble));
 
   V = Builder.CreateFPTrunc(VDouble, Type::getFloatTy(Ctx));
-  ASSERT_TRUE(isa<IntrinsicInst>(V));
-  II = cast<IntrinsicInst>(V);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fptrunc);
+  ASSERT_FALSE(isa<IntrinsicInst>(V));
+  ASSERT_TRUE(isa<FPTruncInst>(V));
 
   VDouble = Builder.CreateFPExt(V, Type::getDoubleTy(Ctx));
-  ASSERT_TRUE(isa<IntrinsicInst>(VDouble));
-  II = cast<IntrinsicInst>(VDouble);
-  EXPECT_EQ(II->getIntrinsicID(), Intrinsic::experimental_constrained_fpext);
-
-  // Verify attributes on the call are created automatically.
-  AttributeSet CallAttrs = II->getAttributes().getFnAttrs();
-  EXPECT_EQ(CallAttrs.hasAttribute(Attribute::StrictFP), true);
+  ASSERT_FALSE(isa<IntrinsicInst>(VDouble));
+  ASSERT_TRUE(isa<FPExtInst>(VDouble));
 
   // Verify attributes on the containing function are created when requested.
   Builder.setConstrainedFPFunctionAttr();
@@ -404,90 +390,50 @@ TEST_F(IRBuilderTest, ConstrainedFP) {
   AttributeSet FnAttrs = Attrs.getFnAttrs();
   EXPECT_EQ(FnAttrs.hasAttribute(Attribute::StrictFP), true);
 
-  // Verify the codepaths for setting and overriding the default metadata.
+  // Verify that non-default settings produce FP intrinsics with bundles,
+  // while still-default settings produce plain instructions.
   V = Builder.CreateFAdd(V, V);
-  ASSERT_TRUE(isa<ConstrainedFPIntrinsic>(V));
-  auto *CII = cast<ConstrainedFPIntrinsic>(V);
-  EXPECT_EQ(fp::ebStrict, CII->getExceptionBehavior());
-  EXPECT_EQ(RoundingMode::Dynamic, CII->getRoundingMode());
+  ASSERT_FALSE(isa<IntrinsicInst>(V)); // default: Dynamic rounding + ebStrict
 
   Builder.setDefaultConstrainedExcept(fp::ebIgnore);
   Builder.setDefaultConstrainedRounding(RoundingMode::TowardPositive);
   V = Builder.CreateFAdd(V, V);
-  CII = cast<ConstrainedFPIntrinsic>(V);
+  auto *CII = cast<IntrinsicInst>(V);
   EXPECT_EQ(fp::ebIgnore, CII->getExceptionBehavior());
   EXPECT_EQ(CII->getRoundingMode(), RoundingMode::TowardPositive);
 
   Builder.setDefaultConstrainedExcept(fp::ebIgnore);
   Builder.setDefaultConstrainedRounding(RoundingMode::NearestTiesToEven);
   V = Builder.CreateFAdd(V, V);
-  CII = cast<ConstrainedFPIntrinsic>(V);
+  CII = cast<IntrinsicInst>(V);
   EXPECT_EQ(fp::ebIgnore, CII->getExceptionBehavior());
   EXPECT_EQ(RoundingMode::NearestTiesToEven, CII->getRoundingMode());
 
   Builder.setDefaultConstrainedExcept(fp::ebMayTrap);
   Builder.setDefaultConstrainedRounding(RoundingMode::TowardNegative);
   V = Builder.CreateFAdd(V, V);
-  CII = cast<ConstrainedFPIntrinsic>(V);
+  CII = cast<IntrinsicInst>(V);
   EXPECT_EQ(fp::ebMayTrap, CII->getExceptionBehavior());
   EXPECT_EQ(RoundingMode::TowardNegative, CII->getRoundingMode());
 
   Builder.setDefaultConstrainedExcept(fp::ebStrict);
   Builder.setDefaultConstrainedRounding(RoundingMode::TowardZero);
   V = Builder.CreateFAdd(V, V);
-  CII = cast<ConstrainedFPIntrinsic>(V);
+  CII = cast<IntrinsicInst>(V);
   EXPECT_EQ(fp::ebStrict, CII->getExceptionBehavior());
   EXPECT_EQ(RoundingMode::TowardZero, CII->getRoundingMode());
 
   Builder.setDefaultConstrainedExcept(fp::ebIgnore);
   Builder.setDefaultConstrainedRounding(RoundingMode::Dynamic);
   V = Builder.CreateFAdd(V, V);
-  CII = cast<ConstrainedFPIntrinsic>(V);
+  CII = cast<IntrinsicInst>(V);
   EXPECT_EQ(fp::ebIgnore, CII->getExceptionBehavior());
   EXPECT_EQ(RoundingMode::Dynamic, CII->getRoundingMode());
 
-  // Now override the defaults.
-  Call = Builder.CreateConstrainedFPBinOp(
-        Intrinsic::experimental_constrained_fadd, V, V, nullptr, "", nullptr,
-        RoundingMode::TowardNegative, fp::ebMayTrap);
-  CII = cast<ConstrainedFPIntrinsic>(Call);
-  EXPECT_EQ(CII->getIntrinsicID(), Intrinsic::experimental_constrained_fadd);
-  EXPECT_EQ(fp::ebMayTrap, CII->getExceptionBehavior());
-  EXPECT_EQ(RoundingMode::TowardNegative, CII->getRoundingMode());
-
-  // Same as previous test for CreateConstrainedFPIntrinsic
-  Call = Builder.CreateConstrainedFPIntrinsic(
-      Intrinsic::experimental_constrained_fadd, {V->getType()}, {V, V}, nullptr,
-      "", nullptr, RoundingMode::TowardNegative, fp::ebMayTrap);
-  CII = cast<ConstrainedFPIntrinsic>(Call);
-  EXPECT_EQ(CII->getIntrinsicID(), Intrinsic::experimental_constrained_fadd);
-  EXPECT_EQ(fp::ebMayTrap, CII->getExceptionBehavior());
-  EXPECT_EQ(RoundingMode::TowardNegative, CII->getRoundingMode());
-
   Builder.CreateRetVoid();
   EXPECT_FALSE(verifyModule(*M));
 }
 
-TEST_F(IRBuilderTest, ConstrainedFPIntrinsics) {
-  IRBuilder<> Builder(BB);
-  Value *V;
-  Value *VDouble;
-  ConstrainedFPIntrinsic *CII;
-  GlobalVariable *GVDouble = new GlobalVariable(
-      *M, Type::getDoubleTy(Ctx), true, GlobalValue::ExternalLinkage, nullptr);
-  VDouble = Builder.CreateLoad(GVDouble->getValueType(), GVDouble);
-
-  Builder.setDefaultConstrainedExcept(fp::ebStrict);
-  Builder.setDefaultConstrainedRounding(RoundingMode::TowardZero);
-  Function *Fn = Intrinsic::getOrInsertDeclaration(
-      M.get(), Intrinsic::experimental_constrained_roundeven,
-      {Type::getDoubleTy(Ctx)});
-  V = Builder.CreateConstrainedFPCall(Fn, { VDouble });
-  CII = cast<ConstrainedFPIntrinsic>(V);
-  EXPECT_EQ(Intrinsic::experimental_constrained_roundeven, CII->getIntrinsicID());
-  EXPECT_EQ(fp::ebStrict, CII->getExceptionBehavior());
-}
-
 TEST_F(IRBuilderTest, ConstrainedFPFunctionCall) {
   IRBuilder<> Builder(BB);
 
@@ -515,6 +461,207 @@ TEST_F(IRBuilderTest, ConstrainedFPFunctionCall) {
   EXPECT_FALSE(verifyModule(*M));
 }
 
+TEST_F(IRBuilderTest, FPBundlesDefault) {
+  IRBuilder<> Builder(BB);
+  GlobalVariable *GVDouble = new GlobalVariable(
+      *M, Type::getDoubleTy(Ctx), true, GlobalValue::ExternalLinkage, nullptr);
+  Value *FnArg = Builder.CreateLoad(GVDouble->getValueType(), GVDouble);
+  Function *Fn = Intrinsic::getOrInsertDeclaration(
+      M.get(), Intrinsic::nearbyint, {Type::getDoubleTy(Ctx)});
+
+  // A floating-point operation has no side effects in the default
+  // environment.
+  {
+    Value *V = Builder.CreateCall(Fn, {FnArg});
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesNotAccessMemory());
+  }
+
+  // Check call with FP bundles, rounding is set to default value.
+  // nearbyint(%x) [ "fp.control" (metadata !"rte") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPRoundingBundle(Ctx, Bundles, RoundingMode::NearestTiesToEven);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesNotAccessMemory());
+  }
+
+  // Check call with FP bundles, exception behavior is set to default value.
+  // nearbyint(%x) [ "fp.except" (metadata !"ignore") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPExceptionBundle(Ctx, Bundles, fp::ebIgnore);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesNotAccessMemory());
+  }
+
+  // Check call with FP bundles, both rounding mode and exception behavior set.
+  // nearbyint(%x) [ "fp.control" (metadata !"rte"),
+  //                 "fp.except" (metadata !"ignore") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPRoundingBundle(Ctx, Bundles, RoundingMode::NearestTiesToEven);
+    llvm::addFPExceptionBundle(Ctx, Bundles, fp::ebIgnore);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesNotAccessMemory());
+  }
+}
+
+TEST_F(IRBuilderTest, FPBundlesStrict) {
+  F->addFnAttr(Attribute::StrictFP);
+
+  IRBuilder<> Builder(BB);
+  Builder.setDefaultConstrainedExcept(fp::ebStrict);
+  Builder.setDefaultConstrainedRounding(RoundingMode::TowardZero);
+  Builder.setIsFPConstrained(true);
+
+  GlobalVariable *GVDouble = new GlobalVariable(
+      *M, Type::getDoubleTy(Ctx), true, GlobalValue::ExternalLinkage, nullptr);
+  Value *FnArg = Builder.CreateLoad(GVDouble->getValueType(), GVDouble);
+  Function *Fn = Intrinsic::getOrInsertDeclaration(
+      M.get(), Intrinsic::nearbyint, {Type::getDoubleTy(Ctx)});
+
+  // A floating-point operation has side effects in a strictfp environment even
+  // when no FP bundles are passed explicitly (the builder injects fp.control).
+  {
+    Value *V = Builder.CreateCall(Fn, {FnArg});
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::TowardZero, I->getRoundingMode());
+    EXPECT_EQ(fp::ebStrict, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Check call with FP bundles, with default (dynamic) rounding mode
+  // nearbyint(%x) [ "fp.control" (metadata !"dyn") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPRoundingBundle(Ctx, Bundles, RoundingMode::Dynamic);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::Dynamic, I->getRoundingMode());
+    EXPECT_EQ(fp::ebStrict, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Check call with FP bundles, with specific rounding mode
+  // nearbyint(%x) [ "fp.control" (metadata !"rtz") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPRoundingBundle(Ctx, Bundles, RoundingMode::TowardZero);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::TowardZero, I->getRoundingMode());
+    EXPECT_EQ(fp::ebStrict, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Check call with FP bundles, exception behavior is set to default value.
+  // nearbyint(%x) [ "fp.except" (metadata !"strict") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPExceptionBundle(Ctx, Bundles, fp::ebStrict);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::TowardZero, I->getRoundingMode());
+    EXPECT_EQ(fp::ebStrict, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Check call with FP bundles, exception behavior is set to specific value.
+  // nearbyint(%x) [ "fp.except" (metadata !"ignore") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPExceptionBundle(Ctx, Bundles, fp::ebIgnore);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::TowardZero, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Check call with both FP bundles.
+  // nearbyint(%x) [ "fp.control" (metadata !"rte"),
+  //                 "fp.except" (metadata !"ignore") ]
+  {
+    SmallVector<OperandBundleDef, 1> Bundles;
+    llvm::addFPRoundingBundle(Ctx, Bundles, RoundingMode::NearestTiesToEven);
+    llvm::addFPExceptionBundle(Ctx, Bundles, fp::ebIgnore);
+    Value *V = Builder.CreateCall(Fn, {FnArg}, Bundles);
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    EXPECT_TRUE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_EQ(Intrinsic::nearbyint, I->getIntrinsicID());
+    EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
+    EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+  }
+
+  // Integer intrinsics never receive FP operand bundles and have no FP
+  // memory effects, even in strictfp context.
+  {
+    Function *Fn = Intrinsic::getOrInsertDeclaration(M.get(), Intrinsic::abs,
+                                                     {Type::getInt64Ty(Ctx)});
+    GlobalVariable *GVInt = new GlobalVariable(*M, Type::getInt64Ty(Ctx), true,
+                                               GlobalValue::ExternalLinkage,
+                                               nullptr);
+    Value *IntArg = Builder.CreateLoad(Type::getInt64Ty(Ctx), GVInt);
+    Value *V = Builder.CreateCall(Fn, {IntArg, Builder.getInt1(false)});
+    auto *I = cast<IntrinsicInst>(V);
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_except).has_value());
+    EXPECT_FALSE(I->getOperandBundle(LLVMContext::OB_fp_control).has_value());
+    MemoryEffects ME = I->getMemoryEffects();
+    EXPECT_TRUE(ME.doesNotAccessMemory());
+  }
+}
+
 TEST_F(IRBuilderTest, Lifetime) {
   IRBuilder<> Builder(BB);
   AllocaInst *Var1 = Builder.CreateAlloca(Builder.getInt8Ty());
diff --git a/llvm/unittests/IR/InstructionsTest.cpp b/llvm/unittests/IR/InstructionsTest.cpp
index b01569d216676..4208089359d2a 100644
--- a/llvm/unittests/IR/InstructionsTest.cpp
+++ b/llvm/unittests/IR/InstructionsTest.cpp
@@ -18,7 +18,6 @@
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/DerivedTypes.h"
-#include "llvm/IR/FPEnv.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/LLVMContext.h"
@@ -566,59 +565,6 @@ TEST(InstructionsTest, FPMathOperator) {
   I->deleteValue();
 }
 
-TEST(InstructionTest, ConstrainedTrans) {
-  LLVMContext Context;
-  std::unique_ptr<Module> M(new Module("MyModule", Context));
-  FunctionType *FTy =
-      FunctionType::get(Type::getVoidTy(Context),
-                        {Type::getFloatTy(Context), Type::getFloatTy(Context),
-                         Type::getInt32Ty(Context)},
-                        false);
-  auto *F = Function::Create(FTy, Function::ExternalLinkage, "", M.get());
-  auto *BB = BasicBlock::Create(Context, "bb", F);
-  IRBuilder<> Builder(Context);
-  Builder.SetInsertPoint(BB);
-  auto *Arg0 = F->arg_begin();
-  auto *Arg1 = F->arg_begin() + 1;
-
-  {
-    auto *I = cast<Instruction>(Builder.CreateFAdd(Arg0, Arg1));
-    EXPECT_EQ(Intrinsic::experimental_constrained_fadd,
-              getConstrainedIntrinsicID(*I));
-  }
-
-  {
-    auto *I = cast<Instruction>(
-        Builder.CreateFPToSI(Arg0, Type::getInt32Ty(Context)));
-    EXPECT_EQ(Intrinsic::experimental_constrained_fptosi,
-              getConstrainedIntrinsicID(*I));
-  }
-
-  {
-    auto *I = cast<Instruction>(Builder.CreateIntrinsic(
-        Intrinsic::ceil, {Type::getFloatTy(Context)}, {Arg0}));
-    EXPECT_EQ(Intrinsic::experimental_constrained_ceil,
-              getConstrainedIntrinsicID(*I));
-  }
-
-  {
-    auto *I = cast<Instruction>(Builder.CreateFCmpOEQ(Arg0, Arg1));
-    EXPECT_EQ(Intrinsic::experimental_constrained_fcmp,
-              getConstrainedIntrinsicID(*I));
-  }
-
-  {
-    auto *Arg2 = F->arg_begin() + 2;
-    auto *I = cast<Instruction>(Builder.CreateAdd(Arg2, Arg2));
-    EXPECT_EQ(Intrinsic::not_intrinsic, getConstrainedIntrinsicID(*I));
-  }
-
-  {
-    auto *I = cast<Instruction>(Builder.CreateConstrainedFPBinOp(
-        Intrinsic::experimental_constrained_fadd, Arg0, Arg0));
-    EXPECT_EQ(Intrinsic::not_intrinsic, getConstrainedIntrinsicID(*I));
-  }
-}
 
 TEST(InstructionsTest, isEliminableCastPair) {
   LLVMContext C;
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
index 688bc19cbf18a..6d3db6ab3520b 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
@@ -419,6 +419,11 @@ def LLVM_StripInvariantGroupOp
 }
 
 // Constrained Floating-Point Intrinsics.
+//
+// These MLIR ops (llvm.intr.experimental.constrained.*) lower to new LLVM FP
+// intrinsics (llvm.fadd, llvm.fsub, etc.) with fp.control / fp.except operand
+// bundles.  The experimental_constrained_* intrinsics were removed and are
+// auto-upgraded to the new form on IR load.
 
 class LLVM_ConstrainedIntr<string mnem, int numArgs,
                            bit overloadedResult, list<int> overloadedOperands,
@@ -438,6 +443,10 @@ class LLVM_ConstrainedIntr<string mnem, int numArgs,
                            /*requiresArgAndResultAttrs=*/0,
                            /*immArgPositions=*/[],
                            /*immArgAttrNames=*/[]> {
+  // Override the LLVM enum name to the new (non-constrained) intrinsic ID.
+  // The experimental_constrained_* intrinsics have been replaced by llvm.fadd,
+  // llvm.fsub, etc. with fp.control/fp.except operand bundles.
+  let llvmEnumName = mnem;
   dag regularArgs = !dag(ins, !listsplat(LLVM_Type, numArgs), !foreach(i, !range(numArgs), "arg_" #i));
   dag attrArgs = !con(!cond(!gt(hasRoundingMode, 0) : (ins ValidRoundingModeAttr:$roundingmode),
                             true : (ins)),
@@ -458,42 +467,65 @@ class LLVM_ConstrainedIntr<string mnem, int numArgs,
     llvm::Module *module = builder.GetInsertBlock()->getModule();
     llvm::Function *callee =
       llvm::Intrinsic::getOrInsertDeclaration(module,
-        llvm::Intrinsic::experimental_constrained_}] #
+        llvm::Intrinsic::}] #
     mnem # [{, overloadedTypes); }] #
     !cond(!gt(hasRoundingMode, 0) : [{
     // Get rounding mode using interface.
     llvm::RoundingMode rounding =
         moduleTranslation.translateRoundingMode($roundingmode); }],
-          true : [{
-    // No rounding mode.
-    std::optional<llvm::RoundingMode> rounding; }]) # [{
+          true : "") # [{
     llvm::fp::ExceptionBehavior except =
       moduleTranslation.translateFPExceptionBehavior($fpExceptionBehavior);
-    $res = builder.CreateConstrainedFPCall(callee, args, "", rounding, except);
+    SmallVector<llvm::OperandBundleDef, 2> bundles; }] #
+    !cond(!gt(hasRoundingMode, 0) : [{
+    llvm::addFPRoundingBundle(module->getContext(), bundles, rounding); }],
+          true : "") # [{
+    llvm::addFPExceptionBundle(module->getContext(), bundles, except);
+    $res = builder.CreateCall(callee, args, bundles);
   }];
   let mlirBuilder = [{
-    SmallVector<Value> mlirOperands;
-    SmallVector<NamedAttribute> mlirAttrs;
-    if (failed(moduleImport.convertIntrinsicArguments(
-        llvmOperands.take_front( }] # numArgs # [{), {}, false,
-        {}, {}, mlirOperands, mlirAttrs))) {
-      return failure();
-    }
-
-    FPExceptionBehaviorAttr fpExceptionBehaviorAttr =
-        $_fpExceptionBehavior_attr($fpExceptionBehavior);
-    mlirAttrs.push_back(
-        $_builder.getNamedAttr(
-            $_qualCppClassName::getFPExceptionBehaviorAttrName(),
-            fpExceptionBehaviorAttr)); }] #
+    // Only import as constrained op when the call carries at least one FP bundle
+    // (fp.except or fp.control). If no bundle is present, skip op creation.
+    //
+    // Note: for intrinsics that have a regular MLIR counterpart (fma, fmuladd),
+    // the non-bundle case is handled explicitly in LLVMIRToLLVMTranslation.cpp
+    // before reaching this generated dispatch, so a plain call to those intrinsics
+    // never reaches this handler. For all other constrained ops (fadd, fsub, etc.),
+    // the corresponding llvm.fadd / llvm.fsub intrinsics are new and only ever
+    // appear with FP bundles, so the no-bundle branch here is dead code.
+    //
+    // The auto-upgrade of old experimental_constrained_* intrinsics omits the
+    // fp.except bundle when the exception behavior is the default (strict), and
+    // omits fp.control when the rounding mode is the default (dynamic), so we
+    // check for either bundle rather than requiring fp.except specifically.
+    if (inst->getOperandBundle(llvm::LLVMContext::OB_fp_except).has_value() ||
+        inst->getOperandBundle(llvm::LLVMContext::OB_fp_control).has_value()) {
+      SmallVector<Value> mlirOperands;
+      SmallVector<NamedAttribute> mlirAttrs;
+      if (failed(moduleImport.convertIntrinsicArguments(
+          llvmOperands.take_front( }] # numArgs # [{), {}, false,
+          {}, {}, mlirOperands, mlirAttrs))) {
+        return failure();
+      }
+
+      FPExceptionBehaviorAttr fpExceptionBehaviorAttr =
+          moduleImport.matchFPExceptionBehaviorAttrFromBundle(inst);
+      mlirAttrs.push_back(
+          $_builder.getNamedAttr(
+              $_qualCppClassName::getFPExceptionBehaviorAttrName(),
+              fpExceptionBehaviorAttr)); }] #
     !cond(!gt(hasRoundingMode, 0) : [{
-    RoundingModeAttr roundingModeAttr = $_roundingMode_attr($roundingmode);
-    mlirAttrs.push_back(
-        $_builder.getNamedAttr($_qualCppClassName::getRoundingModeAttrName(),
-                               roundingModeAttr));
+      RoundingModeAttr roundingModeAttr =
+          moduleImport.matchRoundingModeAttrFromBundle(inst);
+      mlirAttrs.push_back(
+          $_builder.getNamedAttr($_qualCppClassName::getRoundingModeAttrName(),
+                                 roundingModeAttr));
     }], true : "") # [{
-    $res = $_qualCppClassName::create($_builder, $_location,
-      $_resultType, mlirOperands, mlirAttrs);
+      $res = $_qualCppClassName::create($_builder, $_location,
+        $_resultType, mlirOperands, mlirAttrs);
+    }
+    // No return here when no bundles: fall through to let the regular handler
+    // (if any) process the plain call to this intrinsic.
   }];
 }
 
diff --git a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
index dba950c0b48b6..70ec729efb682 100644
--- a/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
+++ b/mlir/include/mlir/Target/LLVMIR/ModuleImport.h
@@ -179,6 +179,15 @@ class ModuleImport {
   /// fails.
   RoundingModeAttr matchRoundingModeAttr(llvm::Value *value);
 
+  /// Extracts the FP exception behavior attribute from the fp.except operand
+  /// bundle of `inst`. Returns the constrained-FP default (strict) when absent.
+  FPExceptionBehaviorAttr
+  matchFPExceptionBehaviorAttrFromBundle(llvm::CallInst *inst);
+
+  /// Extracts the rounding mode attribute from the fp.control operand bundle of
+  /// `inst`. Returns the constrained-FP default (Dynamic) mode when absent.
+  RoundingModeAttr matchRoundingModeAttrFromBundle(llvm::CallInst *inst);
+
   /// Converts `value` to an array of alias scopes or returns failure if the
   /// conversion fails.
   FailureOr<SmallVector<AliasScopeAttr>>
diff --git a/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
index e9cd335835263..bbcb24a2aced4 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
@@ -61,6 +61,52 @@ static LogicalResult convertIntrinsicImpl(OpBuilder &odsBuilder,
                                           LLVM::ModuleImport &moduleImport) {
   llvm::Intrinsic::ID intrinsicID = inst->getIntrinsicID();
 
+  // llvm.fma and llvm.fmuladd have *two* MLIR handlers registered for the
+  // same LLVM intrinsic ID: a constrained op (LLVM_ConstrainedFMAIntr /
+  // LLVM_ConstrainedFMulAddIntr) and a regular op (LLVM_FMAOp /
+  // LLVM_FMulAddOp). The auto-generated .inc dispatch lists the constrained
+  // handler first and always ends with an unconditional `return success()`,
+  // so the regular handler is never reachable for calls that carry no FP
+  // bundles. Handle these two intrinsics explicitly before falling through to
+  // the .inc dispatch, dispatching on whether the call has FP bundles.
+  bool hasFPBundles =
+      inst->getOperandBundle(llvm::LLVMContext::OB_fp_except).has_value() ||
+      inst->getOperandBundle(llvm::LLVMContext::OB_fp_control).has_value();
+
+  if (!hasFPBundles && (intrinsicID == llvm::Intrinsic::fma ||
+                        intrinsicID == llvm::Intrinsic::fmuladd)) {
+    SmallVector<llvm::Value *> args(inst->args());
+    ArrayRef<llvm::Value *> llvmOperands(args);
+    SmallVector<llvm::OperandBundleUse> llvmOpBundles;
+    llvmOpBundles.reserve(inst->getNumOperandBundles());
+    for (unsigned i = 0; i < inst->getNumOperandBundles(); ++i)
+      llvmOpBundles.push_back(inst->getOperandBundleAt(i));
+
+    SmallVector<Value> mlirOperands;
+    SmallVector<NamedAttribute> mlirAttrs;
+    if (failed(moduleImport.convertIntrinsicArguments(
+            llvmOperands, llvmOpBundles, false, {}, {}, mlirOperands,
+            mlirAttrs)))
+      return failure();
+
+    SmallVector<Type> resultTypes = {moduleImport.convertType(inst->getType())};
+    Location loc = moduleImport.translateLoc(inst->getDebugLoc());
+
+    if (intrinsicID == llvm::Intrinsic::fmuladd) {
+      auto op = LLVM::FMulAddOp::create(odsBuilder, loc, resultTypes,
+                                        mlirOperands, mlirAttrs);
+      moduleImport.setFastmathFlagsAttr(inst, op);
+      moduleImport.mapValue(inst) = op;
+      return success();
+    }
+    // intrinsicID == llvm::Intrinsic::fma
+    auto op =
+        LLVM::FMAOp::create(odsBuilder, loc, resultTypes, mlirOperands, mlirAttrs);
+    moduleImport.setFastmathFlagsAttr(inst, op);
+    moduleImport.mapValue(inst) = op;
+    return success();
+  }
+
   // Check if the intrinsic is convertible to an MLIR dialect counterpart and
   // copy the arguments to an an LLVM operands array reference for conversion.
   if (isConvertibleIntrinsic(intrinsicID)) {
diff --git a/mlir/lib/Target/LLVMIR/ModuleImport.cpp b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
index eab4379a28610..33bcf60f24c3d 100644
--- a/mlir/lib/Target/LLVMIR/ModuleImport.cpp
+++ b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
@@ -2080,6 +2080,30 @@ RoundingModeAttr ModuleImport::matchRoundingModeAttr(llvm::Value *value) {
       convertRoundingModeFromLLVM(*optLLVM));
 }
 
+FPExceptionBehaviorAttr
+ModuleImport::matchFPExceptionBehaviorAttrFromBundle(llvm::CallInst *inst) {
+  if (inst->getOperandBundle(llvm::LLVMContext::OB_fp_except).has_value())
+    return builder.getAttr<FPExceptionBehaviorAttr>(
+        convertFPExceptionBehaviorFromLLVM(inst->getExceptionBehavior()));
+  // No fp.except bundle: when auto-upgrading experimental_constrained_*
+  // intrinsics, fp.except is omitted only when exception behavior is "strict"
+  // (the constrained-FP default). Return strict to preserve semantics.
+  return builder.getAttr<FPExceptionBehaviorAttr>(
+      convertFPExceptionBehaviorFromLLVM(llvm::fp::ebStrict));
+}
+
+RoundingModeAttr
+ModuleImport::matchRoundingModeAttrFromBundle(llvm::CallInst *inst) {
+  if (inst->getOperandBundle(llvm::LLVMContext::OB_fp_control).has_value())
+    return builder.getAttr<RoundingModeAttr>(
+        convertRoundingModeFromLLVM(inst->getRoundingMode()));
+  // No fp.control bundle: when auto-upgrading experimental_constrained_*
+  // intrinsics, fp.control is omitted only when rounding mode is "dynamic"
+  // (the constrained-FP default). Return dynamic to preserve semantics.
+  return builder.getAttr<RoundingModeAttr>(
+      convertRoundingModeFromLLVM(llvm::RoundingMode::Dynamic));
+}
+
 FailureOr<SmallVector<AliasScopeAttr>>
 ModuleImport::matchAliasScopeAttrs(llvm::Value *value) {
   auto *nodeAsVal = cast<llvm::MetadataAsValue>(value);
diff --git a/mlir/test/Target/LLVMIR/Import/intrinsic.ll b/mlir/test/Target/LLVMIR/Import/intrinsic.ll
index 946605060016c..5826cb50cf68e 100644
--- a/mlir/test/Target/LLVMIR/Import/intrinsic.ll
+++ b/mlir/test/Target/LLVMIR/Import/intrinsic.ll
@@ -1198,7 +1198,9 @@ define void @experimental_constrained_fpext(float %s, <4 x float> %v) {
   ; CHECK: llvm.intr.experimental.constrained.fpext %{{.*}} maytrap : f32 to f64
   %2 = call double @llvm.experimental.constrained.fpext.f64.f32(float %s, metadata !"fpexcept.maytrap")
   ; CHECK: llvm.intr.experimental.constrained.fpext %{{.*}} strict : f32 to f64
-  %3 = call double @llvm.experimental.constrained.fpext.f64.f32(float %s, metadata !"fpexcept.strict")
+  ; Use new bundle format: old-format fpext.strict auto-upgrades to a plain
+  ; fpext instruction (losing constrained semantics), so use explicit bundle.
+  %3 = call double @llvm.fpext.f64.f32(float %s) [ "fp.except"(metadata !"strict") ]
   ; CHECK: llvm.intr.experimental.constrained.fpext %{{.*}} ignore : vector<4xf32> to vector<4xf64>
   %6 = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float> %v, metadata !"fpexcept.ignore")
   ret void
diff --git a/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir b/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
index ec376a0df0b58..0562fabb34abd 100644
--- a/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
+++ b/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
@@ -1185,195 +1185,127 @@ llvm.func @vector_ptrmask(%p: vector<8 x !llvm.ptr>, %mask: vector<8 x i64>) ->
 
 // CHECK-LABEL: @experimental_constrained_fadd
 llvm.func @experimental_constrained_fadd(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fadd.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fadd.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fadd %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fadd.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fadd.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fadd %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fsub
 llvm.func @experimental_constrained_fsub(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fsub.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fsub.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fsub %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fsub.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fsub.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fsub %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fmul
 llvm.func @experimental_constrained_fmul(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fmul.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fmul.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fmul %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fmul.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fmul.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fmul %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fdiv
 llvm.func @experimental_constrained_fdiv(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fdiv.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fdiv.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fdiv %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fdiv.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fdiv.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fdiv %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_frem
 llvm.func @experimental_constrained_frem(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.frem.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.frem.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.frem %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.frem.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.frem.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.frem %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fma
 llvm.func @experimental_constrained_fma(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fma.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fma.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fma %s, %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fma.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fma.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fma %v, %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fmuladd
 llvm.func @experimental_constrained_fmuladd(%s: f32, %v: vector<4 x f32>) {
-  // CHECK: call float @llvm.experimental.constrained.fmuladd.f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fmuladd.f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fmuladd %s, %s, %s towardzero ignore : f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fmuladd.v4f32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call <4 x float> @llvm.fmuladd.v4f32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %1 = llvm.intr.experimental.constrained.fmuladd %v, %v, %v towardzero ignore : vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_uitofp
 llvm.func @experimental_constrained_uitofp(%s: i32, %v: vector<4 x i32>) {
-  // CHECK: call float @llvm.experimental.constrained.uitofp.f32.i32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.uitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.uitofp %s towardzero ignore : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.uitofp.f32.i32(
-  // CHECK: metadata !"round.tonearest"
-  // CHECK: metadata !"fpexcept.maytrap"
+  // CHECK: call float @llvm.uitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
   %1 = llvm.intr.experimental.constrained.uitofp %s tonearest maytrap : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.uitofp.f32.i32(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call float @llvm.uitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %2 = llvm.intr.experimental.constrained.uitofp %s upward strict : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.uitofp.f32.i32(
-  // CHECK: metadata !"round.downward"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.uitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
   %3 = llvm.intr.experimental.constrained.uitofp %s downward ignore : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.uitofp.f32.i32(
-  // CHECK: metadata !"round.tonearestaway"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.uitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
   %4 = llvm.intr.experimental.constrained.uitofp %s tonearestaway ignore : i32 to f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.uitofp.v4f32.v4i32(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call <4 x float> @llvm.uitofp.v4f32.v4i32({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %5 = llvm.intr.experimental.constrained.uitofp %v upward strict : vector<4 x i32> to vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_sitofp
 llvm.func @experimental_constrained_sitofp(%s: i32, %v: vector<4 x i32>) {
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.sitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.sitofp %s towardzero ignore : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(
-  // CHECK: metadata !"round.tonearest"
-  // CHECK: metadata !"fpexcept.maytrap"
+  // CHECK: call float @llvm.sitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
   %1 = llvm.intr.experimental.constrained.sitofp %s tonearest maytrap : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call float @llvm.sitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %2 = llvm.intr.experimental.constrained.sitofp %s upward strict : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(
-  // CHECK: metadata !"round.downward"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.sitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
   %3 = llvm.intr.experimental.constrained.sitofp %s downward ignore : i32 to f32
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(
-  // CHECK: metadata !"round.tonearestaway"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.sitofp.f32.i32({{.*}}) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
   %4 = llvm.intr.experimental.constrained.sitofp %s tonearestaway ignore : i32 to f32
-  // CHECK: call <4 x float> @llvm.experimental.constrained.sitofp.v4f32.v4i32(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call <4 x float> @llvm.sitofp.v4f32.v4i32({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %5 = llvm.intr.experimental.constrained.sitofp %v upward strict : vector<4 x i32> to vector<4 x f32>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fptrunc
 llvm.func @experimental_constrained_fptrunc(%s: f64, %v: vector<4xf32>) {
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(
-  // CHECK: metadata !"round.towardzero"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fptrunc.f32.f64({{.*}}) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fptrunc %s towardzero ignore : f64 to f32
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(
-  // CHECK: metadata !"round.tonearest"
-  // CHECK: metadata !"fpexcept.maytrap"
+  // CHECK: call float @llvm.fptrunc.f32.f64({{.*}}) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
   %1 = llvm.intr.experimental.constrained.fptrunc %s tonearest maytrap : f64 to f32
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call float @llvm.fptrunc.f32.f64({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %2 = llvm.intr.experimental.constrained.fptrunc %s upward strict : f64 to f32
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(
-  // CHECK: metadata !"round.downward"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fptrunc.f32.f64({{.*}}) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
   %3 = llvm.intr.experimental.constrained.fptrunc %s downward ignore : f64 to f32
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(
-  // CHECK: metadata !"round.tonearestaway"
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call float @llvm.fptrunc.f32.f64({{.*}}) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
   %4 = llvm.intr.experimental.constrained.fptrunc %s tonearestaway ignore : f64 to f32
-  // CHECK: call <4 x half> @llvm.experimental.constrained.fptrunc.v4f16.v4f32(
-  // CHECK: metadata !"round.upward"
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call <4 x half> @llvm.fptrunc.v4f16.v4f32({{.*}}) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"strict") ]
   %5 = llvm.intr.experimental.constrained.fptrunc %v upward strict : vector<4xf32> to vector<4xf16>
   llvm.return
 }
 
 // CHECK-LABEL: @experimental_constrained_fpext
 llvm.func @experimental_constrained_fpext(%s: f32, %v: vector<4xf32>) {
-  // CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(
-  // CHECK: metadata !"fpexcept.ignore"
+  // CHECK: call double @llvm.fpext.f64.f32({{.*}}) [ "fp.except"(metadata !"ignore") ]
   %0 = llvm.intr.experimental.constrained.fpext %s ignore : f32 to f64
-  // CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(
-  // CHECK: metadata !"fpexcept.maytrap"
+  // CHECK: call double @llvm.fpext.f64.f32({{.*}}) [ "fp.except"(metadata !"maytrap") ]
   %1 = llvm.intr.experimental.constrained.fpext %s maytrap : f32 to f64
-  // CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call double @llvm.fpext.f64.f32({{.*}}) [ "fp.except"(metadata !"strict") ]
   %2 = llvm.intr.experimental.constrained.fpext %s strict : f32 to f64
-  // CHECK: call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(
-  // CHECK: metadata !"fpexcept.strict"
+  // CHECK: call <4 x double> @llvm.fpext.v4f64.v4f32({{.*}}) [ "fp.except"(metadata !"strict") ]
   %5 = llvm.intr.experimental.constrained.fpext %v strict : vector<4xf32> to vector<4xf64>
   llvm.return
 }
@@ -1586,28 +1518,27 @@ llvm.func @vector_scmp(%a: vector<4 x i32>, %b: vector<4 x i32>) -> vector<4 x i
 // CHECK-DAG: declare ptr addrspace(1) @llvm.stacksave.p1()
 // CHECK-DAG: declare void @llvm.stackrestore.p0(ptr)
 // CHECK-DAG: declare void @llvm.stackrestore.p1(ptr addrspace(1))
-// CHECK-DAG: declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fsub.f32(float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fsub.v4f32(<4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fmul.f32(float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fdiv.f32(float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fdiv.v4f32(<4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.frem.f32(float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.frem.v4f32(<4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float>, <4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.fmuladd.v4f32(<4 x float>, <4 x float>, <4 x float>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.uitofp.f32.i32(i32, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.uitofp.v4f32.v4i32(<4 x i32>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.sitofp.f32.i32(i32, metadata, metadata)
-// CHECK-DAG: declare <4 x float> @llvm.experimental.constrained.sitofp.v4f32.v4i32(<4 x i32>, metadata, metadata)
-// CHECK-DAG: declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
-// CHECK-DAG: declare <4 x half> @llvm.experimental.constrained.fptrunc.v4f16.v4f32(<4 x float>, metadata, metadata)
-// CHECK-DAG: declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
-// CHECK-DAG: declare <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float>, metadata)
+// CHECK-DAG: declare float @llvm.fadd.f32(float, float)
+// CHECK-DAG: declare <4 x float> @llvm.fadd.v4f32(<4 x float>, <4 x float>)
+// CHECK-DAG: declare float @llvm.fsub.f32(float, float)
+// CHECK-DAG: declare <4 x float> @llvm.fsub.v4f32(<4 x float>, <4 x float>)
+// CHECK-DAG: declare float @llvm.fmul.f32(float, float)
+// CHECK-DAG: declare <4 x float> @llvm.fmul.v4f32(<4 x float>, <4 x float>)
+// CHECK-DAG: declare float @llvm.fdiv.f32(float, float)
+// CHECK-DAG: declare <4 x float> @llvm.fdiv.v4f32(<4 x float>, <4 x float>)
+// CHECK-DAG: declare float @llvm.frem.f32(float, float)
+// CHECK-DAG: declare <4 x float> @llvm.frem.v4f32(<4 x float>, <4 x float>)
+// fma.f32 and fmuladd.f32 (scalar) are already covered by the regular intrinsic checks above.
+// CHECK-DAG: declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
+// CHECK-DAG: declare <4 x float> @llvm.fmuladd.v4f32(<4 x float>, <4 x float>, <4 x float>)
+// CHECK-DAG: declare float @llvm.uitofp.f32.i32(i32)
+// CHECK-DAG: declare <4 x float> @llvm.uitofp.v4f32.v4i32(<4 x i32>)
+// CHECK-DAG: declare float @llvm.sitofp.f32.i32(i32)
+// CHECK-DAG: declare <4 x float> @llvm.sitofp.v4f32.v4i32(<4 x i32>)
+// CHECK-DAG: declare float @llvm.fptrunc.f32.f64(double)
+// CHECK-DAG: declare <4 x half> @llvm.fptrunc.v4f16.v4f32(<4 x float>)
+// CHECK-DAG: declare double @llvm.fpext.f64.f32(float)
+// CHECK-DAG: declare <4 x double> @llvm.fpext.v4f64.v4f32(<4 x float>)
 // CHECK-DAG: declare range(i2 -1, -2) i2 @llvm.ucmp.i2.i32(i32, i32)
 // CHECK-DAG: declare range(i32 -1, 2) <4 x i32> @llvm.ucmp.v4i32.v4i32(<4 x i32>, <4 x i32>)
 // CHECK-DAG: declare range(i2 -1, -2) i2 @llvm.scmp.i2.i32(i32, i32)
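
Note on the updated CHECK lines in llvmir-intrinsics.mlir: the rounding and
exception operands that used to be trailing metadata arguments now appear as
short strings inside the "fp.control" and "fp.except" bundles. The sketch below
only restates the correspondence visible in those CHECK lines. The helper names
roundingToBundle and exceptToBundle are hypothetical and are not part of the
patch. Strings that the tests above do not exercise (for example the dynamic
rounding mode) are deliberately left unmapped instead of guessed.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper (not in the patch): old "round.*" metadata string to the
// "fp.control" bundle string, as seen in the updated CHECK lines.
std::optional<std::string_view> roundingToBundle(std::string_view Old) {
  if (Old == "round.towardzero")
    return "rtz";
  if (Old == "round.tonearest")
    return "rte";
  if (Old == "round.upward")
    return "rtp";
  if (Old == "round.downward")
    return "rtn";
  if (Old == "round.tonearestaway")
    return "rmm";
  // Anything else is not covered by the tests above, so report "unknown".
  return std::nullopt;
}

// Hypothetical helper (not in the patch): old "fpexcept.*" metadata string to
// the "fp.except" bundle string. In the CHECK lines the prefix is dropped.
std::optional<std::string_view> exceptToBundle(std::string_view Old) {
  constexpr std::string_view Prefix = "fpexcept.";
  if (Old.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  std::string_view Rest = Old.substr(Prefix.size());
  if (Rest == "ignore" || Rest == "maytrap" || Rest == "strict")
    return Rest;
  return std::nullopt;
}
```

For example, the old pair round.towardzero / fpexcept.ignore corresponds to
[ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]. The fpext
checks carry only an "fp.except" bundle, matching the old intrinsic, which took
no rounding operand.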

>From b254eaf83c0d1b680e8fc6e72a0c62d02ccaf4f2 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Sat, 11 Apr 2026 20:12:57 -0700
Subject: [PATCH 02/12] [Clang][NFC] Rename emitXxxMaybeConstrainedFPBuiltin
 helpers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that fp.control/fp.except operand bundles replace the
experimental_constrained_* intrinsics, the "MaybeConstrained" naming is
obsolete and the ConstrainedIntrinsicID parameter (always passed as 0)
is dead code. Rename the helpers and drop the parameter:

- emitUnaryMaybeConstrainedFPBuiltin  → emitUnaryFPBuiltin
- emitBinaryMaybeConstrainedFPBuiltin → emitBinaryFPBuiltin
- emitBinaryExpMaybeConstrainedFPBuiltin → emitBinaryExpFPBuiltin (now exported)
- emitTernaryMaybeConstrainedFPBuiltin → emitTernaryFPBuiltin
- emitMaybeConstrainedFPToIntRoundBuiltin → emitFPToIntRoundBuiltin
- emitCallMaybeConstrainedFPBuiltin (ARM-local) → emitFPBuiltin

Remove AMDGPU's local duplicate of emitBinaryExpFPBuiltin and use the
shared declaration from CGBuiltin.h instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 clang/lib/CodeGen/CGBuiltin.cpp             | 179 ++++++++------------
 clang/lib/CodeGen/CGBuiltin.h               |  11 +-
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp |  18 +-
 clang/lib/CodeGen/TargetBuiltins/ARM.cpp    |  17 +-
 clang/lib/CodeGen/TargetBuiltins/PPC.cpp    |  30 ++--
 5 files changed, 104 insertions(+), 151 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d06925c6fc656..f101ecce1b767 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -590,9 +590,8 @@ static Value *EmitISOVolatileStore(CodeGenFunction &CGF, const CallExpr *E) {
 // Emit a simple mangled intrinsic that has 1 argument and a return type
 // matching the argument type. When in constrained FP mode, CreateCall
 // automatically injects fp.control/fp.except bundles for non-default settings.
-Value *emitUnaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
-                                const CallExpr *E, unsigned IntrinsicID,
-                                unsigned /*ConstrainedIntrinsicID*/) {
+Value *emitUnaryFPBuiltin(CodeGenFunction &CGF,
+                                const CallExpr *E, unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
@@ -602,9 +601,8 @@ Value *emitUnaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
 // Emit an intrinsic that has 2 operands of the same type as its result.
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
-static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
-                                const CallExpr *E, unsigned IntrinsicID,
-                                unsigned /*ConstrainedIntrinsicID*/) {
+static Value *emitBinaryFPBuiltin(CodeGenFunction &CGF,
+                                const CallExpr *E, unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
@@ -613,10 +611,9 @@ static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
 }
 
 // Has second type mangled argument.
-static Value *
-emitBinaryExpMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
-                                       Intrinsic::ID IntrinsicID,
-                                       Intrinsic::ID /*ConstrainedIntrinsicID*/) {
+Value *
+emitBinaryExpFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                       Intrinsic::ID IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
@@ -628,9 +625,8 @@ emitBinaryExpMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
 // Emit an intrinsic that has 3 operands of the same type as its result.
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
-static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
-                                 const CallExpr *E, unsigned IntrinsicID,
-                                 unsigned /*ConstrainedIntrinsicID*/) {
+static Value *emitTernaryFPBuiltin(CodeGenFunction &CGF,
+                                 const CallExpr *E, unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
@@ -643,9 +639,8 @@ static Value *emitTernaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
 static Value *
-emitMaybeConstrainedFPToIntRoundBuiltin(CodeGenFunction &CGF, const CallExpr *E,
-                                        unsigned IntrinsicID,
-                                        unsigned /*ConstrainedIntrinsicID*/) {
+emitFPToIntRoundBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                        unsigned IntrinsicID) {
   llvm::Type *ResultType = CGF.ConvertType(E->getType());
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
@@ -2680,8 +2675,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_acosl:
     case Builtin::BI__builtin_acosf128:
     case Builtin::BI__builtin_elementwise_acos:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::acos, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::acos));
 
     case Builtin::BIasin:
     case Builtin::BIasinf:
@@ -2692,8 +2687,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_asinl:
     case Builtin::BI__builtin_asinf128:
     case Builtin::BI__builtin_elementwise_asin:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::asin, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::asin));
 
     case Builtin::BIatan:
     case Builtin::BIatanf:
@@ -2704,8 +2699,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_atanl:
     case Builtin::BI__builtin_atanf128:
     case Builtin::BI__builtin_elementwise_atan:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::atan, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::atan));
 
     case Builtin::BIatan2:
     case Builtin::BIatan2f:
@@ -2716,9 +2711,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_atan2l:
     case Builtin::BI__builtin_atan2f128:
     case Builtin::BI__builtin_elementwise_atan2:
-      return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::atan2,
-          0));
+      return RValue::get(emitBinaryFPBuiltin(
+          *this, E, Intrinsic::atan2));
 
     case Builtin::BIceil:
     case Builtin::BIceilf:
@@ -2729,9 +2723,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_ceill:
     case Builtin::BI__builtin_ceilf128:
     case Builtin::BI__builtin_elementwise_ceil:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::ceil,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::ceil));
 
     case Builtin::BIcopysign:
     case Builtin::BIcopysignf:
@@ -2753,9 +2746,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_cosl:
     case Builtin::BI__builtin_cosf128:
     case Builtin::BI__builtin_elementwise_cos:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::cos,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::cos));
 
     case Builtin::BIcosh:
     case Builtin::BIcoshf:
@@ -2766,8 +2758,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_coshl:
     case Builtin::BI__builtin_coshf128:
     case Builtin::BI__builtin_elementwise_cosh:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::cosh, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::cosh));
 
     case Builtin::BIexp:
     case Builtin::BIexpf:
@@ -2778,9 +2770,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_expl:
     case Builtin::BI__builtin_expf128:
     case Builtin::BI__builtin_elementwise_exp:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::exp,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::exp));
 
     case Builtin::BIexp2:
     case Builtin::BIexp2f:
@@ -2791,9 +2782,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_exp2l:
     case Builtin::BI__builtin_exp2f128:
     case Builtin::BI__builtin_elementwise_exp2:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::exp2,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::exp2));
     case Builtin::BI__builtin_exp10:
     case Builtin::BI__builtin_exp10f:
     case Builtin::BI__builtin_exp10f16:
@@ -2826,9 +2816,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_floorl:
     case Builtin::BI__builtin_floorf128:
     case Builtin::BI__builtin_elementwise_floor:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::floor,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::floor));
 
     case Builtin::BIfma:
     case Builtin::BIfmaf:
@@ -2839,9 +2828,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fmal:
     case Builtin::BI__builtin_fmaf128:
     case Builtin::BI__builtin_elementwise_fma:
-      return RValue::get(emitTernaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::fma,
-                                   0));
+      return RValue::get(emitTernaryFPBuiltin(*this, E,
+                                   Intrinsic::fma));
 
     case Builtin::BIfmax:
     case Builtin::BIfmaxf:
@@ -2853,9 +2841,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fmaxf128: {
       IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
       Builder.getFastMathFlags().setNoSignedZeros();
-      return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::maxnum,
-          0));
+      return RValue::get(emitBinaryFPBuiltin(
+          *this, E, Intrinsic::maxnum));
     }
 
     case Builtin::BIfmin:
@@ -2868,9 +2855,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fminf128: {
       IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
       Builder.getFastMathFlags().setNoSignedZeros();
-      return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::minnum,
-          0));
+      return RValue::get(emitBinaryFPBuiltin(
+          *this, E, Intrinsic::minnum));
     }
 
     case Builtin::BIfmaximum_num:
@@ -2921,9 +2907,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_logl:
     case Builtin::BI__builtin_logf128:
     case Builtin::BI__builtin_elementwise_log:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::log,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::log));
 
     case Builtin::BIlog10:
     case Builtin::BIlog10f:
@@ -2934,9 +2919,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_log10l:
     case Builtin::BI__builtin_log10f128:
     case Builtin::BI__builtin_elementwise_log10:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::log10,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::log10));
 
     case Builtin::BIlog2:
     case Builtin::BIlog2f:
@@ -2947,9 +2931,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_log2l:
     case Builtin::BI__builtin_log2f128:
     case Builtin::BI__builtin_elementwise_log2:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::log2,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::log2));
 
     case Builtin::BInearbyint:
     case Builtin::BInearbyintf:
@@ -2959,9 +2942,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_nearbyintl:
     case Builtin::BI__builtin_nearbyintf128:
     case Builtin::BI__builtin_elementwise_nearbyint:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                Intrinsic::nearbyint,
-                                0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                Intrinsic::nearbyint));
 
     case Builtin::BIpow:
     case Builtin::BIpowf:
@@ -2972,9 +2954,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_powl:
     case Builtin::BI__builtin_powf128:
     case Builtin::BI__builtin_elementwise_pow:
-      return RValue::get(emitBinaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::pow,
-                                   0));
+      return RValue::get(emitBinaryFPBuiltin(*this, E,
+                                   Intrinsic::pow));
 
     case Builtin::BIrint:
     case Builtin::BIrintf:
@@ -2985,9 +2966,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_rintl:
     case Builtin::BI__builtin_rintf128:
     case Builtin::BI__builtin_elementwise_rint:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::rint,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::rint));
 
     case Builtin::BIround:
     case Builtin::BIroundf:
@@ -2998,9 +2978,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_roundl:
     case Builtin::BI__builtin_roundf128:
     case Builtin::BI__builtin_elementwise_round:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::round,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::round));
 
     case Builtin::BIroundeven:
     case Builtin::BIroundevenf:
@@ -3011,9 +2990,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_roundevenl:
     case Builtin::BI__builtin_roundevenf128:
     case Builtin::BI__builtin_elementwise_roundeven:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::roundeven,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::roundeven));
 
     case Builtin::BIsin:
     case Builtin::BIsinf:
@@ -3024,9 +3002,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sinl:
     case Builtin::BI__builtin_sinf128:
     case Builtin::BI__builtin_elementwise_sin:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::sin,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::sin));
 
     case Builtin::BIsinh:
     case Builtin::BIsinhf:
@@ -3037,8 +3014,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sinhl:
     case Builtin::BI__builtin_sinhf128:
     case Builtin::BI__builtin_elementwise_sinh:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::sinh, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::sinh));
 
     case Builtin::BI__builtin_sincospi:
     case Builtin::BI__builtin_sincospif:
@@ -3070,8 +3047,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sqrtl:
     case Builtin::BI__builtin_sqrtf128:
     case Builtin::BI__builtin_elementwise_sqrt: {
-      llvm::Value *Call = emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::sqrt, 0);
+      llvm::Value *Call = emitUnaryFPBuiltin(
+          *this, E, Intrinsic::sqrt);
       SetSqrtFPAccuracy(Call);
       return RValue::get(Call);
     }
@@ -3085,8 +3062,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanl:
     case Builtin::BI__builtin_tanf128:
     case Builtin::BI__builtin_elementwise_tan:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::tan, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::tan));
 
     case Builtin::BItanh:
     case Builtin::BItanhf:
@@ -3097,8 +3074,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanhl:
     case Builtin::BI__builtin_tanhf128:
     case Builtin::BI__builtin_elementwise_tanh:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::tanh, 0));
+      return RValue::get(emitUnaryFPBuiltin(
+          *this, E, Intrinsic::tanh));
 
     case Builtin::BItrunc:
     case Builtin::BItruncf:
@@ -3109,9 +3086,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_truncl:
     case Builtin::BI__builtin_truncf128:
     case Builtin::BI__builtin_elementwise_trunc:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::trunc,
-                                   0));
+      return RValue::get(emitUnaryFPBuiltin(*this, E,
+                                   Intrinsic::trunc));
 
     case Builtin::BIlround:
     case Builtin::BIlroundf:
@@ -3120,9 +3096,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lroundf:
     case Builtin::BI__builtin_lroundl:
     case Builtin::BI__builtin_lroundf128:
-      return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
-          *this, E, Intrinsic::lround,
-          0));
+      return RValue::get(emitFPToIntRoundBuiltin(
+          *this, E, Intrinsic::lround));
 
     case Builtin::BIllround:
     case Builtin::BIllroundf:
@@ -3131,9 +3106,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llroundf:
     case Builtin::BI__builtin_llroundl:
     case Builtin::BI__builtin_llroundf128:
-      return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
-          *this, E, Intrinsic::llround,
-          0));
+      return RValue::get(emitFPToIntRoundBuiltin(
+          *this, E, Intrinsic::llround));
 
     case Builtin::BIlrint:
     case Builtin::BIlrintf:
@@ -3142,9 +3116,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lrintf:
     case Builtin::BI__builtin_lrintl:
     case Builtin::BI__builtin_lrintf128:
-      return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
-          *this, E, Intrinsic::lrint,
-          0));
+      return RValue::get(emitFPToIntRoundBuiltin(
+          *this, E, Intrinsic::lrint));
 
     case Builtin::BIllrint:
     case Builtin::BIllrintf:
@@ -3153,18 +3126,16 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llrintf:
     case Builtin::BI__builtin_llrintl:
     case Builtin::BI__builtin_llrintf128:
-      return RValue::get(emitMaybeConstrainedFPToIntRoundBuiltin(
-          *this, E, Intrinsic::llrint,
-          0));
+      return RValue::get(emitFPToIntRoundBuiltin(
+          *this, E, Intrinsic::llrint));
     case Builtin::BI__builtin_ldexp:
     case Builtin::BI__builtin_ldexpf:
     case Builtin::BI__builtin_ldexpl:
     case Builtin::BI__builtin_ldexpf16:
     case Builtin::BI__builtin_ldexpf128:
     case Builtin::BI__builtin_elementwise_ldexp:
-      return RValue::get(emitBinaryExpMaybeConstrainedFPBuiltin(
-          *this, E, Intrinsic::ldexp,
-          0));
+      return RValue::get(emitBinaryExpFPBuiltin(
+          *this, E, Intrinsic::ldexp));
     default:
       break;
     }
diff --git a/clang/lib/CodeGen/CGBuiltin.h b/clang/lib/CodeGen/CGBuiltin.h
index df71e46629884..5c95226a9120f 100644
--- a/clang/lib/CodeGen/CGBuiltin.h
+++ b/clang/lib/CodeGen/CGBuiltin.h
@@ -72,10 +72,13 @@ llvm::Value *emitBuiltinWithOneOverloadedType(clang::CodeGen::CodeGenFunction &C
   return CGF.Builder.CreateCall(F, Args, Name);
 }
 
-llvm::Value *emitUnaryMaybeConstrainedFPBuiltin(clang::CodeGen::CodeGenFunction &CGF,
-                                                const clang::CallExpr *E,
-                                                unsigned IntrinsicID,
-                                                unsigned ConstrainedIntrinsicID);
+llvm::Value *emitUnaryFPBuiltin(clang::CodeGen::CodeGenFunction &CGF,
+                                const clang::CallExpr *E,
+                                unsigned IntrinsicID);
+
+llvm::Value *emitBinaryExpFPBuiltin(clang::CodeGen::CodeGenFunction &CGF,
+                                    const clang::CallExpr *E,
+                                    llvm::Intrinsic::ID IntrinsicID);
 
 llvm::Value *EmitToInt(clang::CodeGen::CodeGenFunction &CGF, llvm::Value *V,
                        clang::QualType T, llvm::IntegerType *IntType);
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 853ecc7cfe75c..4f4bcb5e796c7 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -31,20 +31,6 @@ using namespace llvm;
 
 namespace {
 
-// Has second type mangled argument.
-static Value *
-emitBinaryExpMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
-                                       Intrinsic::ID IntrinsicID,
-                                       Intrinsic::ID ConstrainedIntrinsicID) {
-  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
-  llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
-
-  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
-  Function *F =
-      CGF.CGM.getIntrinsic(IntrinsicID, {Src0->getType(), Src1->getType()});
-  return CGF.Builder.CreateCall(F, {Src0, Src1});
-}
-
 // If \p E is not null pointer, insert address space cast to match return
 // type of \p E if necessary.
 Value *EmitAMDGPUDispatchPtr(CodeGenFunction &CGF,
@@ -2194,8 +2180,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case Builtin::BI__builtin_scalbnf:
   case Builtin::BIscalbn:
   case Builtin::BI__builtin_scalbn:
-    return emitBinaryExpMaybeConstrainedFPBuiltin(
-        *this, E, Intrinsic::ldexp, 0);
+    return emitBinaryExpFPBuiltin(
+        *this, E, Intrinsic::ldexp);
   default:
     return nullptr;
   }
diff --git a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
index 0259b3c8e54da..899e59e0914b6 100644
--- a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
@@ -340,8 +340,7 @@ translateArmToMsvcIntrin(unsigned BuiltinID) {
 }
 
 // Emit an intrinsic where all operands are of the same type as the result.
-// Depending on mode, this may be a constrained floating-point intrinsic.
-static Value *emitCallMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
+static Value *emitFPBuiltin(CodeGenFunction &CGF,
                                                 unsigned IntrinsicID,
                                                 llvm::Type *Ty,
                                                 ArrayRef<Value *> Args) {
@@ -1447,7 +1446,7 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
     Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, Ty,
         {Ops[1], Ops[2], Ops[0]});
   }
@@ -5797,14 +5796,14 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateFDiv(Ops[0], Ops[1], "vdivh");
   case NEON::BI__builtin_neon_vfmah_f16:
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, HalfTy,
         {Ops[1], Ops[2], Ops[0]});
   case NEON::BI__builtin_neon_vfmsh_f16: {
     Value *Neg = Builder.CreateFNeg(Ops[1], "vsubh");
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, HalfTy,
         {Neg, Ops[2], Ops[0]});
   }
@@ -6100,7 +6099,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
       Ops[2] = Builder.CreateBitCast(Ops[2], VTy);
       Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
       Value *Result;
-      Result = emitCallMaybeConstrainedFPBuiltin(
+      Result = emitFPBuiltin(
           *this, Intrinsic::fma,
           DoubleTy, {Ops[1], Ops[2], Ops[0]});
       return Builder.CreateBitCast(Result, Ty);
@@ -6115,7 +6114,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
                                                cast<ConstantInt>(Ops[3]));
     Ops[2] = Builder.CreateShuffleVector(Ops[2], Ops[2], SV, "lane");
 
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, Ty,
         {Ops[2], Ops[1], Ops[0]});
   }
@@ -6125,7 +6124,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
 
     Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
     Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, Ty,
         {Ops[2], Ops[1], Ops[0]});
   }
@@ -6137,7 +6136,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
   case NEON::BI__builtin_neon_vfmad_laneq_f64: {
     llvm::Type *Ty = ConvertType(E->getCallReturnType(getContext()));
     Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
-    return emitCallMaybeConstrainedFPBuiltin(
+    return emitFPBuiltin(
         *this, Intrinsic::fma, Ty,
         {Ops[1], Ops[2], Ops[0]});
   }
diff --git a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
index 5e0bc06cbb398..fd3168dd003c5 100644
--- a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
@@ -1230,39 +1230,33 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
     return FDiv;
   }
   case PPC::BI__builtin_ppc_fric:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::rint,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::rint))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frim:
   case PPC::BI__builtin_ppc_frims:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::floor,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::floor))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frin:
   case PPC::BI__builtin_ppc_frins:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::round,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::round))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frip:
   case PPC::BI__builtin_ppc_frips:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::ceil,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::ceil))
         .getScalarVal();
   case PPC::BI__builtin_ppc_friz:
   case PPC::BI__builtin_ppc_frizs:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::trunc,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::trunc))
         .getScalarVal();
   case PPC::BI__builtin_ppc_fsqrt:
   case PPC::BI__builtin_ppc_fsqrts:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::sqrt,
-                           0))
+    return RValue::get(emitUnaryFPBuiltin(
+                           *this, E, Intrinsic::sqrt))
         .getScalarVal();
   case PPC::BI__builtin_ppc_test_data_class: {
     Value *Op0 = EmitScalarExpr(E->getArg(0));

>From ac4984b17655793f2ba187c942248a9b3abaea74 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Sun, 12 Apr 2026 13:47:02 -0700
Subject: [PATCH 03/12] Address review comments: clang-format and code nits

- Apply clang-format to renamed helpers in CGBuiltin.cpp/h and ARM/AMDGPU/PPC
- Fix FloatingPointOps.def header line to standard LLVM style
- Use explicit type for Mode in ConstantFolding.cpp (avoid auto deduction)
- Remove redundant parens around IsSignaling bool init
- Simplify IRTranslator.cpp bundled-FP block: remove outer scope braces,
  fold nested ifs into a C++17 if-with-initializer, and use
  SmallVector<SrcOp> without an explicit inline size
- EarlyCSE.cpp: reword comment to drop "constrained predecessors" reference,
  use ASCII arrows, remove [[maybe_unused]] on CI variable

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
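Note on the "fold nested ifs" item above: the snippet below is only a
generic illustration of the C++17 if-with-initializer idiom, not the code
from IRTranslator.cpp. The table kModes and the functions lookupNested and
lookupInitIf are hypothetical names made up for this sketch.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup table; the keys and values are placeholders and do
// not correspond to anything in LLVM.
static const std::map<std::string, int> kModes = {{"default", 0},
                                                  {"custom", 7}};

// Before: the iterator lives in the enclosing scope and the checks nest.
std::optional<int> lookupNested(const std::string &Name) {
  auto It = kModes.find(Name);
  if (It != kModes.end()) {
    if (It->second != 0) {
      return It->second;
    }
  }
  return std::nullopt;
}

// After: the iterator is scoped to the if statement and both checks are
// folded into a single condition.
std::optional<int> lookupInitIf(const std::string &Name) {
  if (auto It = kModes.find(Name); It != kModes.end() && It->second != 0)
    return It->second;
  return std::nullopt;
}
```

Both functions return the same result for every key; the second form only
narrows the scope of It and removes one level of nesting.
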
 clang/lib/CodeGen/CGBuiltin.cpp              | 122 +++++++------------
 clang/lib/CodeGen/CGBuiltin.h                |   3 +-
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |   3 +-
 clang/lib/CodeGen/TargetBuiltins/ARM.cpp     |  36 ++----
 clang/lib/CodeGen/TargetBuiltins/PPC.cpp     |  18 +--
 llvm/include/llvm/IR/FloatingPointOps.def    |   2 +-
 llvm/lib/Analysis/ConstantFolding.cpp        |   5 +-
 llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp |  31 +++--
 llvm/lib/Transforms/Scalar/EarlyCSE.cpp      |  13 +-
 9 files changed, 87 insertions(+), 146 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f101ecce1b767..27b1b9a9e551c 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -590,8 +590,8 @@ static Value *EmitISOVolatileStore(CodeGenFunction &CGF, const CallExpr *E) {
 // Emit a simple mangled intrinsic that has 1 argument and a return type
 // matching the argument type. When in constrained FP mode, CreateCall
 // automatically injects fp.control/fp.except bundles for non-default settings.
-Value *emitUnaryFPBuiltin(CodeGenFunction &CGF,
-                                const CallExpr *E, unsigned IntrinsicID) {
+Value *emitUnaryFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                          unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
@@ -601,19 +601,18 @@ Value *emitUnaryFPBuiltin(CodeGenFunction &CGF,
 // Emit an intrinsic that has 2 operands of the same type as its result.
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
-static Value *emitBinaryFPBuiltin(CodeGenFunction &CGF,
-                                const CallExpr *E, unsigned IntrinsicID) {
+static Value *emitBinaryFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                  unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, { Src0, Src1 });
+  return CGF.Builder.CreateCall(F, {Src0, Src1});
 }
 
 // Has second type mangled argument.
-Value *
-emitBinaryExpFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
-                                       Intrinsic::ID IntrinsicID) {
+Value *emitBinaryExpFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                              Intrinsic::ID IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
@@ -625,22 +624,21 @@ emitBinaryExpFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
 // Emit an intrinsic that has 3 operands of the same type as its result.
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
-static Value *emitTernaryFPBuiltin(CodeGenFunction &CGF,
-                                 const CallExpr *E, unsigned IntrinsicID) {
+static Value *emitTernaryFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                   unsigned IntrinsicID) {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
   llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, { Src0, Src1, Src2 });
+  return CGF.Builder.CreateCall(F, {Src0, Src1, Src2});
 }
 
 // Emit an intrinsic that has overloaded integer result and fp operand.
 // When in constrained FP mode, CreateCall automatically injects fp.control/
 // fp.except bundles for non-default settings.
-static Value *
-emitFPToIntRoundBuiltin(CodeGenFunction &CGF, const CallExpr *E,
-                                        unsigned IntrinsicID) {
+static Value *emitFPToIntRoundBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                      unsigned IntrinsicID) {
   llvm::Type *ResultType = CGF.ConvertType(E->getType());
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
@@ -2675,8 +2673,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_acosl:
     case Builtin::BI__builtin_acosf128:
     case Builtin::BI__builtin_elementwise_acos:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::acos));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::acos));
 
     case Builtin::BIasin:
     case Builtin::BIasinf:
@@ -2687,8 +2684,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_asinl:
     case Builtin::BI__builtin_asinf128:
     case Builtin::BI__builtin_elementwise_asin:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::asin));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::asin));
 
     case Builtin::BIatan:
     case Builtin::BIatanf:
@@ -2699,8 +2695,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_atanl:
     case Builtin::BI__builtin_atanf128:
     case Builtin::BI__builtin_elementwise_atan:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::atan));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::atan));
 
     case Builtin::BIatan2:
     case Builtin::BIatan2f:
@@ -2711,8 +2706,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_atan2l:
     case Builtin::BI__builtin_atan2f128:
     case Builtin::BI__builtin_elementwise_atan2:
-      return RValue::get(emitBinaryFPBuiltin(
-          *this, E, Intrinsic::atan2));
+      return RValue::get(emitBinaryFPBuiltin(*this, E, Intrinsic::atan2));
 
     case Builtin::BIceil:
     case Builtin::BIceilf:
@@ -2723,8 +2717,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_ceill:
     case Builtin::BI__builtin_ceilf128:
     case Builtin::BI__builtin_elementwise_ceil:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::ceil));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::ceil));
 
     case Builtin::BIcopysign:
     case Builtin::BIcopysignf:
@@ -2746,8 +2739,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_cosl:
     case Builtin::BI__builtin_cosf128:
     case Builtin::BI__builtin_elementwise_cos:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::cos));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::cos));
 
     case Builtin::BIcosh:
     case Builtin::BIcoshf:
@@ -2758,8 +2750,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_coshl:
     case Builtin::BI__builtin_coshf128:
     case Builtin::BI__builtin_elementwise_cosh:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::cosh));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::cosh));
 
     case Builtin::BIexp:
     case Builtin::BIexpf:
@@ -2770,8 +2761,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_expl:
     case Builtin::BI__builtin_expf128:
     case Builtin::BI__builtin_elementwise_exp:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::exp));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::exp));
 
     case Builtin::BIexp2:
     case Builtin::BIexp2f:
@@ -2782,8 +2772,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_exp2l:
     case Builtin::BI__builtin_exp2f128:
     case Builtin::BI__builtin_elementwise_exp2:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::exp2));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::exp2));
     case Builtin::BI__builtin_exp10:
     case Builtin::BI__builtin_exp10f:
     case Builtin::BI__builtin_exp10f16:
@@ -2816,8 +2805,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_floorl:
     case Builtin::BI__builtin_floorf128:
     case Builtin::BI__builtin_elementwise_floor:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::floor));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::floor));
 
     case Builtin::BIfma:
     case Builtin::BIfmaf:
@@ -2828,8 +2816,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fmal:
     case Builtin::BI__builtin_fmaf128:
     case Builtin::BI__builtin_elementwise_fma:
-      return RValue::get(emitTernaryFPBuiltin(*this, E,
-                                   Intrinsic::fma));
+      return RValue::get(emitTernaryFPBuiltin(*this, E, Intrinsic::fma));
 
     case Builtin::BIfmax:
     case Builtin::BIfmaxf:
@@ -2841,8 +2828,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fmaxf128: {
       IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
       Builder.getFastMathFlags().setNoSignedZeros();
-      return RValue::get(emitBinaryFPBuiltin(
-          *this, E, Intrinsic::maxnum));
+      return RValue::get(emitBinaryFPBuiltin(*this, E, Intrinsic::maxnum));
     }
 
     case Builtin::BIfmin:
@@ -2855,8 +2841,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_fminf128: {
       IRBuilder<>::FastMathFlagGuard FMFGuard(Builder);
       Builder.getFastMathFlags().setNoSignedZeros();
-      return RValue::get(emitBinaryFPBuiltin(
-          *this, E, Intrinsic::minnum));
+      return RValue::get(emitBinaryFPBuiltin(*this, E, Intrinsic::minnum));
     }
 
     case Builtin::BIfmaximum_num:
@@ -2907,8 +2892,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_logl:
     case Builtin::BI__builtin_logf128:
     case Builtin::BI__builtin_elementwise_log:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::log));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::log));
 
     case Builtin::BIlog10:
     case Builtin::BIlog10f:
@@ -2919,8 +2903,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_log10l:
     case Builtin::BI__builtin_log10f128:
     case Builtin::BI__builtin_elementwise_log10:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::log10));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::log10));
 
     case Builtin::BIlog2:
     case Builtin::BIlog2f:
@@ -2931,8 +2914,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_log2l:
     case Builtin::BI__builtin_log2f128:
     case Builtin::BI__builtin_elementwise_log2:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::log2));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::log2));
 
     case Builtin::BInearbyint:
     case Builtin::BInearbyintf:
@@ -2942,8 +2924,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_nearbyintl:
     case Builtin::BI__builtin_nearbyintf128:
     case Builtin::BI__builtin_elementwise_nearbyint:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                Intrinsic::nearbyint));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::nearbyint));
 
     case Builtin::BIpow:
     case Builtin::BIpowf:
@@ -2954,8 +2935,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_powl:
     case Builtin::BI__builtin_powf128:
     case Builtin::BI__builtin_elementwise_pow:
-      return RValue::get(emitBinaryFPBuiltin(*this, E,
-                                   Intrinsic::pow));
+      return RValue::get(emitBinaryFPBuiltin(*this, E, Intrinsic::pow));
 
     case Builtin::BIrint:
     case Builtin::BIrintf:
@@ -2966,8 +2946,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_rintl:
     case Builtin::BI__builtin_rintf128:
     case Builtin::BI__builtin_elementwise_rint:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::rint));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::rint));
 
     case Builtin::BIround:
     case Builtin::BIroundf:
@@ -2978,8 +2957,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_roundl:
     case Builtin::BI__builtin_roundf128:
     case Builtin::BI__builtin_elementwise_round:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::round));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::round));
 
     case Builtin::BIroundeven:
     case Builtin::BIroundevenf:
@@ -2990,8 +2968,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_roundevenl:
     case Builtin::BI__builtin_roundevenf128:
     case Builtin::BI__builtin_elementwise_roundeven:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::roundeven));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::roundeven));
 
     case Builtin::BIsin:
     case Builtin::BIsinf:
@@ -3002,8 +2979,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sinl:
     case Builtin::BI__builtin_sinf128:
     case Builtin::BI__builtin_elementwise_sin:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::sin));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::sin));
 
     case Builtin::BIsinh:
     case Builtin::BIsinhf:
@@ -3014,8 +2990,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sinhl:
     case Builtin::BI__builtin_sinhf128:
     case Builtin::BI__builtin_elementwise_sinh:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::sinh));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::sinh));
 
     case Builtin::BI__builtin_sincospi:
     case Builtin::BI__builtin_sincospif:
@@ -3047,8 +3022,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_sqrtl:
     case Builtin::BI__builtin_sqrtf128:
     case Builtin::BI__builtin_elementwise_sqrt: {
-      llvm::Value *Call = emitUnaryFPBuiltin(
-          *this, E, Intrinsic::sqrt);
+      llvm::Value *Call = emitUnaryFPBuiltin(*this, E, Intrinsic::sqrt);
       SetSqrtFPAccuracy(Call);
       return RValue::get(Call);
     }
@@ -3062,8 +3036,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanl:
     case Builtin::BI__builtin_tanf128:
     case Builtin::BI__builtin_elementwise_tan:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::tan));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::tan));
 
     case Builtin::BItanh:
     case Builtin::BItanhf:
@@ -3074,8 +3047,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_tanhl:
     case Builtin::BI__builtin_tanhf128:
     case Builtin::BI__builtin_elementwise_tanh:
-      return RValue::get(emitUnaryFPBuiltin(
-          *this, E, Intrinsic::tanh));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::tanh));
 
     case Builtin::BItrunc:
     case Builtin::BItruncf:
@@ -3086,8 +3058,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_truncl:
     case Builtin::BI__builtin_truncf128:
     case Builtin::BI__builtin_elementwise_trunc:
-      return RValue::get(emitUnaryFPBuiltin(*this, E,
-                                   Intrinsic::trunc));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc));
 
     case Builtin::BIlround:
     case Builtin::BIlroundf:
@@ -3096,8 +3067,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lroundf:
     case Builtin::BI__builtin_lroundl:
     case Builtin::BI__builtin_lroundf128:
-      return RValue::get(emitFPToIntRoundBuiltin(
-          *this, E, Intrinsic::lround));
+      return RValue::get(emitFPToIntRoundBuiltin(*this, E, Intrinsic::lround));
 
     case Builtin::BIllround:
     case Builtin::BIllroundf:
@@ -3106,8 +3076,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llroundf:
     case Builtin::BI__builtin_llroundl:
     case Builtin::BI__builtin_llroundf128:
-      return RValue::get(emitFPToIntRoundBuiltin(
-          *this, E, Intrinsic::llround));
+      return RValue::get(emitFPToIntRoundBuiltin(*this, E, Intrinsic::llround));
 
     case Builtin::BIlrint:
     case Builtin::BIlrintf:
@@ -3116,8 +3085,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_lrintf:
     case Builtin::BI__builtin_lrintl:
     case Builtin::BI__builtin_lrintf128:
-      return RValue::get(emitFPToIntRoundBuiltin(
-          *this, E, Intrinsic::lrint));
+      return RValue::get(emitFPToIntRoundBuiltin(*this, E, Intrinsic::lrint));
 
     case Builtin::BIllrint:
     case Builtin::BIllrintf:
@@ -3126,16 +3094,14 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_llrintf:
     case Builtin::BI__builtin_llrintl:
     case Builtin::BI__builtin_llrintf128:
-      return RValue::get(emitFPToIntRoundBuiltin(
-          *this, E, Intrinsic::llrint));
+      return RValue::get(emitFPToIntRoundBuiltin(*this, E, Intrinsic::llrint));
     case Builtin::BI__builtin_ldexp:
     case Builtin::BI__builtin_ldexpf:
     case Builtin::BI__builtin_ldexpl:
     case Builtin::BI__builtin_ldexpf16:
     case Builtin::BI__builtin_ldexpf128:
     case Builtin::BI__builtin_elementwise_ldexp:
-      return RValue::get(emitBinaryExpFPBuiltin(
-          *this, E, Intrinsic::ldexp));
+      return RValue::get(emitBinaryExpFPBuiltin(*this, E, Intrinsic::ldexp));
     default:
       break;
     }
diff --git a/clang/lib/CodeGen/CGBuiltin.h b/clang/lib/CodeGen/CGBuiltin.h
index 5c95226a9120f..b83e52fed21e1 100644
--- a/clang/lib/CodeGen/CGBuiltin.h
+++ b/clang/lib/CodeGen/CGBuiltin.h
@@ -73,8 +73,7 @@ llvm::Value *emitBuiltinWithOneOverloadedType(clang::CodeGen::CodeGenFunction &C
 }
 
 llvm::Value *emitUnaryFPBuiltin(clang::CodeGen::CodeGenFunction &CGF,
-                                const clang::CallExpr *E,
-                                unsigned IntrinsicID);
+                                const clang::CallExpr *E, unsigned IntrinsicID);
 
 llvm::Value *emitBinaryExpFPBuiltin(clang::CodeGen::CodeGenFunction &CGF,
                                     const clang::CallExpr *E,
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 4f4bcb5e796c7..0395f3b2e7242 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -2180,8 +2180,7 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case Builtin::BI__builtin_scalbnf:
   case Builtin::BIscalbn:
   case Builtin::BI__builtin_scalbn:
-    return emitBinaryExpFPBuiltin(
-        *this, E, Intrinsic::ldexp);
+    return emitBinaryExpFPBuiltin(*this, E, Intrinsic::ldexp);
   default:
     return nullptr;
   }
diff --git a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
index 899e59e0914b6..2ab9308e0b911 100644
--- a/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/ARM.cpp
@@ -340,10 +340,8 @@ translateArmToMsvcIntrin(unsigned BuiltinID) {
 }
 
 // Emit an intrinsic where all operands are of the same type as the result.
-static Value *emitFPBuiltin(CodeGenFunction &CGF,
-                                                unsigned IntrinsicID,
-                                                llvm::Type *Ty,
-                                                ArrayRef<Value *> Args) {
+static Value *emitFPBuiltin(CodeGenFunction &CGF, unsigned IntrinsicID,
+                            llvm::Type *Ty, ArrayRef<Value *> Args) {
   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Ty);
   return CGF.Builder.CreateCall(F, Args);
 }
@@ -1446,9 +1444,7 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
     Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, Ty,
-        {Ops[1], Ops[2], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, Ty, {Ops[1], Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vld1_v:
   case NEON::BI__builtin_neon_vld1q_v: {
@@ -5796,16 +5792,13 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateFDiv(Ops[0], Ops[1], "vdivh");
   case NEON::BI__builtin_neon_vfmah_f16:
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, HalfTy,
-        {Ops[1], Ops[2], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, HalfTy,
+                         {Ops[1], Ops[2], Ops[0]});
   case NEON::BI__builtin_neon_vfmsh_f16: {
     Value *Neg = Builder.CreateFNeg(Ops[1], "vsubh");
 
     // NEON intrinsic puts accumulator first, unlike the LLVM fma.
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, HalfTy,
-        {Neg, Ops[2], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, HalfTy, {Neg, Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vaddd_s64:
   case NEON::BI__builtin_neon_vaddd_u64:
@@ -6099,9 +6092,8 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
       Ops[2] = Builder.CreateBitCast(Ops[2], VTy);
       Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
       Value *Result;
-      Result = emitFPBuiltin(
-          *this, Intrinsic::fma,
-          DoubleTy, {Ops[1], Ops[2], Ops[0]});
+      Result = emitFPBuiltin(*this, Intrinsic::fma, DoubleTy,
+                             {Ops[1], Ops[2], Ops[0]});
       return Builder.CreateBitCast(Result, Ty);
     }
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
@@ -6114,9 +6106,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
                                                cast<ConstantInt>(Ops[3]));
     Ops[2] = Builder.CreateShuffleVector(Ops[2], Ops[2], SV, "lane");
 
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, Ty,
-        {Ops[2], Ops[1], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, Ty, {Ops[2], Ops[1], Ops[0]});
   }
   case NEON::BI__builtin_neon_vfmaq_laneq_v: {
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
@@ -6124,9 +6114,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
 
     Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
     Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, Ty,
-        {Ops[2], Ops[1], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, Ty, {Ops[2], Ops[1], Ops[0]});
   }
   case NEON::BI__builtin_neon_vfmah_lane_f16:
   case NEON::BI__builtin_neon_vfmas_lane_f32:
@@ -6136,9 +6124,7 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
   case NEON::BI__builtin_neon_vfmad_laneq_f64: {
     llvm::Type *Ty = ConvertType(E->getCallReturnType(getContext()));
     Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
-    return emitFPBuiltin(
-        *this, Intrinsic::fma, Ty,
-        {Ops[1], Ops[2], Ops[0]});
+    return emitFPBuiltin(*this, Intrinsic::fma, Ty, {Ops[1], Ops[2], Ops[0]});
   }
   case NEON::BI__builtin_neon_vmull_v:
     // FIXME: improve sharing scheme to cope with 3 alternative LLVM intrinsics.
diff --git a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
index fd3168dd003c5..aa33e61e16a4d 100644
--- a/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/PPC.cpp
@@ -1230,33 +1230,27 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
     return FDiv;
   }
   case PPC::BI__builtin_ppc_fric:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::rint))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::rint))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frim:
   case PPC::BI__builtin_ppc_frims:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::floor))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::floor))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frin:
   case PPC::BI__builtin_ppc_frins:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::round))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::round))
         .getScalarVal();
   case PPC::BI__builtin_ppc_frip:
   case PPC::BI__builtin_ppc_frips:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::ceil))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::ceil))
         .getScalarVal();
   case PPC::BI__builtin_ppc_friz:
   case PPC::BI__builtin_ppc_frizs:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::trunc))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc))
         .getScalarVal();
   case PPC::BI__builtin_ppc_fsqrt:
   case PPC::BI__builtin_ppc_fsqrts:
-    return RValue::get(emitUnaryFPBuiltin(
-                           *this, E, Intrinsic::sqrt))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::sqrt))
         .getScalarVal();
   case PPC::BI__builtin_ppc_test_data_class: {
     Value *Op0 = EmitScalarExpr(E->getArg(0));
diff --git a/llvm/include/llvm/IR/FloatingPointOps.def b/llvm/include/llvm/IR/FloatingPointOps.def
index 017c5266413fb..41a16743bd370 100644
--- a/llvm/include/llvm/IR/FloatingPointOps.def
+++ b/llvm/include/llvm/IR/FloatingPointOps.def
@@ -1,4 +1,4 @@
-//===- llvm/IR/FloatingPointOps.def - FP intrinsics -------------*- C++ -*-===//
+//===----------------------------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 7495e49ff2634..926fbdd4af3a5 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1364,7 +1364,8 @@ static ConstantFP *flushDenormalConstantFP(ConstantFP *CFP,
     return CFP;
 
   if (auto *CB = dyn_cast_or_null<CallBase>(Inst)) {
-    auto Mode = IsOutput ? CB->getOutputDenormMode() : CB->getInputDenormMode();
+    std::optional<DenormalMode::DenormalModeKind> Mode =
+        IsOutput ? CB->getOutputDenormMode() : CB->getInputDenormMode();
     return flushDenormalConstant(CFP->getType(), APF, *Mode);
   }
 
@@ -3207,7 +3208,7 @@ static Constant *evaluateCompare(const APFloat &Op1, const APFloat &Op2,
           .Case("ule", FCmpInst::FCMP_ULE)
           .Case("une", FCmpInst::FCMP_UNE)
           .Default(FCmpInst::BAD_FCMP_PREDICATE);
-  bool IsSignaling = (Call->getIntrinsicID() == Intrinsic::fcmps);
+  bool IsSignaling = Call->getIntrinsicID() == Intrinsic::fcmps;
   if (IsSignaling) {
     if (Op1.isNaN() || Op2.isNaN())
       St = APFloat::opInvalidOp;
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index a01e7875912e0..cf3ff0d4d6ada 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -2201,24 +2201,21 @@ bool IRTranslator::translateKnownIntrinsic(const CallInst &CI, Intrinsic::ID ID,
     return true;
 
   // Redirect new-form FP intrinsics with non-default bundles to G_STRICT_*.
-  {
-    fp::ExceptionBehavior EB = CI.getExceptionBehavior();
-    RoundingMode RM = CI.getRoundingMode();
-    if (EB != fp::ebStrict || RM != RoundingMode::Dynamic) {
-      if (unsigned StrictOp = getBundledFPStrictGOpcode(ID)) {
-        uint32_t Flags = MachineInstr::copyFlagsFromInstruction(CI);
-        if (EB == fp::ebIgnore)
-          Flags |= MachineInstr::NoFPExcept;
-        SmallVector<SrcOp, 4> VRegs;
-        for (const auto &Arg : CI.args()) {
-          // Skip metadata args (e.g., predicates passed as MetadataAsValue).
-          if (!isa<MetadataAsValue>(Arg.get()))
-            VRegs.push_back(getOrCreateVReg(*Arg.get()));
-        }
-        MIRBuilder.buildInstr(StrictOp, {getOrCreateVReg(CI)}, VRegs, Flags);
-        return true;
-      }
+  fp::ExceptionBehavior EB = CI.getExceptionBehavior();
+  RoundingMode RM = CI.getRoundingMode();
+  if (unsigned StrictOp = getBundledFPStrictGOpcode(ID);
+      StrictOp && (EB != fp::ebStrict || RM != RoundingMode::Dynamic)) {
+    uint32_t Flags = MachineInstr::copyFlagsFromInstruction(CI);
+    if (EB == fp::ebIgnore)
+      Flags |= MachineInstr::NoFPExcept;
+    SmallVector<SrcOp> VRegs;
+    for (const auto &Arg : CI.args()) {
+      // Skip metadata args (e.g., predicates passed as MetadataAsValue).
+      if (!isa<MetadataAsValue>(Arg.get()))
+        VRegs.push_back(getOrCreateVReg(*Arg.get()));
     }
+    MIRBuilder.buildInstr(StrictOp, {getOrCreateVReg(CI)}, VRegs, Flags);
+    return true;
   }
 
   // llvm.fcmps is a signaling FP comparison — inherently strict regardless of
diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index 22404c17dcf56..ceaeacff25b59 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -109,12 +109,11 @@ struct SimpleValue {
     if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
       if (Function *F = CI->getCalledFunction()) {
         switch (F->getIntrinsicID()) {
-        // New-form FP intrinsics (llvm.fadd, llvm.fsub, etc.) with fp.control
-        // and/or fp.except operand bundles follow the same CSE rules as their
-        // constrained predecessors:
-        //   - ebStrict or absent exception behavior → no CSE
-        //   - Dynamic or absent rounding mode → no CSE (unknown mode)
-        //   - Fixed non-strict EB + known RM → CSE allowed
+        // For FP intrinsics (llvm.fadd, llvm.fsub, etc.) with fp.control and/or
+        // fp.except operand bundles, CSE rules are as follows:
+        //   - ebStrict or absent exception behavior -> no CSE
+        //   - Dynamic or absent rounding mode -> no CSE (unknown mode)
+        //   - Fixed non-strict EB + known RM -> CSE allowed
         case Intrinsic::fadd:
         case Intrinsic::fsub:
         case Intrinsic::fmul:
@@ -1531,7 +1530,7 @@ bool EarlyCSE::processNode(DomTreeNode *Node) {
 
     // If this is a simple instruction that we can value number, process it.
     if (SimpleValue::canHandle(&Inst)) {
-      if ([[maybe_unused]] auto *CI = dyn_cast<IntrinsicInst>(&Inst);
+      if (auto *CI = dyn_cast<IntrinsicInst>(&Inst);
           CI && Intrinsic::isConstrainedFPIntrinsic(CI->getIntrinsicID())) {
         assert(CI->getExceptionBehavior() != fp::ebStrict &&
                "Unexpected ebStrict from SimpleValue::canHandle()");

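Note on the IRTranslator.cpp and EarlyCSE.cpp hunks above, which both consult the exception behavior and rounding mode read from the FP bundles. What follows is a minimal standalone sketch of the two decisions those hunks describe, not the in-tree code. The names `ExceptBehavior`, `Rounding`, `redirectsToStrictOpcode`, `mayCSE`, and `checkBundleRules` are hypothetical stand-ins introduced here for illustration; an absent bundle value is modeled as `std::nullopt`, which is an assumption about how "absent" would be represented.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for llvm::fp::ExceptionBehavior and
// llvm::RoundingMode; only the values needed for the sketch are listed.
enum class ExceptBehavior { Ignore, MayTrap, Strict };
enum class Rounding {
  NearestTiesToEven,
  TowardZero,
  TowardPositive,
  TowardNegative,
  Dynamic
};

// Mirrors the condition in the IRTranslator hunk: the call is redirected to
// a strict opcode only if one exists for the intrinsic and the bundle values
// differ from the (Strict, Dynamic) pair.
bool redirectsToStrictOpcode(bool HasStrictOpcode, ExceptBehavior EB,
                             Rounding RM) {
  return HasStrictOpcode &&
         (EB != ExceptBehavior::Strict || RM != Rounding::Dynamic);
}

// Mirrors the three rules in the EarlyCSE comment:
//   - strict or absent exception behavior -> no CSE
//   - dynamic or absent rounding mode     -> no CSE
//   - fixed non-strict EB + known RM      -> CSE allowed
bool mayCSE(std::optional<ExceptBehavior> EB, std::optional<Rounding> RM) {
  if (!EB || *EB == ExceptBehavior::Strict)
    return false;
  if (!RM || *RM == Rounding::Dynamic)
    return false;
  return true;
}

// Usage example: a few representative combinations.
void checkBundleRules() {
  assert(mayCSE(ExceptBehavior::Ignore, Rounding::TowardZero));
  assert(!mayCSE(ExceptBehavior::Strict, Rounding::TowardZero));
  assert(!mayCSE(ExceptBehavior::MayTrap, Rounding::Dynamic));
  assert(!mayCSE(std::nullopt, std::nullopt));

  assert(redirectsToStrictOpcode(true, ExceptBehavior::Ignore,
                                 Rounding::Dynamic));
  assert(!redirectsToStrictOpcode(true, ExceptBehavior::Strict,
                                  Rounding::Dynamic));
  assert(!redirectsToStrictOpcode(false, ExceptBehavior::Ignore,
                                  Rounding::TowardZero));
}
```

The two predicates are kept separate on purpose: the redirect check fires whenever either value is non-default, while the CSE check requires both values to be known and non-strict.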
>From 4517ee0c160a04d130227730bd59fea34b9a6330 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Mon, 13 Apr 2026 08:10:48 -0700
Subject: [PATCH 04/12] Fix CI failures: build errors, SPIRV crash, bf16 ISel,
 and test CHECKs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Instructions.cpp: remove the unused outer `RM` variable that was shadowed by the declaration in the inner if
- TargetLoweringBase.cpp: fix misleading indentation on constrained-FP block
- IRTranslator.cpp: restore LLT::getUseExtended() guard in containsBF16Type
- CGExprScalar.cpp: clang-format two broken if-condition line splits
- SPIRVPrepareFunctions.cpp: handle Intrinsic::fcmp alongside fcmps
- fp-intrinsics-attr.ll: update CHECKs for the fcmp/fcmps → llvm.fcmp.f64/llvm.fcmps.f64 upgrade

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
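Note for readers following the fcmp/fcmps items in the list above: both llvm.fcmp and llvm.fcmps carry their predicate as a metadata string such as !"oeq". The block below is a minimal standalone sketch of how such a string could map onto ordered and unordered comparison semantics, assuming the strings follow the usual LLVM fcmp predicate names. `evalPredicate` and `signalingCompareRaisesInvalid` are hypothetical names introduced for illustration. The second function models only the NaN check that the evaluateCompare hunk shows for llvm.fcmps; how the quiet form treats signaling NaNs is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <string_view>

// Evaluates an fcmp-style predicate string on two doubles. Returns
// std::nullopt for an unrecognized string (a stand-in for a "bad predicate"
// result). Ordered predicates are false if either operand is NaN; unordered
// predicates are true if either operand is NaN.
std::optional<bool> evalPredicate(double A, double B, std::string_view Pred) {
  const bool Unordered = std::isnan(A) || std::isnan(B);
  if (Pred == "oeq") return !Unordered && A == B;
  if (Pred == "ogt") return !Unordered && A > B;
  if (Pred == "oge") return !Unordered && A >= B;
  if (Pred == "olt") return !Unordered && A < B;
  if (Pred == "ole") return !Unordered && A <= B;
  if (Pred == "one") return !Unordered && A != B;
  if (Pred == "ord") return !Unordered;
  if (Pred == "uno") return Unordered;
  if (Pred == "ueq") return Unordered || A == B;
  if (Pred == "ugt") return Unordered || A > B;
  if (Pred == "uge") return Unordered || A >= B;
  if (Pred == "ult") return Unordered || A < B;
  if (Pred == "ule") return Unordered || A <= B;
  if (Pred == "une") return Unordered || A != B;
  return std::nullopt;
}

// Models the check shown for the signaling form: any NaN operand is treated
// as raising the invalid-operation exception.
bool signalingCompareRaisesInvalid(double A, double B) {
  return std::isnan(A) || std::isnan(B);
}

// Usage example.
void checkComparePredicates() {
  const double QNaN = std::nan("");
  assert(evalPredicate(1.0, 1.0, "oeq").value());
  assert(!evalPredicate(QNaN, 1.0, "oeq").value());
  assert(evalPredicate(QNaN, 1.0, "ueq").value());
  assert(!evalPredicate(1.0, 2.0, "not-a-predicate").has_value());
  assert(signalingCompareRaisesInvalid(QNaN, 1.0));
  assert(!signalingCompareRaisesInvalid(1.0, 2.0));
}
```

The comparison result and the exception question are kept in separate functions because a target with a single compare instruction can lower both intrinsics to the same value computation, while the difference between them lies only in exception signaling.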
 clang/lib/CodeGen/CGExprScalar.cpp            |  3 +-
 llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp  |  2 +
 llvm/lib/CodeGen/TargetLoweringBase.cpp       | 77 +++++++------------
 llvm/lib/IR/Instructions.cpp                  |  1 -
 .../Target/SPIRV/SPIRVPrepareFunctions.cpp    |  5 +-
 llvm/test/Assembler/fp-intrinsics-attr.ll     | 11 +--
 6 files changed, 38 insertions(+), 61 deletions(-)

diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 498d48b7f6071..f94871c51066a 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -4681,8 +4681,7 @@ static Value* tryEmitFMulAdd(const BinOpInfo &op,
   }
 
   if (auto *LHSBinOp = dyn_cast<llvm::CallBase>(LHS)) {
-    if (LHSBinOp->getIntrinsicID() ==
-            llvm::Intrinsic::fmul &&
+    if (LHSBinOp->getIntrinsicID() == llvm::Intrinsic::fmul &&
         (LHSBinOp->use_empty() || NegLHS)) {
       // If we looked through fneg, erase it.
       if (NegLHS)
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index cf3ff0d4d6ada..de1a57ca54cf0 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -314,6 +314,8 @@ static bool containsBF16Type(const User &U) {
   // BF16 cannot currently be represented by LLT, to avoid miscompiles we
   // prevent any instructions using them. FIXME: This can be removed once LLT
   // supports bfloat.
+  if (LLT::getUseExtended())
+    return false; // extended LLT can represent bfloat16 — don't block translation
   return U.getType()->getScalarType()->isBFloatTy() ||
          any_of(U.operands(), [](Value *V) {
            return V->getType()->getScalarType()->isBFloatTy();
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 191511a9f315e..efe134a0c5b5b 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1206,57 +1206,32 @@ void TargetLoweringBase::initActions() {
                           ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
                          VT, Expand);
 
-            // Constrained floating-point operations default to expand.
-      setOperationAction({
-          ISD::STRICT_FADD,
-          ISD::STRICT_FSUB,
-          ISD::STRICT_FMUL,
-          ISD::STRICT_FDIV,
-          ISD::STRICT_FREM,
-          ISD::STRICT_FP_EXTEND,
-          ISD::STRICT_SINT_TO_FP,
-          ISD::STRICT_UINT_TO_FP,
-          ISD::STRICT_FP_TO_SINT,
-          ISD::STRICT_FP_TO_UINT,
-          ISD::STRICT_FP_ROUND,
-          ISD::STRICT_FSETCC,
-          ISD::STRICT_FSETCCS,
-          ISD::STRICT_FACOS,
-          ISD::STRICT_FASIN,
-          ISD::STRICT_FATAN,
-          ISD::STRICT_FATAN2,
-          ISD::STRICT_FCEIL,
-          ISD::STRICT_FCOS,
-          ISD::STRICT_FCOSH,
-          ISD::STRICT_FEXP,
-          ISD::STRICT_FEXP2,
-          ISD::STRICT_FFLOOR,
-          ISD::STRICT_FMA,
-          ISD::STRICT_FLOG,
-          ISD::STRICT_FLOG10,
-          ISD::STRICT_FLOG2,
-          ISD::STRICT_LRINT,
-          ISD::STRICT_LLRINT,
-          ISD::STRICT_LROUND,
-          ISD::STRICT_LLROUND,
-          ISD::STRICT_FMAXNUM,
-          ISD::STRICT_FMINNUM,
-          ISD::STRICT_FMAXIMUM,
-          ISD::STRICT_FMINIMUM,
-          ISD::STRICT_FNEARBYINT,
-          ISD::STRICT_FPOW,
-          ISD::STRICT_FPOWI,
-          ISD::STRICT_FLDEXP,
-          ISD::STRICT_FRINT,
-          ISD::STRICT_FROUND,
-          ISD::STRICT_FROUNDEVEN,
-          ISD::STRICT_FSIN,
-          ISD::STRICT_FSINH,
-          ISD::STRICT_FSQRT,
-          ISD::STRICT_FTAN,
-          ISD::STRICT_FTANH,
-          ISD::STRICT_FTRUNC
-      }, VT, Expand);
+    // Constrained floating-point operations default to expand.
+    setOperationAction({ISD::STRICT_FADD,        ISD::STRICT_FSUB,
+                        ISD::STRICT_FMUL,        ISD::STRICT_FDIV,
+                        ISD::STRICT_FREM,        ISD::STRICT_FP_EXTEND,
+                        ISD::STRICT_SINT_TO_FP,  ISD::STRICT_UINT_TO_FP,
+                        ISD::STRICT_FP_TO_SINT,  ISD::STRICT_FP_TO_UINT,
+                        ISD::STRICT_FP_ROUND,    ISD::STRICT_FSETCC,
+                        ISD::STRICT_FSETCCS,     ISD::STRICT_FACOS,
+                        ISD::STRICT_FASIN,       ISD::STRICT_FATAN,
+                        ISD::STRICT_FATAN2,      ISD::STRICT_FCEIL,
+                        ISD::STRICT_FCOS,        ISD::STRICT_FCOSH,
+                        ISD::STRICT_FEXP,        ISD::STRICT_FEXP2,
+                        ISD::STRICT_FFLOOR,      ISD::STRICT_FMA,
+                        ISD::STRICT_FLOG,        ISD::STRICT_FLOG10,
+                        ISD::STRICT_FLOG2,       ISD::STRICT_LRINT,
+                        ISD::STRICT_LLRINT,      ISD::STRICT_LROUND,
+                        ISD::STRICT_LLROUND,     ISD::STRICT_FMAXNUM,
+                        ISD::STRICT_FMINNUM,     ISD::STRICT_FMAXIMUM,
+                        ISD::STRICT_FMINIMUM,    ISD::STRICT_FNEARBYINT,
+                        ISD::STRICT_FPOW,        ISD::STRICT_FPOWI,
+                        ISD::STRICT_FLDEXP,      ISD::STRICT_FRINT,
+                        ISD::STRICT_FROUND,      ISD::STRICT_FROUNDEVEN,
+                        ISD::STRICT_FSIN,        ISD::STRICT_FSINH,
+                        ISD::STRICT_FSQRT,       ISD::STRICT_FTAN,
+                        ISD::STRICT_FTANH,       ISD::STRICT_FTRUNC},
+                       VT, Expand);
 
     // For most targets @llvm.get.dynamic.area.offset just returns 0.
     setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, VT, Expand);
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 7b04bed84e52d..4240e8d46fea8 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -634,7 +634,6 @@ bool CallBase::hasClobberingOperandBundles() const {
 
 RoundingMode CallBase::getRoundingMode() const {
   // Try reading rounding mode from FP bundle.
-  std::optional<RoundingMode> RM;
   if (auto ControlBundle = getOperandBundle(LLVMContext::OB_fp_control)) {
     for (auto &U : ControlBundle->Inputs) {
       Value *V = U.get();
diff --git a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
index 7d7e35a1f5296..d42dfb838b4d0 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
@@ -564,9 +564,10 @@ bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
           Changed = true;
         }
         break;
+      case Intrinsic::fcmp:
       case Intrinsic::fcmps: {
-        // Signaling FP compare – SPIRV has no separate signaling compare
-        // instruction; lower to a plain fcmp.
+        // SPIRV has no separate non-signaling/signaling compare instruction;
+        // lower both llvm.fcmp and llvm.fcmps to a plain fcmp.
         Value *LHS = Call->getArgOperand(0);
         Value *RHS = Call->getArgOperand(1);
         auto *PredMD =
diff --git a/llvm/test/Assembler/fp-intrinsics-attr.ll b/llvm/test/Assembler/fp-intrinsics-attr.ll
index edacb42a3bc8a..c4d4b510ca5aa 100644
--- a/llvm/test/Assembler/fp-intrinsics-attr.ll
+++ b/llvm/test/Assembler/fp-intrinsics-attr.ll
@@ -3,8 +3,8 @@
 ; Test to verify that constrained intrinsics are auto-upgraded on bitcode load.
 ; With default rounding mode (dynamic) and exception behavior (strict), arithmetic
 ; ops become plain instructions and math intrinsics become non-constrained calls.
-; fcmps (signaling compare) is upgraded to llvm.fcmps with no fp.except bundle
-; (strict is the default in a strictfp function).
+; fcmp/fcmps are upgraded to llvm.fcmp/llvm.fcmps with an explicit fp.except(strict)
+; bundle so the call is not treated as dead code by the optimizer.
 
 define void @func(double %a, double %b, double %c, i32 %i) strictfp {
 ; CHECK-LABEL: define void @func
@@ -55,8 +55,8 @@ define void @func(double %a, double %b, double %c, i32 %i) strictfp {
 ; CHECK-NEXT: {{.*}} = call double @llvm.round.f64(double [[A]])
 ; CHECK-NEXT: {{.*}} = call double @llvm.roundeven.f64(double [[A]])
 ; CHECK-NEXT: {{.*}} = call double @llvm.trunc.f64(double [[A]])
-; CHECK-NEXT: {{.*}} = fcmp oeq double [[A]], [[B]]
-; CHECK-NEXT: {{.*}} = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq")
+; CHECK-NEXT: {{.*}} = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
+; CHECK-NEXT: {{.*}} = call i1 @llvm.fcmps.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"strict") ]
 ; CHECK-NEXT: ret void
 
   %add = call double @llvm.experimental.constrained.fadd.f64(
@@ -284,8 +284,9 @@ define void @func(double %a, double %b, double %c, i32 %i) strictfp {
   ret void
 }
 
-; fcmps is auto-upgraded to the new llvm.fcmps intrinsic (3 args: float, float, metadata pred).
+; fcmp/fcmps are auto-upgraded to llvm.fcmp/llvm.fcmps (3 args: float, float, metadata pred).
 ; Plain intrinsic declarations are emitted for the upgraded calls.
+; CHECK-DAG: declare i1 @llvm.fcmp.f64(double, double, metadata)
 ; CHECK-DAG: declare i1 @llvm.fcmps.f64(double, double, metadata) #[[ATTR1:[0-9]+]]
 ; CHECK-DAG: declare double @llvm.fma.f64(double, double, double)
 ; CHECK-DAG: declare double @llvm.fmuladd.f64(double, double, double)

>From e1f2c85c5fec24b120fb8e55720659480106b872 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Mon, 13 Apr 2026 08:55:07 -0700
Subject: [PATCH 05/12] Apply clang-format to all changed lines

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 clang/lib/CodeGen/CGExprScalar.cpp            |   3 +-
 clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp  |  36 ++++--
 llvm/include/llvm/IR/IRBuilder.h              |  36 +++---
 llvm/include/llvm/IR/IntrinsicInst.h          |   3 -
 llvm/lib/Analysis/ConstantFolding.cpp         |  36 +++---
 llvm/lib/Analysis/InstructionSimplify.cpp     |   4 +-
 llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp  |  34 ++++--
 .../SelectionDAG/SelectionDAGBuilder.cpp      | 105 +++++++++++------
 llvm/lib/CodeGen/TargetLoweringBase.cpp       |  43 +++----
 llvm/lib/IR/AutoUpgrade.cpp                   | 106 +++++++++---------
 llvm/lib/IR/FPEnv.cpp                         |   7 +-
 llvm/lib/IR/Instructions.cpp                  |  19 ++--
 llvm/lib/IR/IntrinsicInst.cpp                 |   3 -
 llvm/lib/IR/Verifier.cpp                      |   4 +-
 llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp |   3 +-
 .../Target/SPIRV/SPIRVPrepareFunctions.cpp    |   6 +-
 .../InstCombine/InstCombineCalls.cpp          |   3 +-
 llvm/lib/Transforms/Scalar/EarlyCSE.cpp       |   3 +-
 llvm/lib/Transforms/Utils/CloneFunction.cpp   |  10 +-
 llvm/unittests/IR/IRBuilderTest.cpp           |   5 +-
 llvm/unittests/IR/InstructionsTest.cpp        |   1 -
 .../LLVMIR/LLVMIRToLLVMTranslation.cpp        |   4 +-
 22 files changed, 265 insertions(+), 209 deletions(-)

diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index f94871c51066a..9aa14e53c22d9 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -4690,8 +4690,7 @@ static Value* tryEmitFMulAdd(const BinOpInfo &op,
     }
   }
   if (auto *RHSBinOp = dyn_cast<llvm::CallBase>(RHS)) {
-    if (RHSBinOp->getIntrinsicID() ==
-            llvm::Intrinsic::fmul &&
+    if (RHSBinOp->getIntrinsicID() == llvm::Intrinsic::fmul &&
         (RHSBinOp->use_empty() || NegRHS)) {
       // If we looked through fneg, erase it.
       if (NegRHS)
diff --git a/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp b/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
index 4346771ad4804..9c22616537da6 100644
--- a/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/SystemZ.cpp
@@ -197,18 +197,32 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     case 0:  // IEEE-inexact exception allowed
       switch (M5.getZExtValue()) {
       default: break;
-      case 0: ID = Intrinsic::rint; break;
+      case 0:
+        ID = Intrinsic::rint;
+        break;
       }
       break;
     case 4:  // IEEE-inexact exception suppressed
       switch (M5.getZExtValue()) {
       default: break;
-      case 0: ID = Intrinsic::nearbyint; break;
-      case 1: ID = Intrinsic::round; break;
-      case 4: ID = Intrinsic::roundeven; break;
-      case 5: ID = Intrinsic::trunc; break;
-      case 6: ID = Intrinsic::ceil; break;
-      case 7: ID = Intrinsic::floor; break;
+      case 0:
+        ID = Intrinsic::nearbyint;
+        break;
+      case 1:
+        ID = Intrinsic::round;
+        break;
+      case 4:
+        ID = Intrinsic::roundeven;
+        break;
+      case 5:
+        ID = Intrinsic::trunc;
+        break;
+      case 6:
+        ID = Intrinsic::ceil;
+        break;
+      case 7:
+        ID = Intrinsic::floor;
+        break;
       }
       break;
     }
@@ -238,7 +252,9 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Intrinsic::ID ID = Intrinsic::not_intrinsic;
     switch (M4.getZExtValue()) {
     default: break;
-    case 4: ID = Intrinsic::maxnum; break;
+    case 4:
+      ID = Intrinsic::maxnum;
+      break;
     }
     if (ID != Intrinsic::not_intrinsic) {
       Function *F = CGM.getIntrinsic(ID, ResultType);
@@ -265,7 +281,9 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
     Intrinsic::ID ID = Intrinsic::not_intrinsic;
     switch (M4.getZExtValue()) {
     default: break;
-    case 4: ID = Intrinsic::minnum; break;
+    case 4:
+      ID = Intrinsic::minnum;
+      break;
     }
     if (ID != Intrinsic::not_intrinsic) {
       Function *F = CGM.getIntrinsic(ID, ResultType);
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 1c31bf60fb7bb..75d38fd857214 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -1645,8 +1645,8 @@ class IRBuilderBase {
   Value *CreateFAddFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fadd, {L->getType()}, {L, R},
-                             FMFSource, Name);
+      return CreateIntrinsic(Intrinsic::fadd, {L->getType()}, {L, R}, FMFSource,
+                             Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FAdd, L, R, FMFSource.get(FMF)))
@@ -1664,8 +1664,8 @@ class IRBuilderBase {
   Value *CreateFSubFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fsub, {L->getType()}, {L, R},
-                             FMFSource, Name);
+      return CreateIntrinsic(Intrinsic::fsub, {L->getType()}, {L, R}, FMFSource,
+                             Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FSub, L, R, FMFSource.get(FMF)))
@@ -1683,8 +1683,8 @@ class IRBuilderBase {
   Value *CreateFMulFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fmul, {L->getType()}, {L, R},
-                             FMFSource, Name);
+      return CreateIntrinsic(Intrinsic::fmul, {L->getType()}, {L, R}, FMFSource,
+                             Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FMul, L, R, FMFSource.get(FMF)))
@@ -1702,8 +1702,8 @@ class IRBuilderBase {
   Value *CreateFDivFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fdiv, {L->getType()}, {L, R},
-                             FMFSource, Name);
+      return CreateIntrinsic(Intrinsic::fdiv, {L->getType()}, {L, R}, FMFSource,
+                             Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FDiv, L, R, FMFSource.get(FMF)))
@@ -1721,8 +1721,8 @@ class IRBuilderBase {
   Value *CreateFRemFMF(Value *L, Value *R, FMFSource FMFSource,
                        const Twine &Name = "", MDNode *FPMD = nullptr) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::frem, {L->getType()}, {L, R},
-                             FMFSource, Name);
+      return CreateIntrinsic(Intrinsic::frem, {L->getType()}, {L, R}, FMFSource,
+                             Name);
 
     if (Value *V =
             Folder.FoldBinOpFMF(Instruction::FRem, L, R, FMFSource.get(FMF)))
@@ -2101,23 +2101,23 @@ class IRBuilderBase {
 
   Value *CreateFPToUI(Value *V, Type *DestTy, const Twine &Name = "") {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fptoui, {DestTy, V->getType()}, {V},
-                             {}, Name);
+      return CreateIntrinsic(Intrinsic::fptoui, {DestTy, V->getType()}, {V}, {},
+                             Name);
     return CreateCast(Instruction::FPToUI, V, DestTy, Name);
   }
 
   Value *CreateFPToSI(Value *V, Type *DestTy, const Twine &Name = "") {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::fptosi, {DestTy, V->getType()}, {V},
-                             {}, Name);
+      return CreateIntrinsic(Intrinsic::fptosi, {DestTy, V->getType()}, {V}, {},
+                             Name);
     return CreateCast(Instruction::FPToSI, V, DestTy, Name);
   }
 
   Value *CreateUIToFP(Value *V, Type *DestTy, const Twine &Name = "",
                       bool IsNonNeg = false) {
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::uitofp, {DestTy, V->getType()}, {V},
-                             {}, Name);
+      return CreateIntrinsic(Intrinsic::uitofp, {DestTy, V->getType()}, {V}, {},
+                             Name);
     if (Value *Folded = Folder.FoldCast(Instruction::UIToFP, V, DestTy))
       return Folded;
     Instruction *I = Insert(new UIToFPInst(V, DestTy), Name);
@@ -2128,8 +2128,8 @@ class IRBuilderBase {
 
   Value *CreateSIToFP(Value *V, Type *DestTy, const Twine &Name = ""){
     if (IsFPConstrained && hasNonDefaultFPConstraints())
-      return CreateIntrinsic(Intrinsic::sitofp, {DestTy, V->getType()}, {V},
-                             {}, Name);
+      return CreateIntrinsic(Intrinsic::sitofp, {DestTy, V->getType()}, {V}, {},
+                             Name);
     return CreateCast(Instruction::SIToFP, V, DestTy, Name);
   }
 
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 5f3f0b57f0c61..81e6605e9c004 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -655,7 +655,6 @@ class VPIntrinsic : public IntrinsicInst {
   // Equivalent non-predicated intrinsic ID
   LLVM_ABI static std::optional<Intrinsic::ID>
   getFunctionalIntrinsicIDForVP(Intrinsic::ID ID);
-
 };
 
 /// This represents vector predication reduction intrinsics.
@@ -727,8 +726,6 @@ class VPBinOpIntrinsic : public VPIntrinsic {
   /// @}
 };
 
-
-
 /// This class represents min/max intrinsics.
 class MinMaxIntrinsic : public IntrinsicInst {
 public:
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 926fbdd4af3a5..a431eba46807f 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2312,8 +2312,7 @@ static bool getConstIntOrUndef(Value *Op, const APInt *&C) {
 ///
 /// \param CI Constrained intrinsic call.
 /// \param St Exception flags raised during constant evaluation.
-static bool mayFoldConstrained(const CallBase *CI,
-                               APFloat::opStatus St) {
+static bool mayFoldConstrained(const CallBase *CI, APFloat::opStatus St) {
   RoundingMode ORM = CI->getRoundingMode();
   fp::ExceptionBehavior EB = CI->getExceptionBehavior();
 
@@ -2346,8 +2345,8 @@ static bool mayFoldNewFPIntrinsic(const CallBase *CI, APFloat::opStatus St) {
     return true;
 
   // If evaluation raised an FP exception, the result can depend on the rounding
-  // mode.  If the rounding mode is dynamic (unknown at compile time), folding is
-  // not safe.
+  // mode.  If the rounding mode is dynamic (unknown at compile time), folding
+  // is not safe.
   if (CI->getRoundingMode() == RoundingMode::Dynamic)
     return false;
 
@@ -2372,8 +2371,7 @@ static RoundingMode getEvaluationRoundingModeForNewFP(const CallBase *CI) {
 }
 
 /// Returns the rounding mode that should be used for constant evaluation.
-static RoundingMode
-getEvaluationRoundingMode(const CallBase *CI) {
+static RoundingMode getEvaluationRoundingMode(const CallBase *CI) {
   RoundingMode ORM = CI->getRoundingMode();
   if (ORM == RoundingMode::Dynamic)
     // Even if the rounding mode is unknown, try evaluating the operation.
@@ -2621,8 +2619,7 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
     if (RM) {
       if (U.isFinite()) {
         APFloat::opStatus St = U.roundToIntegral(*RM);
-        if (IntrinsicID == Intrinsic::rint &&
-            St == APFloat::opInexact) {
+        if (IntrinsicID == Intrinsic::rint && St == APFloat::opInexact) {
           fp::ExceptionBehavior EB = Call->getExceptionBehavior();
           if (EB == fp::ebStrict)
             return nullptr;
@@ -3189,8 +3186,7 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
 static Constant *evaluateCompare(const APFloat &Op1, const APFloat &Op2,
                                  const IntrinsicInst *Call) {
   APFloat::opStatus St = APFloat::opOK;
-  Metadata *MD =
-      cast<MetadataAsValue>(Call->getArgOperand(2))->getMetadata();
+  Metadata *MD = cast<MetadataAsValue>(Call->getArgOperand(2))->getMetadata();
   FCmpInst::Predicate Cond =
       StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
           .Case("oeq", FCmpInst::FCMP_OEQ)
@@ -3427,11 +3423,21 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
         switch (IntrinsicID) {
         default:
           llvm_unreachable("unexpected intrinsic");
-        case Intrinsic::fadd: St = Res.add(Op2V, RM); break;
-        case Intrinsic::fsub: St = Res.subtract(Op2V, RM); break;
-        case Intrinsic::fmul: St = Res.multiply(Op2V, RM); break;
-        case Intrinsic::fdiv: St = Res.divide(Op2V, RM); break;
-        case Intrinsic::frem: St = Res.mod(Op2V); break;
+        case Intrinsic::fadd:
+          St = Res.add(Op2V, RM);
+          break;
+        case Intrinsic::fsub:
+          St = Res.subtract(Op2V, RM);
+          break;
+        case Intrinsic::fmul:
+          St = Res.multiply(Op2V, RM);
+          break;
+        case Intrinsic::fdiv:
+          St = Res.divide(Op2V, RM);
+          break;
+        case Intrinsic::frem:
+          St = Res.mod(Op2V);
+          break;
         }
         if (mayFoldNewFPIntrinsic(Call, St)) {
           DenormalMode::DenormalModeKind Mode = *Call->getOutputDenormMode();
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 194960b18cbe6..09831b9d394cd 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -7492,7 +7492,9 @@ Value *llvm::simplifyCall(CallBase *Call, Value *Callee, ArrayRef<Value *> Args,
 }
 
 Value *llvm::simplifyConstrainedFPCall(CallBase *Call, const SimplifyQuery &Q) {
-  assert(isa<IntrinsicInst>(Call) && Intrinsic::isConstrainedFPIntrinsic(cast<IntrinsicInst>(Call)->getIntrinsicID()));
+  assert(isa<IntrinsicInst>(Call) &&
+         Intrinsic::isConstrainedFPIntrinsic(
+             cast<IntrinsicInst>(Call)->getIntrinsicID()));
   SmallVector<Value *, 4> Args(Call->args());
   if (Value *V = tryConstantFoldCall(Call, Call->getCalledOperand(), Args, Q))
     return V;
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index de1a57ca54cf0..aaeae79ed4ffd 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -315,7 +315,8 @@ static bool containsBF16Type(const User &U) {
   // prevent any instructions using them. FIXME: This can be removed once LLT
   // supports bfloat.
   if (LLT::getUseExtended())
-    return false; // extended LLT can represent bfloat16 — don't block translation
+    return false; // extended LLT can represent bfloat16 — don't block
+                  // translation
   return U.getType()->getScalarType()->isBFloatTy() ||
          any_of(U.operands(), [](Value *V) {
            return V->getType()->getScalarType()->isBFloatTy();
@@ -386,8 +387,8 @@ bool IRTranslator::translateCompare(const User &U,
   else if (MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
     // In a strictfp function, plain fcmp instructions still need strict
     // semantics (e.g. when auto-upgraded from experimental_constrained_fcmp).
-    MIRBuilder.buildInstr(TargetOpcode::G_STRICT_FCMP, {Res},
-                          {Pred, Op0, Op1}, Flags);
+    MIRBuilder.buildInstr(TargetOpcode::G_STRICT_FCMP, {Res}, {Pred, Op0, Op1},
+                          Flags);
   } else
     MIRBuilder.buildFCmp(Pred, Res, Op0, Op1, Flags);
 
@@ -2098,15 +2099,24 @@ bool IRTranslator::translateSimpleIntrinsic(const CallInst &CI,
 /// opcode, or return 0 if no strict form is available.
 static unsigned getBundledFPStrictGOpcode(Intrinsic::ID ID) {
   switch (ID) {
-  case Intrinsic::fadd: return TargetOpcode::G_STRICT_FADD;
-  case Intrinsic::fsub: return TargetOpcode::G_STRICT_FSUB;
-  case Intrinsic::fmul: return TargetOpcode::G_STRICT_FMUL;
-  case Intrinsic::fdiv: return TargetOpcode::G_STRICT_FDIV;
-  case Intrinsic::frem: return TargetOpcode::G_STRICT_FREM;
-  case Intrinsic::fma:  return TargetOpcode::G_STRICT_FMA;
-  case Intrinsic::sqrt: return TargetOpcode::G_STRICT_FSQRT;
-  case Intrinsic::ldexp: return TargetOpcode::G_STRICT_FLDEXP;
-  default: return 0;
+  case Intrinsic::fadd:
+    return TargetOpcode::G_STRICT_FADD;
+  case Intrinsic::fsub:
+    return TargetOpcode::G_STRICT_FSUB;
+  case Intrinsic::fmul:
+    return TargetOpcode::G_STRICT_FMUL;
+  case Intrinsic::fdiv:
+    return TargetOpcode::G_STRICT_FDIV;
+  case Intrinsic::frem:
+    return TargetOpcode::G_STRICT_FREM;
+  case Intrinsic::fma:
+    return TargetOpcode::G_STRICT_FMA;
+  case Intrinsic::sqrt:
+    return TargetOpcode::G_STRICT_FSQRT;
+  case Intrinsic::ldexp:
+    return TargetOpcode::G_STRICT_FLDEXP;
+  default:
+    return 0;
   }
 }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 4f304e095c3f6..6ac1675a27f11 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3686,10 +3686,8 @@ void SelectionDAGBuilder::visitUnreachable(const UnreachableInst &I) {
 static bool hasNonDefaultFPBundles(const CallBase &CB) {
   // If no fp.control or fp.except bundles are present at all, the call uses
   // the default FP environment — do not treat it as non-default.
-  bool HasControl =
-      CB.getOperandBundle(LLVMContext::OB_fp_control).has_value();
-  bool HasExcept =
-      CB.getOperandBundle(LLVMContext::OB_fp_except).has_value();
+  bool HasControl = CB.getOperandBundle(LLVMContext::OB_fp_control).has_value();
+  bool HasExcept = CB.getOperandBundle(LLVMContext::OB_fp_except).has_value();
   if (!HasControl && !HasExcept)
     return false;
   // Bundles are present; check whether they request non-default behavior.
@@ -6668,11 +6666,20 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   // Route new-form FP intrinsics with non-default bundles to STRICT_* SDNodes.
   if (hasNonDefaultFPBundles(I)) {
     switch (Intrinsic) {
-    case Intrinsic::fadd: case Intrinsic::fsub:  case Intrinsic::fmul:
-    case Intrinsic::fdiv: case Intrinsic::frem:  case Intrinsic::fma:
-    case Intrinsic::sqrt: case Intrinsic::fptoui: case Intrinsic::fptosi:
-    case Intrinsic::uitofp: case Intrinsic::sitofp:
-    case Intrinsic::fptrunc: case Intrinsic::fpext: case Intrinsic::fcmp:
+    case Intrinsic::fadd:
+    case Intrinsic::fsub:
+    case Intrinsic::fmul:
+    case Intrinsic::fdiv:
+    case Intrinsic::frem:
+    case Intrinsic::fma:
+    case Intrinsic::sqrt:
+    case Intrinsic::fptoui:
+    case Intrinsic::fptosi:
+    case Intrinsic::uitofp:
+    case Intrinsic::sitofp:
+    case Intrinsic::fptrunc:
+    case Intrinsic::fpext:
+    case Intrinsic::fcmp:
       visitBundledFPIntrinsicAsStrict(cast<IntrinsicInst>(I));
       return;
     default:
@@ -7214,8 +7221,7 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     return;
   }
   case Intrinsic::fcmp: {
-    Metadata *MD =
-        cast<MetadataAsValue>(I.getArgOperand(2))->getMetadata();
+    Metadata *MD = cast<MetadataAsValue>(I.getArgOperand(2))->getMetadata();
     FCmpInst::Predicate Pred =
         StringSwitch<FCmpInst::Predicate>(cast<MDString>(MD)->getString())
             .Case("oeq", FCmpInst::FCMP_OEQ)
@@ -7237,10 +7243,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
            "invalid predicate in llvm.fcmp");
     ISD::CondCode Condition = getFCmpCondCode(Pred);
     EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
-    setValue(&I, DAG.getSetCC(sdl, DestVT,
-                              getValue(I.getArgOperand(0)),
-                              getValue(I.getArgOperand(1)),
-                              Condition));
+    setValue(&I, DAG.getSetCC(sdl, DestVT, getValue(I.getArgOperand(0)),
+                              getValue(I.getArgOperand(1)), Condition));
     return;
   }
   case Intrinsic::fma:
@@ -7264,10 +7268,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     SDNodeFlags TruncFlags;
     TruncFlags.copyFMF(*cast<FPMathOperator>(&I));
     SelectionDAG::FlagInserter FlagsInserter(DAG, TruncFlags);
-    setValue(&I, DAG.getNode(ISD::FPTRUNC_ROUND, sdl, VT,
-                             getValue(I.getArgOperand(0)),
-                             DAG.getTargetConstant((int)*RoundMode, sdl,
-                                                   MVT::i32)));
+    setValue(&I, DAG.getNode(
+                     ISD::FPTRUNC_ROUND, sdl, VT, getValue(I.getArgOperand(0)),
+                     DAG.getTargetConstant((int)*RoundMode, sdl, MVT::i32)));
 
     return;
   }
@@ -8714,23 +8717,54 @@ void SelectionDAGBuilder::visitBundledFPIntrinsicAsStrict(
 
   unsigned Opcode;
   switch (IID) {
-  case Intrinsic::fadd:    Opcode = ISD::STRICT_FADD;       break;
-  case Intrinsic::fsub:    Opcode = ISD::STRICT_FSUB;       break;
-  case Intrinsic::fmul:    Opcode = ISD::STRICT_FMUL;       break;
-  case Intrinsic::fdiv:    Opcode = ISD::STRICT_FDIV;       break;
-  case Intrinsic::frem:    Opcode = ISD::STRICT_FREM;       break;
-  case Intrinsic::fma:     Opcode = ISD::STRICT_FMA;        break;
-  case Intrinsic::sqrt:    Opcode = ISD::STRICT_FSQRT;      break;
-  case Intrinsic::fptoui:  Opcode = ISD::STRICT_FP_TO_UINT; break;
-  case Intrinsic::fptosi:  Opcode = ISD::STRICT_FP_TO_SINT; break;
-  case Intrinsic::uitofp:  Opcode = ISD::STRICT_UINT_TO_FP; break;
-  case Intrinsic::sitofp:  Opcode = ISD::STRICT_SINT_TO_FP; break;
-  case Intrinsic::fptrunc: Opcode = ISD::STRICT_FP_ROUND;   break;
-  case Intrinsic::fpext:   Opcode = ISD::STRICT_FP_EXTEND;  break;
-  case Intrinsic::fcmp:    Opcode = ISD::STRICT_FSETCC;     break;
-  case Intrinsic::fcmps:   Opcode = ISD::STRICT_FSETCCS;    break;
+  case Intrinsic::fadd:
+    Opcode = ISD::STRICT_FADD;
+    break;
+  case Intrinsic::fsub:
+    Opcode = ISD::STRICT_FSUB;
+    break;
+  case Intrinsic::fmul:
+    Opcode = ISD::STRICT_FMUL;
+    break;
+  case Intrinsic::fdiv:
+    Opcode = ISD::STRICT_FDIV;
+    break;
+  case Intrinsic::frem:
+    Opcode = ISD::STRICT_FREM;
+    break;
+  case Intrinsic::fma:
+    Opcode = ISD::STRICT_FMA;
+    break;
+  case Intrinsic::sqrt:
+    Opcode = ISD::STRICT_FSQRT;
+    break;
+  case Intrinsic::fptoui:
+    Opcode = ISD::STRICT_FP_TO_UINT;
+    break;
+  case Intrinsic::fptosi:
+    Opcode = ISD::STRICT_FP_TO_SINT;
+    break;
+  case Intrinsic::uitofp:
+    Opcode = ISD::STRICT_UINT_TO_FP;
+    break;
+  case Intrinsic::sitofp:
+    Opcode = ISD::STRICT_SINT_TO_FP;
+    break;
+  case Intrinsic::fptrunc:
+    Opcode = ISD::STRICT_FP_ROUND;
+    break;
+  case Intrinsic::fpext:
+    Opcode = ISD::STRICT_FP_EXTEND;
+    break;
+  case Intrinsic::fcmp:
+    Opcode = ISD::STRICT_FSETCC;
+    break;
+  case Intrinsic::fcmps:
+    Opcode = ISD::STRICT_FSETCCS;
+    break;
   default:
-    llvm_unreachable("Unhandled FP intrinsic in visitBundledFPIntrinsicAsStrict");
+    llvm_unreachable(
+        "Unhandled FP intrinsic in visitBundledFPIntrinsicAsStrict");
   }
 
   // Additional operands for specific opcodes.
@@ -8776,7 +8810,6 @@ void SelectionDAGBuilder::visitBundledFPIntrinsicAsStrict(
   setValue(&I, Result.getValue(0));
 }
 
-
 static unsigned getISDForVPIntrinsic(const VPIntrinsic &VPIntrin) {
   std::optional<unsigned> ResOPC;
   switch (VPIntrin.getIntrinsicID()) {
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index efe134a0c5b5b..7a51b2b53c886 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1207,31 +1207,24 @@ void TargetLoweringBase::initActions() {
                          VT, Expand);
 
     // Constrained floating-point operations default to expand.
-    setOperationAction({ISD::STRICT_FADD,        ISD::STRICT_FSUB,
-                        ISD::STRICT_FMUL,        ISD::STRICT_FDIV,
-                        ISD::STRICT_FREM,        ISD::STRICT_FP_EXTEND,
-                        ISD::STRICT_SINT_TO_FP,  ISD::STRICT_UINT_TO_FP,
-                        ISD::STRICT_FP_TO_SINT,  ISD::STRICT_FP_TO_UINT,
-                        ISD::STRICT_FP_ROUND,    ISD::STRICT_FSETCC,
-                        ISD::STRICT_FSETCCS,     ISD::STRICT_FACOS,
-                        ISD::STRICT_FASIN,       ISD::STRICT_FATAN,
-                        ISD::STRICT_FATAN2,      ISD::STRICT_FCEIL,
-                        ISD::STRICT_FCOS,        ISD::STRICT_FCOSH,
-                        ISD::STRICT_FEXP,        ISD::STRICT_FEXP2,
-                        ISD::STRICT_FFLOOR,      ISD::STRICT_FMA,
-                        ISD::STRICT_FLOG,        ISD::STRICT_FLOG10,
-                        ISD::STRICT_FLOG2,       ISD::STRICT_LRINT,
-                        ISD::STRICT_LLRINT,      ISD::STRICT_LROUND,
-                        ISD::STRICT_LLROUND,     ISD::STRICT_FMAXNUM,
-                        ISD::STRICT_FMINNUM,     ISD::STRICT_FMAXIMUM,
-                        ISD::STRICT_FMINIMUM,    ISD::STRICT_FNEARBYINT,
-                        ISD::STRICT_FPOW,        ISD::STRICT_FPOWI,
-                        ISD::STRICT_FLDEXP,      ISD::STRICT_FRINT,
-                        ISD::STRICT_FROUND,      ISD::STRICT_FROUNDEVEN,
-                        ISD::STRICT_FSIN,        ISD::STRICT_FSINH,
-                        ISD::STRICT_FSQRT,       ISD::STRICT_FTAN,
-                        ISD::STRICT_FTANH,       ISD::STRICT_FTRUNC},
-                       VT, Expand);
+    setOperationAction(
+        {ISD::STRICT_FADD,       ISD::STRICT_FSUB,       ISD::STRICT_FMUL,
+         ISD::STRICT_FDIV,       ISD::STRICT_FREM,       ISD::STRICT_FP_EXTEND,
+         ISD::STRICT_SINT_TO_FP, ISD::STRICT_UINT_TO_FP, ISD::STRICT_FP_TO_SINT,
+         ISD::STRICT_FP_TO_UINT, ISD::STRICT_FP_ROUND,   ISD::STRICT_FSETCC,
+         ISD::STRICT_FSETCCS,    ISD::STRICT_FACOS,      ISD::STRICT_FASIN,
+         ISD::STRICT_FATAN,      ISD::STRICT_FATAN2,     ISD::STRICT_FCEIL,
+         ISD::STRICT_FCOS,       ISD::STRICT_FCOSH,      ISD::STRICT_FEXP,
+         ISD::STRICT_FEXP2,      ISD::STRICT_FFLOOR,     ISD::STRICT_FMA,
+         ISD::STRICT_FLOG,       ISD::STRICT_FLOG10,     ISD::STRICT_FLOG2,
+         ISD::STRICT_LRINT,      ISD::STRICT_LLRINT,     ISD::STRICT_LROUND,
+         ISD::STRICT_LLROUND,    ISD::STRICT_FMAXNUM,    ISD::STRICT_FMINNUM,
+         ISD::STRICT_FMAXIMUM,   ISD::STRICT_FMINIMUM,   ISD::STRICT_FNEARBYINT,
+         ISD::STRICT_FPOW,       ISD::STRICT_FPOWI,      ISD::STRICT_FLDEXP,
+         ISD::STRICT_FRINT,      ISD::STRICT_FROUND,     ISD::STRICT_FROUNDEVEN,
+         ISD::STRICT_FSIN,       ISD::STRICT_FSINH,      ISD::STRICT_FSQRT,
+         ISD::STRICT_FTAN,       ISD::STRICT_FTANH,      ISD::STRICT_FTRUNC},
+        VT, Expand);
 
     // For most targets @llvm.get.dynamic.area.offset just returns 0.
     setOperationAction(ISD::GET_DYNAMIC_AREA_OFFSET, VT, Expand);
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 18f44fc0890a8..2c58e46e20ac7 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -4967,53 +4967,53 @@ static Value *upgradeVectorSplice(CallBase *CI, IRBuilder<> &Builder) {
 /// new FP intrinsic ID (llvm.fadd, llvm.sqrt, etc.).
 static Intrinsic::ID getNewFPIntrinsicForConstrainedName(StringRef OpName) {
   return StringSwitch<Intrinsic::ID>(OpName)
-      .Case("fadd",    Intrinsic::fadd)
-      .Case("fsub",    Intrinsic::fsub)
-      .Case("fmul",    Intrinsic::fmul)
-      .Case("fdiv",    Intrinsic::fdiv)
-      .Case("frem",    Intrinsic::frem)
-      .Case("fma",     Intrinsic::fma)
+      .Case("fadd", Intrinsic::fadd)
+      .Case("fsub", Intrinsic::fsub)
+      .Case("fmul", Intrinsic::fmul)
+      .Case("fdiv", Intrinsic::fdiv)
+      .Case("frem", Intrinsic::frem)
+      .Case("fma", Intrinsic::fma)
       .Case("fmuladd", Intrinsic::fmuladd)
-      .Case("fcmp",    Intrinsic::fcmp)
-      .Case("fcmps",   Intrinsic::fcmps)
-      .Case("fptoui",  Intrinsic::fptoui)
-      .Case("fptosi",  Intrinsic::fptosi)
-      .Case("uitofp",  Intrinsic::uitofp)
-      .Case("sitofp",  Intrinsic::sitofp)
+      .Case("fcmp", Intrinsic::fcmp)
+      .Case("fcmps", Intrinsic::fcmps)
+      .Case("fptoui", Intrinsic::fptoui)
+      .Case("fptosi", Intrinsic::fptosi)
+      .Case("uitofp", Intrinsic::uitofp)
+      .Case("sitofp", Intrinsic::sitofp)
       .Case("fptrunc", Intrinsic::fptrunc)
-      .Case("fpext",   Intrinsic::fpext)
-      .Case("sqrt",    Intrinsic::sqrt)
-      .Case("powi",    Intrinsic::powi)
-      .Case("ldexp",   Intrinsic::ldexp)
-      .Case("sin",     Intrinsic::sin)
-      .Case("asin",    Intrinsic::asin)
-      .Case("cos",     Intrinsic::cos)
-      .Case("acos",    Intrinsic::acos)
-      .Case("tan",     Intrinsic::tan)
-      .Case("atan",    Intrinsic::atan)
-      .Case("atan2",   Intrinsic::atan2)
-      .Case("sinh",    Intrinsic::sinh)
-      .Case("cosh",    Intrinsic::cosh)
-      .Case("tanh",    Intrinsic::tanh)
-      .Case("pow",     Intrinsic::pow)
-      .Case("log",     Intrinsic::log)
-      .Case("log10",   Intrinsic::log10)
-      .Case("log2",    Intrinsic::log2)
-      .Case("exp",     Intrinsic::exp)
-      .Case("exp2",    Intrinsic::exp2)
-      .Case("rint",    Intrinsic::rint)
+      .Case("fpext", Intrinsic::fpext)
+      .Case("sqrt", Intrinsic::sqrt)
+      .Case("powi", Intrinsic::powi)
+      .Case("ldexp", Intrinsic::ldexp)
+      .Case("sin", Intrinsic::sin)
+      .Case("asin", Intrinsic::asin)
+      .Case("cos", Intrinsic::cos)
+      .Case("acos", Intrinsic::acos)
+      .Case("tan", Intrinsic::tan)
+      .Case("atan", Intrinsic::atan)
+      .Case("atan2", Intrinsic::atan2)
+      .Case("sinh", Intrinsic::sinh)
+      .Case("cosh", Intrinsic::cosh)
+      .Case("tanh", Intrinsic::tanh)
+      .Case("pow", Intrinsic::pow)
+      .Case("log", Intrinsic::log)
+      .Case("log10", Intrinsic::log10)
+      .Case("log2", Intrinsic::log2)
+      .Case("exp", Intrinsic::exp)
+      .Case("exp2", Intrinsic::exp2)
+      .Case("rint", Intrinsic::rint)
       .Case("nearbyint", Intrinsic::nearbyint)
-      .Case("lrint",   Intrinsic::lrint)
-      .Case("llrint",  Intrinsic::llrint)
-      .Case("ceil",    Intrinsic::ceil)
-      .Case("floor",   Intrinsic::floor)
-      .Case("round",   Intrinsic::round)
+      .Case("lrint", Intrinsic::lrint)
+      .Case("llrint", Intrinsic::llrint)
+      .Case("ceil", Intrinsic::ceil)
+      .Case("floor", Intrinsic::floor)
+      .Case("round", Intrinsic::round)
       .Case("roundeven", Intrinsic::roundeven)
-      .Case("trunc",   Intrinsic::trunc)
-      .Case("lround",  Intrinsic::lround)
+      .Case("trunc", Intrinsic::trunc)
+      .Case("lround", Intrinsic::lround)
       .Case("llround", Intrinsic::llround)
-      .Case("minnum",  Intrinsic::minnum)
-      .Case("maxnum",  Intrinsic::maxnum)
+      .Case("minnum", Intrinsic::minnum)
+      .Case("maxnum", Intrinsic::maxnum)
       .Case("minimum", Intrinsic::minimum)
       .Case("maximum", Intrinsic::maximum)
       .Default(Intrinsic::not_intrinsic);
@@ -5068,18 +5068,19 @@ static Value *upgradeConstrainedFPIntrinsicCall(CallBase *CI, StringRef Name,
   // fpext, fptosi, fptoui, fcmp, fcmps, ceil, floor, trunc, round, roundeven,
   // lround, llround, maxnum, minnum, maximum, minimum
   bool HasRM = !StringSwitch<bool>(OpName)
-      .Cases({"fpext", "fptosi", "fptoui"}, true)
-      .Cases({"fcmp", "fcmps"}, true)
-      .Cases({"ceil", "floor", "trunc"}, true)
-      .Cases({"round", "roundeven"}, true)
-      .Cases({"lround", "llround"}, true)
-      .Cases({"maxnum", "minnum", "maximum", "minimum"}, true)
-      .Default(false);
+                    .Cases({"fpext", "fptosi", "fptoui"}, true)
+                    .Cases({"fcmp", "fcmps"}, true)
+                    .Cases({"ceil", "floor", "trunc"}, true)
+                    .Cases({"round", "roundeven"}, true)
+                    .Cases({"lround", "llround"}, true)
+                    .Cases({"maxnum", "minnum", "maximum", "minimum"}, true)
+                    .Default(false);
 
   // Parse rounding mode (second-to-last arg when present).
   std::optional<RoundingMode> RM;
   if (HasRM) {
-    StringRef RMS = getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 2));
+    StringRef RMS =
+        getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 2));
     if (RMS.empty())
       return nullptr; // malformed
     RM = convertStrToRoundingMode(RMS);
@@ -5088,7 +5089,8 @@ static Value *upgradeConstrainedFPIntrinsicCall(CallBase *CI, StringRef Name,
   }
 
   // Parse exception behavior (always the last metadata arg).
-  StringRef EBS = getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 1));
+  StringRef EBS =
+      getConstrainedFPMetaStr(CI->getArgOperand(CI->arg_size() - 1));
   if (EBS.empty())
     return nullptr; // malformed
   std::optional<fp::ExceptionBehavior> EB = convertStrToExceptionBehavior(EBS);
@@ -5100,7 +5102,7 @@ static Value *upgradeConstrainedFPIntrinsicCall(CallBase *CI, StringRef Name,
 
   // Collect the non-metadata value args.
   // Layout: [value args...] [predicate?] [rounding mode?] [exception behavior]
-  bool IsFCmp  = (OpName == "fcmp");
+  bool IsFCmp = (OpName == "fcmp");
   bool IsFCmps = (OpName == "fcmps");
   bool IsCompare = IsFCmp || IsFCmps;
   unsigned NArgs = CI->arg_size() - 1; // always has EB
diff --git a/llvm/lib/IR/FPEnv.cpp b/llvm/lib/IR/FPEnv.cpp
index 433f5d7f38dc8..e2e24d9601a7b 100644
--- a/llvm/lib/IR/FPEnv.cpp
+++ b/llvm/lib/IR/FPEnv.cpp
@@ -32,7 +32,8 @@ llvm::convertStrToRoundingMode(StringRef RoundingArg) {
       .Default(std::nullopt);
 }
 
-std::optional<RoundingMode> llvm::convertBundleToRoundingMode(StringRef RoundingArg) {
+std::optional<RoundingMode>
+llvm::convertBundleToRoundingMode(StringRef RoundingArg) {
   return StringSwitch<std::optional<RoundingMode>>(RoundingArg)
       .Case("dyn", RoundingMode::Dynamic)
       .Case("rte", RoundingMode::NearestTiesToEven)
@@ -71,7 +72,8 @@ llvm::convertRoundingModeToStr(RoundingMode UseRounding) {
   return RoundingStr;
 }
 
-std::optional<StringRef> llvm::convertRoundingModeToBundle(RoundingMode UseRounding) {
+std::optional<StringRef>
+llvm::convertRoundingModeToBundle(RoundingMode UseRounding) {
   std::optional<StringRef> RoundingStr;
   switch (UseRounding) {
   case RoundingMode::Dynamic:
@@ -149,4 +151,3 @@ llvm::convertExceptionBehaviorToBundle(fp::ExceptionBehavior UseExcept) {
   }
   return ExceptStr;
 }
-
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 4240e8d46fea8..0fe58a8786f40 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -613,12 +613,11 @@ bool CallBase::hasReadingOperandBundles() const {
   // Implementation note: this is a conservative implementation of operand
   // bundle semantics, where *any* non-assume operand bundle (other than
   // ptrauth) forces a callsite to be at least readonly.
-  return hasOperandBundlesOtherThan({LLVMContext::OB_ptrauth,
-                                     LLVMContext::OB_kcfi,
-                                     LLVMContext::OB_fp_control,
-                                     LLVMContext::OB_fp_except,
-                                     LLVMContext::OB_convergencectrl,
-                                     LLVMContext::OB_deactivation_symbol}) &&
+  return hasOperandBundlesOtherThan(
+             {LLVMContext::OB_ptrauth, LLVMContext::OB_kcfi,
+              LLVMContext::OB_fp_control, LLVMContext::OB_fp_except,
+              LLVMContext::OB_convergencectrl,
+              LLVMContext::OB_deactivation_symbol}) &&
          getIntrinsicID() != Intrinsic::assume;
 }
 
@@ -895,7 +894,7 @@ bool CallBase::hasArgumentWithAdditionalReturnCaptureComponents() const {
 }
 
 std::optional<StringRef> llvm::getBundleOperandByPrefix(OperandBundleUse Bundle,
-                                                      StringRef Prefix) {
+                                                        StringRef Prefix) {
   for (const auto &Item : Bundle.Inputs) {
     Metadata *MD = cast<MetadataAsValue>(Item.get())->getMetadata();
     if (const auto *MDS = dyn_cast<MDString>(MD)) {
@@ -908,9 +907,9 @@ std::optional<StringRef> llvm::getBundleOperandByPrefix(OperandBundleUse Bundle,
 }
 
 void llvm::addOperandToBundleTag(LLVMContext &Ctx,
-                               SmallVectorImpl<OperandBundleDef> &Bundles,
-                               StringRef Tag, size_t PrefixSize,
-                               StringRef Val) {
+                                 SmallVectorImpl<OperandBundleDef> &Bundles,
+                                 StringRef Tag, size_t PrefixSize,
+                                 StringRef Val) {
   assert(PrefixSize > 0 && "Unexpected prefix size");
   assert(PrefixSize < Val.size() && "Invalid prefix size");
   StringRef Prefix = Val.take_front(PrefixSize);
diff --git a/llvm/lib/IR/IntrinsicInst.cpp b/llvm/lib/IR/IntrinsicInst.cpp
index 446da39d61eb4..89b50d175f804 100644
--- a/llvm/lib/IR/IntrinsicInst.cpp
+++ b/llvm/lib/IR/IntrinsicInst.cpp
@@ -285,7 +285,6 @@ void InstrProfCallsite::setCallee(Value *Callee) {
   setArgOperand(4, Callee);
 }
 
-
 static FCmpInst::Predicate getFPPredicateFromMD(const Value *Op) {
   Metadata *MD = cast<MetadataAsValue>(Op)->getMetadata();
   if (!MD || !isa<MDString>(MD))
@@ -308,7 +307,6 @@ static FCmpInst::Predicate getFPPredicateFromMD(const Value *Op) {
       .Default(FCmpInst::BAD_FCMP_PREDICATE);
 }
 
-
 ElementCount VPIntrinsic::getStaticVectorLength() const {
   auto GetVectorLengthOfType = [](const Type *T) -> ElementCount {
     const auto *VT = cast<VectorType>(T);
@@ -500,7 +498,6 @@ constexpr static bool doesVPHaveNoFunctionalEquivalent(Intrinsic::ID ID) {
                 getFunctionalIntrinsicIDForVP(Intrinsic::VPID));
 #include "llvm/IR/VPIntrinsics.def"
 
-
 Intrinsic::ID VPIntrinsic::getForOpcode(unsigned IROPC) {
   switch (IROPC) {
   default:
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 4bd528a02322f..a589615c9b120 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -49,10 +49,10 @@
 
 #include "llvm/IR/Verifier.h"
 #include "llvm/ADT/APFloat.h"
-#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/FloatingPointMode.h"
 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallPtrSet.h"
@@ -78,8 +78,8 @@
 #include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/DebugLoc.h"
-#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/EHPersonalities.h"
 #include "llvm/IR/FPEnv.h"
diff --git a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
index e3596406ec21d..05554503ccb57 100644
--- a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
@@ -1646,8 +1646,7 @@ Instruction *SPIRVEmitIntrinsics::visitCallInst(CallInst &Call) {
 }
 
 // Use a tip about rounding mode to create a decoration.
-void SPIRVEmitIntrinsics::useRoundingMode(IntrinsicInst *FPI,
-                                          IRBuilder<> &B) {
+void SPIRVEmitIntrinsics::useRoundingMode(IntrinsicInst *FPI, IRBuilder<> &B) {
   RoundingMode RM = FPI->getRoundingMode();
   unsigned RoundingModeDeco = std::numeric_limits<unsigned>::max();
   switch (RM) {
diff --git a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
index d42dfb838b4d0..bb6f7ba6501b5 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
@@ -453,9 +453,8 @@ lowerNewFPBinopForSPIRV(CallInst *CI, Instruction::BinaryOps PlainOpc,
 // bundle carries a non-dynamic rounding mode (checked by the caller). We expand
 // to fmul + fadd and attach FPRoundingMode decorations to both instructions so
 // the rounding mode is preserved in the SPIRV output.
-static void
-lowerNewFmuladd(CallInst *CI,
-                SmallVector<Instruction *> &EraseFromParent) {
+static void lowerNewFmuladd(CallInst *CI,
+                            SmallVector<Instruction *> &EraseFromParent) {
   IRBuilder<> Builder(CI->getParent());
   Builder.SetInsertPoint(CI);
   Value *A = CI->getArgOperand(0);
@@ -472,7 +471,6 @@ lowerNewFmuladd(CallInst *CI,
   EraseFromParent.push_back(CI);
 }
 
-
 // Substitutes calls to LLVM intrinsics with either calls to SPIR-V intrinsics
 // or calls to proper generated functions. Returns True if F was modified.
 bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 574485a5c3674..bdc995646a295 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -2011,7 +2011,8 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
   // prevents it from being removed. In some cases however the side effect is
   // actually absent. To detect this case, call SimplifyConstrainedFPCall. If it
   // returns a replacement, the call may be removed.
-  if (CI.use_empty() && Intrinsic::isConstrainedFPIntrinsic(CI.getIntrinsicID())) {
+  if (CI.use_empty() &&
+      Intrinsic::isConstrainedFPIntrinsic(CI.getIntrinsicID())) {
     if (simplifyConstrainedFPCall(&CI, SQ.getWithInstruction(&CI)))
       return eraseInstFromFunction(CI);
   }
diff --git a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
index ceaeacff25b59..24a8290bb1e42 100644
--- a/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
+++ b/llvm/lib/Transforms/Scalar/EarlyCSE.cpp
@@ -135,7 +135,8 @@ struct SimpleValue {
           // ebStrict means exceptions matter; don't CSE.
           if (CI->getExceptionBehavior() == fp::ebStrict)
             return false;
-          // Dynamic rounding mode means result depends on runtime mode; don't CSE.
+          // Dynamic rounding mode means result depends on runtime mode; don't
+          // CSE.
           if (CI->getRoundingMode() == RoundingMode::Dynamic)
             return false;
           return true;
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 131c15b0351ac..fff8684455711 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -941,10 +941,12 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
 /// constant arguments cause a significant amount of code in the callee to be
 /// dead.  Since this doesn't produce an exact copy of the input, it can't be
 /// used for things like CloneFunction or CloneModule.
-void llvm::CloneAndPruneFunctionInto(
-    Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap,
-    bool ModuleLevelChanges, SmallVectorImpl<ReturnInst *> &Returns,
-    const char *NameSuffix, ClonedCodeInfo &CodeInfo) {
+void llvm::CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
+                                     ValueToValueMapTy &VMap,
+                                     bool ModuleLevelChanges,
+                                     SmallVectorImpl<ReturnInst *> &Returns,
+                                     const char *NameSuffix,
+                                     ClonedCodeInfo &CodeInfo) {
   CloneAndPruneIntoFromInst(NewFunc, OldFunc, &OldFunc->front().front(), VMap,
                             ModuleLevelChanges, Returns, NameSuffix, CodeInfo);
 }
diff --git a/llvm/unittests/IR/IRBuilderTest.cpp b/llvm/unittests/IR/IRBuilderTest.cpp
index eb9d4310a63bd..51eb59db95759 100644
--- a/llvm/unittests/IR/IRBuilderTest.cpp
+++ b/llvm/unittests/IR/IRBuilderTest.cpp
@@ -649,9 +649,8 @@ TEST_F(IRBuilderTest, FPBundlesStrict) {
   {
     Function *Fn = Intrinsic::getOrInsertDeclaration(M.get(), Intrinsic::abs,
                                                      {Type::getInt64Ty(Ctx)});
-    GlobalVariable *GVInt = new GlobalVariable(*M, Type::getInt64Ty(Ctx), true,
-                                               GlobalValue::ExternalLinkage,
-                                               nullptr);
+    GlobalVariable *GVInt = new GlobalVariable(
+        *M, Type::getInt64Ty(Ctx), true, GlobalValue::ExternalLinkage, nullptr);
     Value *IntArg = Builder.CreateLoad(Type::getInt64Ty(Ctx), GVInt);
     Value *V = Builder.CreateCall(Fn, {IntArg, Builder.getInt1(false)});
     auto *I = cast<IntrinsicInst>(V);
diff --git a/llvm/unittests/IR/InstructionsTest.cpp b/llvm/unittests/IR/InstructionsTest.cpp
index 4208089359d2a..44e4a17721ca2 100644
--- a/llvm/unittests/IR/InstructionsTest.cpp
+++ b/llvm/unittests/IR/InstructionsTest.cpp
@@ -565,7 +565,6 @@ TEST(InstructionsTest, FPMathOperator) {
   I->deleteValue();
 }
 
-
 TEST(InstructionsTest, isEliminableCastPair) {
   LLVMContext C;
   DataLayout DL1("p1:32:32-p2:64:64:64:32");
diff --git a/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
index bbcb24a2aced4..475fa94fa05ca 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMIRToLLVMTranslation.cpp
@@ -100,8 +100,8 @@ static LogicalResult convertIntrinsicImpl(OpBuilder &odsBuilder,
       return success();
     }
     // intrinsicID == llvm::Intrinsic::fma
-    auto op =
-        LLVM::FMAOp::create(odsBuilder, loc, resultTypes, mlirOperands, mlirAttrs);
+    auto op = LLVM::FMAOp::create(odsBuilder, loc, resultTypes, mlirOperands,
+                                  mlirAttrs);
     moduleImport.setFastmathFlagsAttr(inst, op);
     moduleImport.mapValue(inst) = op;
     return success();

>From 89bee03755fca5460e1bf077d7af7260115b56ab Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Mon, 13 Apr 2026 12:53:56 -0700
Subject: [PATCH 06/12] [AArch64] Restore sqdmlal/sqdmlsl GI test variants
 accidentally reverted

Our commit 15f52c4ff73c accidentally reverted changes from 36fa27fe3e11
'[AArch64][GlobalISel] Add patterns for scalar sqdmlal/sqdmlsl (#187246)'.
Restore the _v2i32 function names, SD/GI-split CHECK labels, and the
_v4i32 test variants that were lost during the rebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 llvm/test/CodeGen/AArch64/arm64-vmul.ll | 42 ++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/llvm/test/CodeGen/AArch64/arm64-vmul.ll b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
index 7b37b88a60055..f36b5a8181845 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vmul.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
@@ -1805,8 +1805,8 @@ entry:
   ret i64 %vqdmulls_s32.i
 }
 
-define i64 @sqdmlal_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
-; CHECK-SD-LABEL: sqdmlal_lane_1d:
+define i64 @sqdmlal_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
+; CHECK-SD-LABEL: sqdmlal_lane_1d_v2i32:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    fmov s1, w1
 ; CHECK-SD-NEXT:    fmov d2, x0
@@ -1815,7 +1815,7 @@ define i64 @sqdmlal_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 ; CHECK-SD-NEXT:    fmov x0, d2
 ; CHECK-SD-NEXT:    ret
 ;
-; CHECK-GI-LABEL: sqdmlal_lane_1d:
+; CHECK-GI-LABEL: sqdmlal_lane_1d_v2i32:
 ; CHECK-GI:       // %bb.0:
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
 ; CHECK-GI-NEXT:    fmov s1, w1
@@ -1829,11 +1829,9 @@ define i64 @sqdmlal_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
   %res = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %A, i64 %prod)
   ret i64 %res
 }
-declare i64 @llvm.aarch64.neon.sqdmulls.scalar(i32, i32)
-declare i64 @llvm.aarch64.neon.sqadd.i64(i64, i64)
 
-define i64 @sqdmlsl_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
-; CHECK-SD-LABEL: sqdmlsl_lane_1d:
+define i64 @sqdmlsl_lane_1d_v2i32(i64 %A, i32 %B, <2 x i32> %C) nounwind {
+; CHECK-SD-LABEL: sqdmlsl_lane_1d_v2i32:
 ; CHECK-SD:       // %bb.0:
 ; CHECK-SD-NEXT:    fmov s1, w1
 ; CHECK-SD-NEXT:    fmov d2, x0
@@ -1842,7 +1840,7 @@ define i64 @sqdmlsl_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 ; CHECK-SD-NEXT:    fmov x0, d2
 ; CHECK-SD-NEXT:    ret
 ;
-; CHECK-GI-LABEL: sqdmlsl_lane_1d:
+; CHECK-GI-LABEL: sqdmlsl_lane_1d_v2i32:
 ; CHECK-GI:       // %bb.0:
 ; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
 ; CHECK-GI-NEXT:    fmov s1, w1
@@ -1858,6 +1856,34 @@ define i64 @sqdmlsl_lane_1d(i64 %A, i32 %B, <2 x i32> %C) nounwind {
 }
 declare i64 @llvm.aarch64.neon.sqsub.i64(i64, i64)
 
+define i64 @sqdmlal_lane_1d_v4i32(i64 %A, i32 %B, <4 x i32> %C) nounwind {
+; CHECK-LABEL: sqdmlal_lane_1d_v4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, w1
+; CHECK-NEXT:    fmov d2, x0
+; CHECK-NEXT:    sqdmlal d2, s1, v0.s[1]
+; CHECK-NEXT:    fmov x0, d2
+; CHECK-NEXT:    ret
+  %rhs = extractelement <4 x i32> %C, i32 1
+  %prod = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %B, i32 %rhs)
+  %res = call i64 @llvm.aarch64.neon.sqadd.i64(i64 %A, i64 %prod)
+  ret i64 %res
+}
+
+define i64 @sqdmlsl_lane_1d_v4i32(i64 %A, i32 %B, <4 x i32> %C) nounwind {
+; CHECK-LABEL: sqdmlsl_lane_1d_v4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, w1
+; CHECK-NEXT:    fmov d2, x0
+; CHECK-NEXT:    sqdmlsl d2, s1, v0.s[1]
+; CHECK-NEXT:    fmov x0, d2
+; CHECK-NEXT:    ret
+  %rhs = extractelement <4 x i32> %C, i32 1
+  %prod = call i64 @llvm.aarch64.neon.sqdmulls.scalar(i32 %B, i32 %rhs)
+  %res = call i64 @llvm.aarch64.neon.sqsub.i64(i64 %A, i64 %prod)
+  ret i64 %res
+}
+
 
 define <4 x i32> @umlal_lane_4s(<4 x i16> %A, <4 x i16> %B, <4 x i32> %C) nounwind {
 ; CHECK-LABEL: umlal_lane_4s:

>From ba8fe6a4b48b1a054f7a24ac4f40c94546b3a2cd Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Mon, 13 Apr 2026 18:15:42 -0700
Subject: [PATCH 07/12] Fix fp-undef simplification and
 getFloatingPointMemoryEffects
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two bugs prevented constrained FP undef simplification from working
correctly after the constrained→bundle migration:

1. In simplifyIntrinsic: the fadd/fsub/fmul/fdiv/frem cases were
   placed in the 3+ operand switch, which is unreachable for
   2-argument intrinsics (the early return at NumOperands==2 routes
   to simplifyBinaryIntrinsic before the switch). Move them into
   simplifyBinaryIntrinsic, where they are actually reached.

2. In getFloatingPointMemoryEffects: the function unconditionally
   returned inaccessibleMemOnly() for any FP intrinsic in a strictfp
   function, even when fp.except=ignore. This prevented DCE of dead
   calls in strictfp functions. Now return none() when the exception
   behavior is explicitly ebIgnore.

Update fp-undef-poison-strictfp.ll to reflect the corrected behavior:
undef+strict/defaultfp cases now simplify to NaN as expected.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
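Note for reviewers: the two fixes described in the commit message can be
modelled outside of LLVM. The block below is only a minimal sketch under
stated assumptions. Op, ExceptMode, MemEffects, simplifyAny, simplifyBinary
and fpMemoryEffects are hypothetical stand-ins invented for illustration;
they are not LLVM APIs and do not mirror the real signatures. The first
part shows why a case placed after an early "two operands" return can never
fire for a binary operation, and the second shows the shape of the
ebIgnore early-out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; none of these are LLVM types.
enum class Op { FAdd, Fma };
enum class ExceptMode { Ignore, MayTrap, Strict };
enum class MemEffects { None, InaccessibleMemOnly };

// Models the binary handler. `handleFAddHere` toggles between the state
// before the fix (no FAdd case here) and after it (FAdd handled here).
std::optional<std::string> simplifyBinary(Op op, bool handleFAddHere) {
  switch (op) {
  case Op::FAdd:
    if (handleFAddHere)
      return "simplified-fadd";
    return std::nullopt;
  default:
    return std::nullopt;
  }
}

// Models the general entry point: every two-operand call returns early,
// so the switch below only ever sees other operand counts.
std::optional<std::string> simplifyAny(Op op, std::size_t numOperands,
                                       bool handleFAddInBinary) {
  if (numOperands == 2)
    return simplifyBinary(op, handleFAddInBinary);

  switch (op) {
  case Op::FAdd:
    // Never reached for a two-operand FAdd: the early return above wins.
    return "simplified-fadd";
  case Op::Fma:
    return "simplified-fma";
  }
  return std::nullopt;
}

// Models the memory-effects query: inside a strictfp function an FP
// operation is treated as touching inaccessible memory unless it
// explicitly ignores FP exceptions.
MemEffects fpMemoryEffects(bool inStrictFPFunction, ExceptMode mode) {
  if (!inStrictFPFunction)
    return MemEffects::None;
  if (mode == ExceptMode::Ignore)
    return MemEffects::None;
  return MemEffects::InaccessibleMemOnly;
}
```

In this model, a two-operand FAdd is simplified only when the binary
handler owns the case, and an operation with ExceptMode::Ignore reports no
memory effects even inside a strictfp function, which is what lets a dead
call be removed.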
 llvm/lib/Analysis/InstructionSimplify.cpp     |  50 +-
 llvm/lib/IR/Instructions.cpp                  |   4 +
 .../InstSimplify/fp-undef-poison-strictfp.ll  | 522 ++++++++++--------
 3 files changed, 326 insertions(+), 250 deletions(-)

diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 09831b9d394cd..7eac848f03566 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -7218,6 +7218,36 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
   case Intrinsic::aarch64_sve_umaxv:
   case Intrinsic::aarch64_sve_uminv:
     return simplifySVEIntReduction(IID, ReturnType, Op0, Op1);
+  case Intrinsic::fadd:
+    if (Call)
+      return simplifyFAddInst(Op0, Op1, Call->getFastMathFlags(), Q,
+                              Call->getExceptionBehavior(),
+                              Call->getRoundingMode());
+    return simplifyFAddInst(Op0, Op1, FastMathFlags(), Q);
+  case Intrinsic::fsub:
+    if (Call)
+      return simplifyFSubInst(Op0, Op1, Call->getFastMathFlags(), Q,
+                              Call->getExceptionBehavior(),
+                              Call->getRoundingMode());
+    return simplifyFSubInst(Op0, Op1, FastMathFlags(), Q);
+  case Intrinsic::fmul:
+    if (Call)
+      return simplifyFMulInst(Op0, Op1, Call->getFastMathFlags(), Q,
+                              Call->getExceptionBehavior(),
+                              Call->getRoundingMode());
+    return simplifyFMulInst(Op0, Op1, FastMathFlags(), Q);
+  case Intrinsic::fdiv:
+    if (Call)
+      return simplifyFDivInst(Op0, Op1, Call->getFastMathFlags(), Q,
+                              Call->getExceptionBehavior(),
+                              Call->getRoundingMode());
+    return simplifyFDivInst(Op0, Op1, FastMathFlags(), Q);
+  case Intrinsic::frem:
+    if (Call)
+      return simplifyFRemInst(Op0, Op1, Call->getFastMathFlags(), Q,
+                              Call->getExceptionBehavior(),
+                              Call->getRoundingMode());
+    return simplifyFRemInst(Op0, Op1, FastMathFlags(), Q);
   default:
     break;
   }
@@ -7380,26 +7410,6 @@ static Value *simplifyIntrinsic(CallBase *Call, Value *Callee,
 
     return nullptr;
   }
-  case Intrinsic::fadd:
-    return simplifyFAddInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
-                            Call->getExceptionBehavior(),
-                            Call->getRoundingMode());
-  case Intrinsic::fsub:
-    return simplifyFSubInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
-                            Call->getExceptionBehavior(),
-                            Call->getRoundingMode());
-  case Intrinsic::fmul:
-    return simplifyFMulInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
-                            Call->getExceptionBehavior(),
-                            Call->getRoundingMode());
-  case Intrinsic::fdiv:
-    return simplifyFDivInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
-                            Call->getExceptionBehavior(),
-                            Call->getRoundingMode());
-  case Intrinsic::frem:
-    return simplifyFRemInst(Args[0], Args[1], Call->getFastMathFlags(), Q,
-                            Call->getExceptionBehavior(),
-                            Call->getRoundingMode());
   case Intrinsic::ldexp:
     return simplifyLdexp(Args[0], Args[1], Q, true);
   case Intrinsic::experimental_gc_relocate: {
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 0fe58a8786f40..74a2b0e686180 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -775,6 +775,10 @@ MemoryEffects CallBase::getFloatingPointMemoryEffects() const {
       if (const Function *F = BB->getParent())
         if (F->hasFnAttribute(Attribute::StrictFP))
           if (IntrinsicInst::isFloatingPointOperation(IntrID)) {
+            // If this operation explicitly ignores FP exceptions, it has no
+            // exception-related side effects even in a strictfp function.
+            if (getExceptionBehavior() == fp::ebIgnore)
+              return MemoryEffects::none();
             return MemoryEffects::inaccessibleMemOnly();
           }
   return MemoryEffects::none();
diff --git a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
index 62d1de5bd7c03..8974e0fde8984 100644
--- a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
@@ -1,5 +1,5 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -passes=instsimplify -S | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt %s -passes=instsimplify -S | FileCheck %s
 
 ; TODO: the instructions with poison operands should return poison
 
@@ -8,7 +8,8 @@
 ;
 
 define float @fadd_undef_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op0_strict(
+; CHECK-LABEL: define float @fadd_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -16,8 +17,9 @@ define float @fadd_undef_op0_strict(float %x) #0 {
 }
 
 define float @fadd_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fadd_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -25,8 +27,9 @@ define float @fadd_undef_op0_maytrap(float %x) #0 {
 }
 
 define float @fadd_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fadd_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -34,16 +37,17 @@ define float @fadd_undef_op0_upward(float %x) #0 {
 }
 
 define float @fadd_undef_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fadd_poison_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op0_strict(
+; CHECK-LABEL: define float @fadd_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -51,34 +55,36 @@ define float @fadd_poison_op0_strict(float %x) #0 {
 }
 
 define float @fadd_poison_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fadd_poison_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fadd_poison_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fadd_undef_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op1_strict(
+; CHECK-LABEL: define float @fadd_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -86,8 +92,9 @@ define float @fadd_undef_op1_strict(float %x) #0 {
 }
 
 define float @fadd_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fadd_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -95,8 +102,9 @@ define float @fadd_undef_op1_maytrap(float %x) #0 {
 }
 
 define float @fadd_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fadd_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -104,16 +112,17 @@ define float @fadd_undef_op1_upward(float %x) #0 {
 }
 
 define float @fadd_undef_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fadd_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fadd_poison_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op1_strict(
+; CHECK-LABEL: define float @fadd_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -121,27 +130,28 @@ define float @fadd_poison_op1_strict(float %x) #0 {
 }
 
 define float @fadd_poison_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fadd_poison_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fadd_poison_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fadd_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fadd_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -152,7 +162,8 @@ define float @fadd_poison_op1_defaultfp(float %x) #0 {
 ;
 
 define float @fsub_undef_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op0_strict(
+; CHECK-LABEL: define float @fsub_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -160,8 +171,9 @@ define float @fsub_undef_op0_strict(float %x) #0 {
 }
 
 define float @fsub_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fsub_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -169,8 +181,9 @@ define float @fsub_undef_op0_maytrap(float %x) #0 {
 }
 
 define float @fsub_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fsub_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -178,16 +191,17 @@ define float @fsub_undef_op0_upward(float %x) #0 {
 }
 
 define float @fsub_undef_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fsub_poison_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op0_strict(
+; CHECK-LABEL: define float @fsub_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -195,34 +209,36 @@ define float @fsub_poison_op0_strict(float %x) #0 {
 }
 
 define float @fsub_poison_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fsub_poison_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fsub_poison_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fsub_undef_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op1_strict(
+; CHECK-LABEL: define float @fsub_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -230,8 +246,9 @@ define float @fsub_undef_op1_strict(float %x) #0 {
 }
 
 define float @fsub_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fsub_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -239,8 +256,9 @@ define float @fsub_undef_op1_maytrap(float %x) #0 {
 }
 
 define float @fsub_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fsub_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -248,16 +266,17 @@ define float @fsub_undef_op1_upward(float %x) #0 {
 }
 
 define float @fsub_undef_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fsub_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fsub_poison_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op1_strict(
+; CHECK-LABEL: define float @fsub_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -265,27 +284,28 @@ define float @fsub_poison_op1_strict(float %x) #0 {
 }
 
 define float @fsub_poison_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fsub_poison_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fsub_poison_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fsub_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fsub_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -296,7 +316,8 @@ define float @fsub_poison_op1_defaultfp(float %x) #0 {
 ;
 
 define float @fmul_undef_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op0_strict(
+; CHECK-LABEL: define float @fmul_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -304,8 +325,9 @@ define float @fmul_undef_op0_strict(float %x) #0 {
 }
 
 define float @fmul_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fmul_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -313,8 +335,9 @@ define float @fmul_undef_op0_maytrap(float %x) #0 {
 }
 
 define float @fmul_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fmul_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -322,16 +345,17 @@ define float @fmul_undef_op0_upward(float %x) #0 {
 }
 
 define float @fmul_undef_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fmul_poison_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op0_strict(
+; CHECK-LABEL: define float @fmul_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -339,34 +363,36 @@ define float @fmul_poison_op0_strict(float %x) #0 {
 }
 
 define float @fmul_poison_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fmul_poison_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fmul_poison_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fmul_undef_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op1_strict(
+; CHECK-LABEL: define float @fmul_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -374,8 +400,9 @@ define float @fmul_undef_op1_strict(float %x) #0 {
 }
 
 define float @fmul_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fmul_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -383,8 +410,9 @@ define float @fmul_undef_op1_maytrap(float %x) #0 {
 }
 
 define float @fmul_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fmul_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -392,16 +420,17 @@ define float @fmul_undef_op1_upward(float %x) #0 {
 }
 
 define float @fmul_undef_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fmul_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fmul_poison_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op1_strict(
+; CHECK-LABEL: define float @fmul_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -409,27 +438,28 @@ define float @fmul_poison_op1_strict(float %x) #0 {
 }
 
 define float @fmul_poison_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fmul_poison_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fmul_poison_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fmul_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fmul_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -440,7 +470,8 @@ define float @fmul_poison_op1_defaultfp(float %x) #0 {
 ;
 
 define float @fdiv_undef_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op0_strict(
+; CHECK-LABEL: define float @fdiv_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -448,8 +479,9 @@ define float @fdiv_undef_op0_strict(float %x) #0 {
 }
 
 define float @fdiv_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fdiv_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -457,8 +489,9 @@ define float @fdiv_undef_op0_maytrap(float %x) #0 {
 }
 
 define float @fdiv_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fdiv_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -466,16 +499,17 @@ define float @fdiv_undef_op0_upward(float %x) #0 {
 }
 
 define float @fdiv_undef_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fdiv_poison_op0_strict(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op0_strict(
+; CHECK-LABEL: define float @fdiv_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -483,34 +517,36 @@ define float @fdiv_poison_op0_strict(float %x) #0 {
 }
 
 define float @fdiv_poison_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fdiv_poison_op0_upward(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fdiv_poison_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fdiv_undef_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op1_strict(
+; CHECK-LABEL: define float @fdiv_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -518,8 +554,9 @@ define float @fdiv_undef_op1_strict(float %x) #0 {
 }
 
 define float @fdiv_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fdiv_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -527,8 +564,9 @@ define float @fdiv_undef_op1_maytrap(float %x) #0 {
 }
 
 define float @fdiv_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fdiv_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -536,16 +574,17 @@ define float @fdiv_undef_op1_upward(float %x) #0 {
 }
 
 define float @fdiv_undef_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fdiv_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fdiv_poison_op1_strict(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op1_strict(
+; CHECK-LABEL: define float @fdiv_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -553,27 +592,28 @@ define float @fdiv_poison_op1_strict(float %x) #0 {
 }
 
 define float @fdiv_poison_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @fdiv_poison_op1_upward(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @fdiv_poison_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @fdiv_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @fdiv_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -584,7 +624,8 @@ define float @fdiv_poison_op1_defaultfp(float %x) #0 {
 ;
 
 define float @frem_undef_op0_strict(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op0_strict(
+; CHECK-LABEL: define float @frem_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -592,8 +633,9 @@ define float @frem_undef_op0_strict(float %x) #0 {
 }
 
 define float @frem_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op0_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @frem_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -601,8 +643,9 @@ define float @frem_undef_op0_maytrap(float %x) #0 {
 }
 
 define float @frem_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op0_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @frem_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -610,16 +653,17 @@ define float @frem_undef_op0_upward(float %x) #0 {
 }
 
 define float @frem_undef_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float undef, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @frem_poison_op0_strict(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op0_strict(
+; CHECK-LABEL: define float @frem_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -627,34 +671,36 @@ define float @frem_poison_op0_strict(float %x) #0 {
 }
 
 define float @frem_poison_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @frem_poison_op0_upward(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @frem_poison_op0_defaultfp(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float poison, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float poison, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @frem_undef_op1_strict(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op1_strict(
+; CHECK-LABEL: define float @frem_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -662,8 +708,9 @@ define float @frem_undef_op1_strict(float %x) #0 {
 }
 
 define float @frem_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op1_maytrap(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @frem_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -671,8 +718,9 @@ define float @frem_undef_op1_maytrap(float %x) #0 {
 }
 
 define float @frem_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op1_upward(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @frem_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[R]]
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -680,16 +728,17 @@ define float @frem_undef_op1_upward(float %x) #0 {
 }
 
 define float @frem_undef_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @frem_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @frem_poison_op1_strict(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op1_strict(
+; CHECK-LABEL: define float @frem_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -697,27 +746,28 @@ define float @frem_poison_op1_strict(float %x) #0 {
 }
 
 define float @frem_poison_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
 }
 
 define float @frem_poison_op1_upward(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
 }
 
 define float @frem_poison_op1_defaultfp(float %x) #0 {
-; CHECK-LABEL: @frem_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-LABEL: define float @frem_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -728,8 +778,9 @@ define float @frem_poison_op1_defaultfp(float %x) #0 {
 ;
 
 define float @fma_undef_op0_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op0_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]])
+; CHECK-LABEL: define float @fma_undef_op0_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X]], float [[Y]])
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -737,8 +788,9 @@ define float @fma_undef_op0_strict(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op0_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_undef_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X]], float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -746,8 +798,8 @@ define float @fma_undef_op0_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op0_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op0_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -755,8 +807,8 @@ define float @fma_undef_op0_upward(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op0_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -764,8 +816,9 @@ define float @fma_undef_op0_defaultfp(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op0_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op0_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]])
+; CHECK-LABEL: define float @fma_poison_op0_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X]], float [[Y]])
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -773,8 +826,9 @@ define float @fma_poison_op0_strict(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op0_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op0_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_poison_op0_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X]], float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -782,8 +836,8 @@ define float @fma_poison_op0_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op0_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op0_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -791,8 +845,8 @@ define float @fma_poison_op0_upward(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op0_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float poison, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op0_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float poison, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -800,8 +854,9 @@ define float @fma_poison_op0_defaultfp(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op1_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op1_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]])
+; CHECK-LABEL: define float @fma_undef_op1_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float undef, float [[Y]])
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -809,8 +864,9 @@ define float @fma_undef_op1_strict(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op1_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_undef_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float undef, float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -818,8 +874,8 @@ define float @fma_undef_op1_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op1_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op1_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -827,8 +883,8 @@ define float @fma_undef_op1_upward(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op1_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float undef, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -836,8 +892,9 @@ define float @fma_undef_op1_defaultfp(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op1_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op1_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]])
+; CHECK-LABEL: define float @fma_poison_op1_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float poison, float [[Y]])
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -845,8 +902,9 @@ define float @fma_poison_op1_strict(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op1_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op1_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_poison_op1_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float poison, float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -854,8 +912,8 @@ define float @fma_poison_op1_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op1_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op1_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -863,8 +921,8 @@ define float @fma_poison_op1_upward(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op1_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float poison, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op1_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float poison, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -872,8 +930,9 @@ define float @fma_poison_op1_defaultfp(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op2_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op2_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef)
+; CHECK-LABEL: define float @fma_undef_op2_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float [[Y]], float undef)
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -881,8 +940,9 @@ define float @fma_undef_op2_strict(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op2_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op2_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_undef_op2_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float [[Y]], float undef) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -890,8 +950,8 @@ define float @fma_undef_op2_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op2_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op2_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op2_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -899,8 +959,8 @@ define float @fma_undef_op2_upward(float %x, float %y) #0 {
 }
 
 define float @fma_undef_op2_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_undef_op2_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float undef) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_undef_op2_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -908,8 +968,9 @@ define float @fma_undef_op2_defaultfp(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op2_strict(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op2_strict(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison)
+; CHECK-LABEL: define float @fma_poison_op2_strict(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float [[Y]], float poison)
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.dynamic", metadata !"fpexcept.strict")
@@ -917,8 +978,9 @@ define float @fma_poison_op2_strict(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op2_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op2_maytrap(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-LABEL: define float @fma_poison_op2_maytrap(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float [[Y]], float poison) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
@@ -926,8 +988,8 @@ define float @fma_poison_op2_maytrap(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op2_upward(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op2_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op2_upward(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -935,8 +997,8 @@ define float @fma_poison_op2_upward(float %x, float %y) #0 {
 }
 
 define float @fma_poison_op2_defaultfp(float %x, float %y) #0 {
-; CHECK-LABEL: @fma_poison_op2_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float poison) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-LABEL: define float @fma_poison_op2_defaultfp(
+; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    ret float poison
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float poison, metadata !"round.tonearest", metadata !"fpexcept.ignore")

>From d00e0930e8bc589112c43318213cf15c8d938287 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Mon, 13 Apr 2026 19:40:12 -0700
Subject: [PATCH 08/12] Fix getFloatingPointMemoryEffects for explicit vs.
 dynamic rounding mode

With fp.except=ignore and an *explicit* (non-dynamic) rounding mode, the
operation accesses neither the exception flags nor the rest of the FP
environment (e.g. MXCSR on x86), so return MemoryEffects::none(). This
allows EarlyCSE to merge adjacent identical calls and lets DCE remove
unused results.

With fp.except=ignore but a *dynamic* rounding mode, the operation still
reads the current rounding mode from the FP environment. Keep
inaccessibleMemOnly() (rather than a read-only variant) so that EarlyCSE
treats these calls as memory-writing and conservatively prevents CSE
across arbitrary function calls that could change the rounding mode.
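
The two cases above can be summarized as a small decision function. The
following is only a standalone sketch: ExceptBehavior, Rounding, Effects
and fpMemoryEffects are hypothetical stand-ins invented for illustration,
not the LLVM types or the code in Instructions.cpp, and it assumes the
call is already known to be a floating-point operation.

```cpp
// Hypothetical stand-ins for illustration only; these are NOT the LLVM
// fp::ExceptionBehavior, RoundingMode, or MemoryEffects types.
enum class ExceptBehavior { Ignore, MayTrap, Strict };
enum class Rounding { Dynamic, ToNearest, Upward, Downward, TowardZero };
enum class Effects { None, InaccessibleMemOnly };

// Models the decision described in the commit message for a call that is
// assumed to be a floating-point operation.
constexpr Effects fpMemoryEffects(bool InStrictFPFunction, ExceptBehavior EB,
                                  Rounding RM) {
  // Outside a strictfp function the default FP environment is assumed.
  if (!InStrictFPFunction)
    return Effects::None;
  // Exceptions ignored and rounding mode explicit: no FP environment access.
  if (EB == ExceptBehavior::Ignore && RM != Rounding::Dynamic)
    return Effects::None;
  // Otherwise stay conservative: exception flags may be written, or the
  // dynamic rounding mode is read from the FP environment.
  return Effects::InaccessibleMemOnly;
}

// Compile-time checks of the two cases from the commit message.
static_assert(fpMemoryEffects(true, ExceptBehavior::Ignore,
                              Rounding::Upward) == Effects::None);
static_assert(fpMemoryEffects(true, ExceptBehavior::Ignore,
                              Rounding::Dynamic) ==
              Effects::InaccessibleMemOnly);
```

In this sketch only the (ignore, explicit rounding) combination drops to
"none" inside a strictfp function; every other combination keeps the
conservative result.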

Update IRBuilderTest expectations and regenerate 14 affected FileCheck
tests whose CHECK patterns changed due to the new memory-effects semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 llvm/lib/IR/Instructions.cpp                  | 15 +++-
 .../Transforms/EarlyCSE/defaultfp-strictfp.ll | 52 +++++------
 .../Transforms/EarlyCSE/nonmixed-strictfp.ll  | 26 +++---
 .../Transforms/EarlyCSE/round-dyn-strictfp.ll | 56 ++++++------
 .../Transforms/InstCombine/constrained.ll     |  2 -
 .../InstSimplify/X86/fp-nan-strictfp.ll       | 86 +++++++------------
 .../constant-fold-fp-denormal-strict.ll       |  9 --
 .../InstSimplify/constfold-constrained.ll     | 10 ---
 .../InstSimplify/fast-math-strictfp.ll        | 68 +++++----------
 .../Transforms/InstSimplify/fdiv-strictfp.ll  |  2 -
 .../floating-point-arithmetic-strictfp.ll     | 36 +++-----
 llvm/test/Transforms/InstSimplify/ldexp.ll    |  3 -
 .../Transforms/InstSimplify/strictfp-fadd.ll  | 67 ++++++---------
 .../Transforms/InstSimplify/strictfp-fsub.ll  | 48 ++++-------
 .../InstSimplify/strictfp-sqrt-nonneg.ll      | 46 ++++------
 llvm/unittests/IR/IRBuilderTest.cpp           |  8 +-
 16 files changed, 208 insertions(+), 326 deletions(-)

diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 74a2b0e686180..1482e7e240a1a 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -775,10 +775,17 @@ MemoryEffects CallBase::getFloatingPointMemoryEffects() const {
       if (const Function *F = BB->getParent())
         if (F->hasFnAttribute(Attribute::StrictFP))
           if (IntrinsicInst::isFloatingPointOperation(IntrID)) {
-            // If this operation explicitly ignores FP exceptions, it has no
-            // exception-related side effects even in a strictfp function.
-            if (getExceptionBehavior() == fp::ebIgnore)
-              return MemoryEffects::none();
+            if (getExceptionBehavior() == fp::ebIgnore) {
+              // Exceptions are ignored. If the rounding mode is also explicit
+              // (non-dynamic), there is no FP environment access at all.
+              if (getRoundingMode() != RoundingMode::Dynamic)
+                return MemoryEffects::none();
+              // Dynamic rounding mode: the operation reads the current rounding
+              // mode from the FP environment (e.g. MXCSR). Use
+              // inaccessibleMemOnly (not just Ref) so that EarlyCSE conservatively
+              // treats these as writes and prevents CSE across arbitrary function
+              // calls that might change the rounding mode.
+            }
             return MemoryEffects::inaccessibleMemOnly();
           }
   return MemoryEffects::none();
diff --git a/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll b/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
index b5bce1d53a7a5..b478b625087b1 100644
--- a/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/defaultfp-strictfp.ll
@@ -137,10 +137,10 @@ define double @multiple_frem_split(double %a, double %b) #0 {
 
 define i32 @multiple_fptoui(double %a) #0 {
 ; CHECK-LABEL: @multiple_fptoui(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   %2 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -150,11 +150,11 @@ define i32 @multiple_fptoui(double %a) #0 {
 
 define i32 @multiple_fptoui_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fptoui_split(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   call void @arbitraryfunc() #0
@@ -191,10 +191,10 @@ define double @multiple_uitofp_split(i32 %a) #0 {
 
 define i32 @multiple_fptosi(double %a) #0 {
 ; CHECK-LABEL: @multiple_fptosi(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   %2 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -204,11 +204,11 @@ define i32 @multiple_fptosi(double %a) #0 {
 
 define i32 @multiple_fptosi_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fptosi_split(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   call void @arbitraryfunc() #0
@@ -245,12 +245,12 @@ define double @multiple_sitofp_split(i32 %a) #0 {
 
 define i1 @multiple_fcmp(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
-; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP3]]
+; CHECK-NEXT:    [[TMP6:%.*]] = zext i1 [[TMP2]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP4]], i32 [[TMP6]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP2]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
@@ -264,11 +264,11 @@ define i1 @multiple_fcmp_split(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fcmp_split(
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    call void @arbitraryfunc() #[[ATTR0]]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
-; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP3]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP2]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP3]], i32 [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP2]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   call void @arbitraryfunc() #0
diff --git a/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll b/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
index bdcc0cfbd11c4..2cb99e0bbd170 100644
--- a/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/nonmixed-strictfp.ll
@@ -188,10 +188,10 @@ define double @frem_maytrap(double %a, double %b) #0 {
 
 define i32 @fptoui_defaultenv(double %a) #0 {
 ; CHECK-LABEL: @fptoui_defaultenv(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptoui.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   %2 = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -250,10 +250,10 @@ define double @uitofp_maytrap(i32 %a) #0 {
 
 define i32 @fptosi_defaultenv(double %a) #0 {
 ; CHECK-LABEL: @fptosi_defaultenv(
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.fptosi.i32.f64(double [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @bar.i32(i32 [[TMP1]], i32 [[TMP1]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i32 [[TMP2]]
 ;
   %1 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
   %2 = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double %a, metadata !"fpexcept.ignore") #0
@@ -312,12 +312,12 @@ define double @sitofp_maytrap(i32 %a) #0 {
 
 define i1 @fcmp_defaultenv(double %a, double %b) #0 {
 ; CHECK-LABEL: @fcmp_defaultenv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.fcmp.f64(double [[A:%.*]], double [[B:%.*]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
-; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP2]], i32 [[TMP4]]) #[[ATTR0]]
-; CHECK-NEXT:    ret i1 [[TMP3]]
+; CHECK-NEXT:    [[TMP6:%.*]] = zext i1 [[TMP2]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @bar.i32(i32 [[TMP4]], i32 [[TMP6]]) #[[ATTR0]]
+; CHECK-NEXT:    ret i1 [[TMP2]]
 ;
   %1 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
   %2 = call i1 @llvm.experimental.constrained.fcmp.f64(double %a, double %b, metadata !"oeq", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll b/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
index 9cd1953a38f9d..5be6c950f1122 100644
--- a/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
+++ b/llvm/test/Transforms/EarlyCSE/round-dyn-strictfp.ll
@@ -9,10 +9,10 @@
 
 define double @multiple_fadd(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fadd(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fadd.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.fadd.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP4]]) #[[ATTR0:[0-9]+]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -37,10 +37,10 @@ define double @multiple_fadd_split(double %a, double %b) #0 {
 
 define double @multiple_fsub(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fsub(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fsub.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.fsub.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.fsub.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.fsub.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -65,10 +65,10 @@ define double @multiple_fsub_split(double %a, double %b) #0 {
 
 define double @multiple_fmul(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fmul(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fmul.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.fmul.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.fmul.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.fmul.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -93,10 +93,10 @@ define double @multiple_fmul_split(double %a, double %b) #0 {
 
 define double @multiple_fdiv(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_fdiv(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.fdiv.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.fdiv.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -121,10 +121,10 @@ define double @multiple_fdiv_split(double %a, double %b) #0 {
 
 define double @multiple_frem(double %a, double %b) #0 {
 ; CHECK-LABEL: @multiple_frem(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP2]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.frem.f64(double [[A:%.*]], double [[B:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.frem.f64(double [[A]], double [[B]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP4]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.frem.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.frem.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -149,10 +149,10 @@ define double @multiple_frem_split(double %a, double %b) #0 {
 
 define double @multiple_uitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_uitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.uitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP2]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.uitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -177,10 +177,10 @@ define double @multiple_uitofp_split(i32 %a) #0 {
 
 define double @multiple_sitofp(i32 %a) #0 {
 ; CHECK-LABEL: @multiple_sitofp(
-; CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP1]], double [[TMP1]]) #[[ATTR0]]
-; CHECK-NEXT:    ret double [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP4:%.*]] = call double @llvm.sitofp.f64.i32(i32 [[A]]) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call double @foo.f64(double [[TMP2]], double [[TMP2]]) #[[ATTR0]]
+; CHECK-NEXT:    ret double [[TMP4]]
 ;
   %1 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %2 = call double @llvm.experimental.constrained.sitofp.f64.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/Transforms/InstCombine/constrained.ll b/llvm/test/Transforms/InstCombine/constrained.ll
index 3eaea8a88110d..b9f6d88bd54ec 100644
--- a/llvm/test/Transforms/InstCombine/constrained.ll
+++ b/llvm/test/Transforms/InstCombine/constrained.ll
@@ -31,7 +31,6 @@ entry:
 define float @f_unused_ignore() #0 {
 ; CHECK-LABEL: @f_unused_ignore(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
 entry:
@@ -93,7 +92,6 @@ entry:
 define float @f_eval_ignore() #0 {
 ; CHECK-LABEL: @f_eval_ignore(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fdiv.f32(float 1.000000e+00, float 3.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x3FD5555540000000
 ;
 entry:
diff --git a/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll b/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
index dc2cbb79a1ce0..9275891ad265e 100644
--- a/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/X86/fp-nan-strictfp.ll
@@ -16,7 +16,7 @@ define float @fadd_nan_op0_strict(float %x) #0 {
 define float @fadd_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -24,8 +24,7 @@ define float @fadd_nan_op0_maytrap(float %x) #0 {
 
 define float @fadd_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -33,8 +32,7 @@ define float @fadd_nan_op0_upward(float %x) #0 {
 
 define float @fadd_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -51,7 +49,7 @@ define float @fadd_nan_op1_strict(float %x) #0 {
 define float @fadd_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -59,8 +57,7 @@ define float @fadd_nan_op1_maytrap(float %x) #0 {
 
 define float @fadd_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -68,8 +65,7 @@ define float @fadd_nan_op1_upward(float %x) #0 {
 
 define float @fadd_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fadd_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fadd.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -90,7 +86,7 @@ define float @fsub_nan_op0_strict(float %x) #0 {
 define float @fsub_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -98,8 +94,7 @@ define float @fsub_nan_op0_maytrap(float %x) #0 {
 
 define float @fsub_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -107,8 +102,7 @@ define float @fsub_nan_op0_upward(float %x) #0 {
 
 define float @fsub_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -125,7 +119,7 @@ define float @fsub_nan_op1_strict(float %x) #0 {
 define float @fsub_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -133,8 +127,7 @@ define float @fsub_nan_op1_maytrap(float %x) #0 {
 
 define float @fsub_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -142,8 +135,7 @@ define float @fsub_nan_op1_upward(float %x) #0 {
 
 define float @fsub_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fsub_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fsub.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -164,7 +156,7 @@ define float @fmul_nan_op0_strict(float %x) #0 {
 define float @fmul_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -172,8 +164,7 @@ define float @fmul_nan_op0_maytrap(float %x) #0 {
 
 define float @fmul_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -181,8 +172,7 @@ define float @fmul_nan_op0_upward(float %x) #0 {
 
 define float @fmul_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -199,7 +189,7 @@ define float @fmul_nan_op1_strict(float %x) #0 {
 define float @fmul_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -207,8 +197,7 @@ define float @fmul_nan_op1_maytrap(float %x) #0 {
 
 define float @fmul_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -216,8 +205,7 @@ define float @fmul_nan_op1_upward(float %x) #0 {
 
 define float @fmul_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fmul_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fmul.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -238,7 +226,7 @@ define float @fdiv_nan_op0_strict(float %x) #0 {
 define float @fdiv_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -246,8 +234,7 @@ define float @fdiv_nan_op0_maytrap(float %x) #0 {
 
 define float @fdiv_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -255,8 +242,7 @@ define float @fdiv_nan_op0_upward(float %x) #0 {
 
 define float @fdiv_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -273,7 +259,7 @@ define float @fdiv_nan_op1_strict(float %x) #0 {
 define float @fdiv_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -281,8 +267,7 @@ define float @fdiv_nan_op1_maytrap(float %x) #0 {
 
 define float @fdiv_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -290,8 +275,7 @@ define float @fdiv_nan_op1_upward(float %x) #0 {
 
 define float @fdiv_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @fdiv_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fdiv.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -312,7 +296,7 @@ define float @frem_nan_op0_strict(float %x) #0 {
 define float @frem_nan_op0_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -320,8 +304,7 @@ define float @frem_nan_op0_maytrap(float %x) #0 {
 
 define float @frem_nan_op0_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -329,8 +312,7 @@ define float @frem_nan_op0_upward(float %x) #0 {
 
 define float @frem_nan_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float 0x7FF8000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float 0x7FF8000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -347,7 +329,7 @@ define float @frem_nan_op1_strict(float %x) #0 {
 define float @frem_nan_op1_maytrap(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_maytrap(
 ; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
   ret float %r
@@ -355,8 +337,7 @@ define float @frem_nan_op1_maytrap(float %x) #0 {
 
 define float @frem_nan_op1_upward(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
   ret float %r
@@ -364,8 +345,7 @@ define float @frem_nan_op1_upward(float %x) #0 {
 
 define float @frem_nan_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: @frem_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.frem.f32(float [[X:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.frem.f32(float %x, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -395,7 +375,6 @@ define float @fma_nan_op0_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op0_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -404,7 +383,6 @@ define float @fma_nan_op0_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op0_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op0_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float 0x7FF8000000000000, float [[X:%.*]], float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float 0x7FF8000000000000, float %x, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -431,7 +409,6 @@ define float @fma_nan_op1_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op1_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -440,7 +417,6 @@ define float @fma_nan_op1_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op1_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op1_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float 0x7FF8000000000000, float [[Y:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float 0x7FF8000000000000, float %y, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -467,7 +443,6 @@ define float @fma_nan_op2_maytrap(float %x, float %y) #0 {
 
 define float @fma_nan_op2_upward(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_upward(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.upward", metadata !"fpexcept.ignore")
@@ -476,7 +451,6 @@ define float @fma_nan_op2_upward(float %x, float %y) #0 {
 
 define float @fma_nan_op2_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: @fma_nan_op2_defaultfp(
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0x7FF8000000000000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0x7FF8000000000000
 ;
   %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float 0x7FF8000000000000, metadata !"round.tonearest", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll b/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll
index b08bb4fe2edc8..608a368a61dc3 100644
--- a/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll
+++ b/llvm/test/Transforms/InstSimplify/constant-fold-fp-denormal-strict.ll
@@ -7,7 +7,6 @@
 define float @test_float_fadd_ieee_strict() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_ieee_strict(
 ; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore")
@@ -17,7 +16,6 @@ define float @test_float_fadd_ieee_strict() #0 {
 define float @test_float_fadd_strict_ieee() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_ieee(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=ieee") ]
@@ -27,7 +25,6 @@ define float @test_float_fadd_strict_ieee() #0 {
 define float @test_float_fadd_strict_inzero() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_inzero(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=zero", metadata !"denorm.out=ieee") ]
@@ -37,7 +34,6 @@ define float @test_float_fadd_strict_inzero() #0 {
 define float @test_float_fadd_strict_inpzero() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_inpzero(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=pzero", metadata !"denorm.out=ieee") ]
@@ -47,7 +43,6 @@ define float @test_float_fadd_strict_inpzero() #0 {
 define float @test_float_fadd_strict_indyn() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_indyn(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=dyn", metadata !"denorm.out=ieee") ]
@@ -57,7 +52,6 @@ define float @test_float_fadd_strict_indyn() #0 {
 define float @test_float_fadd_strict_ieee_outzero() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outzero(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=zero") ]
@@ -67,7 +61,6 @@ define float @test_float_fadd_strict_ieee_outzero() #0 {
 define float @test_float_fadd_strict_ieee_outpzero() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outpzero(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=pzero") ]
@@ -77,7 +70,6 @@ define float @test_float_fadd_strict_ieee_outpzero() #0 {
 define float @test_float_fadd_strict_ieee_outdyn() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_ieee_outdyn(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee", metadata !"denorm.out=dyn") ]
@@ -87,7 +79,6 @@ define float @test_float_fadd_strict_ieee_outdyn() #0 {
 define float @test_float_fadd_strict_zero_outdef() #0 {
 ; CHECK-LABEL: define float @test_float_fadd_strict_zero_outdef(
 ; CHECK-SAME: ) #[[ATTR0]] {
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 0xB810000000000000, float 0x3800000000000000) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 0xB800000000000000
 ;
   %result = call float @llvm.experimental.constrained.fadd.f32(float 0xB810000000000000, float 0x3800000000000000, metadata !"round.towardzero", metadata !"fpexcept.ignore") [ "fp.control" (metadata !"denorm.in=ieee") ]
diff --git a/llvm/test/Transforms/InstSimplify/constfold-constrained.ll b/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
index 07d55ca750037..14e269d2f5742 100644
--- a/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
+++ b/llvm/test/Transforms/InstSimplify/constfold-constrained.ll
@@ -102,7 +102,6 @@ entry:
 define double @nearbyint_01() #0 {
 ; CHECK-LABEL: @nearbyint_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.nearbyint.f64(double 1.050000e+01) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 1.000000e+01
 ;
 entry:
@@ -245,7 +244,6 @@ entry:
 define float @fadd_01() #0 {
 ; CHECK-LABEL: @fadd_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call float @llvm.fadd.f32(float 1.000000e+01, float 2.000000e+01) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 3.000000e+01
 ;
 entry:
@@ -258,7 +256,6 @@ entry:
 define double @fadd_02() #0 {
 ; CHECK-LABEL: @fadd_02(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 2.000000e+00
 ;
 entry:
@@ -269,7 +266,6 @@ entry:
 define double @fadd_03() #0 {
 ; CHECK-LABEL: @fadd_03(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 1.000000e+00, double 0x3FF0000000000001) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 0x4000000000000001
 ;
 entry:
@@ -328,7 +324,6 @@ entry:
 define double @fadd_08() #0 {
 ; CHECK-LABEL: @fadd_08(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fadd.f64(double 0x7FEFFFFFFFFFFFFF, double 0x7FEFFFFFFFFFFFFF) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 0x7FF0000000000000
 ;
 entry:
@@ -350,7 +345,6 @@ entry:
 define half @fadd_10() #0 {
 ; CHECK-LABEL: @fadd_10(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call half @llvm.fadd.f16(half 0xH3C00, half 0xH4000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret half 0xH4200
 ;
 entry:
@@ -361,7 +355,6 @@ entry:
 define bfloat @fadd_11() #0 {
 ; CHECK-LABEL: @fadd_11(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call bfloat @llvm.fadd.bf16(bfloat 0xR3F80, bfloat 0xR4000) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret bfloat 0xR4040
 ;
 entry:
@@ -372,7 +365,6 @@ entry:
 define double @fsub_01() #0 {
 ; CHECK-LABEL: @fsub_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fsub.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double -1.000000e+00
 ;
 entry:
@@ -383,7 +375,6 @@ entry:
 define double @fmul_01() #0 {
 ; CHECK-LABEL: @fmul_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fmul.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 2.000000e+00
 ;
 entry:
@@ -394,7 +385,6 @@ entry:
 define double @fdiv_01() #0 {
 ; CHECK-LABEL: @fdiv_01(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[RESULT1:%.*]] = call double @llvm.fdiv.f64(double 1.000000e+00, double 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret double 5.000000e-01
 ;
 entry:
diff --git a/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll b/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
index 6e06e2cdca1a4..8620cdd3fc92b 100644
--- a/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fast-math-strictfp.ll
@@ -4,8 +4,7 @@
 ;; x * 0 ==> 0 when no-nans and no-signed-zero
 define float @mul_zero_1(float %a) #0 {
 ; CHECK-LABEL: @mul_zero_1(
-; CHECK-NEXT:    [[B1:%.*]] = call nnan nsz float @llvm.fmul.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[B1]]
+; CHECK-NEXT:    ret float 0.000000e+00
 ;
   %b = call nsz nnan float @llvm.experimental.constrained.fmul.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %b
@@ -13,8 +12,7 @@ define float @mul_zero_1(float %a) #0 {
 
 define float @mul_zero_2(float %a) #0 {
 ; CHECK-LABEL: @mul_zero_2(
-; CHECK-NEXT:    [[B1:%.*]] = call fast float @llvm.fmul.f32(float 0.000000e+00, float [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[B1]]
+; CHECK-NEXT:    ret float 0.000000e+00
 ;
   %b = call fast float @llvm.experimental.constrained.fmul.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %b
@@ -22,8 +20,7 @@ define float @mul_zero_2(float %a) #0 {
 
 define <2 x float> @mul_zero_nsz_nnan_vec_poison(<2 x float> %a) #0 {
 ; CHECK-LABEL: @mul_zero_nsz_nnan_vec_poison(
-; CHECK-NEXT:    [[B1:%.*]] = call nnan nsz <2 x float> @llvm.fmul.v2f32(<2 x float> [[A:%.*]], <2 x float> <float 0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[B1]]
+; CHECK-NEXT:    ret <2 x float> zeroinitializer
 ;
   %b = call nsz nnan <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> %a, <2 x float><float 0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %b
@@ -72,9 +69,7 @@ define float @fadd_binary_fnegx(float %x) #0 {
 
 define float @fadd_unary_fnegx(float %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fnegx(
-; CHECK-NEXT:    [[NEGX:%.*]] = fneg float [[X:%.*]]
-; CHECK-NEXT:    [[R1:%.*]] = call nnan float @llvm.fadd.f32(float [[NEGX]], float [[X]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R1]]
+; CHECK-NEXT:    ret float 0.000000e+00
 ;
   %negx = fneg float %x
   %r = call nnan float @llvm.experimental.constrained.fadd.f32(float %negx, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -96,9 +91,7 @@ define <2 x float> @fadd_binary_fnegx_commute_vec(<2 x float> %x) #0 {
 
 define <2 x float> @fadd_unary_fnegx_commute_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_unary_fnegx_commute_vec(
-; CHECK-NEXT:    [[NEGX:%.*]] = fneg <2 x float> [[X:%.*]]
-; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[X]], <2 x float> [[NEGX]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[R1]]
+; CHECK-NEXT:    ret <2 x float> zeroinitializer
 ;
   %negx = fneg <2 x float> %x
   %r = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> %negx, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -207,12 +200,10 @@ define float @fadd_fsub_nnan(float %x) #0 {
 define float @fsub_x_x(float %a) #0 {
 ; X - X ==> 0
 ; CHECK-LABEL: @fsub_x_x(
-; CHECK-NEXT:    [[ZERO15:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[NO_ZERO16:%.*]] = call ninf float @llvm.fsub.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[NO_ZERO16:%.*]] = call ninf float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[NO_ZERO27:%.*]] = call float @llvm.fsub.f32(float [[A]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[NO_ZERO3:%.*]] = call float @llvm.fadd.f32(float [[NO_ZERO16]], float [[NO_ZERO27]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[NO_ZERO:%.*]] = call nsz float @llvm.fadd.f32(float [[NO_ZERO3]], float [[ZERO15]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[NO_ZERO]]
+; CHECK-NEXT:    ret float [[NO_ZERO3]]
 ;
   %zero1 = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
@@ -242,9 +233,7 @@ define float @fsub_0_0_x(float %a) #0 {
 ; fsub nsz 0.0, (fneg X) ==> X
 define float @fneg_x(float %a) #0 {
 ; CHECK-LABEL: @fneg_x(
-; CHECK-NEXT:    [[T1:%.*]] = fneg float [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %t1 = fneg float %a
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -264,9 +253,7 @@ define <2 x float> @fsub_0_0_x_vec_poison1(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec_poison1(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec_poison1(
-; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fsub.v2f32(<2 x float> <float 0.000000e+00, float poison>, <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call nsz <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> <float 0.0, float poison>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -288,8 +275,7 @@ define <2 x float> @fsub_0_0_x_vec_poison2(<2 x float> %a) #0 {
 
 define <2 x float> @fadd_zero_nsz_vec(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_zero_nsz_vec(
-; CHECK-NEXT:    [[X:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[X1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[X]]
+; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
 ;
   %r = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -297,8 +283,7 @@ define <2 x float> @fadd_zero_nsz_vec(<2 x float> %x) #0 {
 
 define <2 x float> @fadd_zero_nsz_vec_poison(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fadd_zero_nsz_vec_poison(
-; CHECK-NEXT:    [[X:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[X1:%.*]], <2 x float> <float 0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[X]]
+; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
 ;
   %r = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %x, <2 x float> <float 0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -320,8 +305,7 @@ define float @nofold_fadd_x_0(float %a) #0 {
 
 define float @fold_fadd_nsz_x_0(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %add
@@ -342,8 +326,7 @@ define float @fold_fadd_cannot_be_neg0_nsz_src_x_0(float %a, float %b) #0 {
 
 define float @fold_fadd_cannot_be_neg0_fabs_src_x_0(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_cannot_be_neg0_fabs_src_x_0(
-; CHECK-NEXT:    [[FABS1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fadd.f32(float [[FABS1]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[FABS:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret float [[FABS]]
 ;
   %fabs = call float @llvm.fabs.f32(float %a) #0
@@ -386,8 +369,7 @@ define float @fold_fadd_cannot_be_neg0_canonicalize_nsz_src_x_0(float %a, float
 
 define double @fdiv_zero_by_x(double %x) #0 {
 ; CHECK-LABEL: @fdiv_zero_by_x(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan nsz double @llvm.fdiv.f64(double 0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret double [[R1]]
+; CHECK-NEXT:    ret double 0.000000e+00
 ;
   %r = call nnan nsz double @llvm.experimental.constrained.fdiv.f64(double 0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -395,8 +377,7 @@ define double @fdiv_zero_by_x(double %x) #0 {
 
 define <2 x double> @fdiv_zero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @fdiv_zero_by_x_vec_poison(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan nsz <2 x double> @llvm.fdiv.v2f64(<2 x double> <double 0.000000e+00, double poison>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x double> [[R1]]
+; CHECK-NEXT:    ret <2 x double> zeroinitializer
 ;
   %r = call nnan nsz <2 x double> @llvm.experimental.constrained.fdiv.v2f64(<2 x double> <double 0.0, double poison>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -407,8 +388,7 @@ define <2 x double> @fdiv_zero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define double @frem_zero_by_x(double %x) #0 {
 ; CHECK-LABEL: @frem_zero_by_x(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan double @llvm.frem.f64(double 0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret double [[R1]]
+; CHECK-NEXT:    ret double 0.000000e+00
 ;
   %r = call nnan double @llvm.experimental.constrained.frem.f64(double 0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -416,8 +396,7 @@ define double @frem_zero_by_x(double %x) #0 {
 
 define <2 x double> @frem_poszero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @frem_poszero_by_x_vec_poison(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x double> @llvm.frem.v2f64(<2 x double> <double 0.000000e+00, double poison>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x double> [[R1]]
+; CHECK-NEXT:    ret <2 x double> zeroinitializer
 ;
   %r = call nnan <2 x double> @llvm.experimental.constrained.frem.v2f64(<2 x double> <double 0.0, double poison>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -428,8 +407,7 @@ define <2 x double> @frem_poszero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define double @frem_negzero_by_x(double %x) #0 {
 ; CHECK-LABEL: @frem_negzero_by_x(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan double @llvm.frem.f64(double -0.000000e+00, double [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret double [[R1]]
+; CHECK-NEXT:    ret double -0.000000e+00
 ;
   %r = call nnan double @llvm.experimental.constrained.frem.f64(double -0.0, double %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %r
@@ -437,8 +415,7 @@ define double @frem_negzero_by_x(double %x) #0 {
 
 define <2 x double> @frem_negzero_by_x_vec_poison(<2 x double> %x) #0 {
 ; CHECK-LABEL: @frem_negzero_by_x_vec_poison(
-; CHECK-NEXT:    [[R1:%.*]] = call nnan <2 x double> @llvm.frem.v2f64(<2 x double> <double poison, double -0.000000e+00>, <2 x double> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x double> [[R1]]
+; CHECK-NEXT:    ret <2 x double> splat (double -0.000000e+00)
 ;
   %r = call nnan <2 x double> @llvm.experimental.constrained.frem.v2f64(<2 x double> <double poison, double -0.0>, <2 x double> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x double> %r
@@ -446,8 +423,7 @@ define <2 x double> @frem_negzero_by_x_vec_poison(<2 x double> %x) #0 {
 
 define float @fdiv_self(float %f) #0 {
 ; CHECK-LABEL: @fdiv_self(
-; CHECK-NEXT:    [[DIV1:%.*]] = call nnan float @llvm.fdiv.f32(float [[F:%.*]], float [[F]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[DIV1]]
+; CHECK-NEXT:    ret float 1.000000e+00
 ;
   %div = call nnan float @llvm.experimental.constrained.fdiv.f32(float %f, float %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %div
@@ -535,9 +511,7 @@ declare double @llvm.sqrt.f64(double)
 
 define double @sqrt_squared(double %f) #0 {
 ; CHECK-LABEL: @sqrt_squared(
-; CHECK-NEXT:    [[SQRT1:%.*]] = call double @llvm.sqrt.f64(double [[F:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[MUL:%.*]] = call reassoc nnan nsz double @llvm.fmul.f64(double [[SQRT1]], double [[SQRT1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret double [[MUL]]
+; CHECK-NEXT:    ret double [[MUL:%.*]]
 ;
   %sqrt = call double @llvm.experimental.constrained.sqrt.f64(double %f, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   %mul = call reassoc nnan nsz double @llvm.experimental.constrained.fmul.f64(double %sqrt, double %sqrt, metadata !"round.tonearest", metadata !"fpexcept.ignore")
diff --git a/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll b/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
index 39a636cafe447..0c34e4691fbde 100644
--- a/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fdiv-strictfp.ll
@@ -3,7 +3,6 @@
 
 define float @fdiv_constant_fold() #0 {
 ; CHECK-LABEL: @fdiv_constant_fold(
-; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.fdiv.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.500000e+00
 ;
   %f = call float @llvm.experimental.constrained.fdiv.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -33,7 +32,6 @@ define float @fdiv_constant_fold_strict2() #0 {
 
 define float @frem_constant_fold() #0 {
 ; CHECK-LABEL: @frem_constant_fold(
-; CHECK-NEXT:    [[F1:%.*]] = call float @llvm.frem.f32(float 3.000000e+00, float 2.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float 1.000000e+00
 ;
   %f = call float @llvm.experimental.constrained.frem.f32(float 3.0, float 2.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
diff --git a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
index e4f7b8e9f59d0..928c2c89feb77 100644
--- a/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/floating-point-arithmetic-strictfp.ll
@@ -59,9 +59,7 @@ define float @fsub_-0_-0_x(float %a) #0 {
 ; fsub -0.0, (fneg X) ==> X
 define float @fneg_x(float %a) #0 {
 ; CHECK-LABEL: @fneg_x(
-; CHECK-NEXT:    [[T1:%.*]] = fneg float [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %t1 = fneg float %a
   %ret = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -81,9 +79,7 @@ define <2 x float> @fsub_-0_-0_x_vec(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec(
-; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -103,9 +99,7 @@ define <2 x float> @fsub_-0_-0_x_vec_poison_elts(<2 x float> %a) #0 {
 
 define <2 x float> @fneg_x_vec_poison_elts(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fneg_x_vec_poison_elts(
-; CHECK-NEXT:    [[T1:%.*]] = fneg <2 x float> [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> <float -0.000000e+00, float poison>, <2 x float> [[T1]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %t1 = fneg <2 x float> %a
   %ret = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float><float -0.0, float poison>, <2 x float> %t1, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -139,8 +133,7 @@ define float @fsub_0_-0_x(float %a) #0 {
 ; fsub X, 0 ==> X
 define float @fsub_x_0(float %x) #0 {
 ; CHECK-LABEL: @fsub_x_0(
-; CHECK-NEXT:    [[X:%.*]] = call float @llvm.fsub.f32(float [[X1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[X]]
+; CHECK-NEXT:    ret float [[X:%.*]]
 ;
   %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %r
@@ -148,8 +141,7 @@ define float @fsub_x_0(float %x) #0 {
 
 define <2 x float> @fsub_x_0_vec_poison(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fsub_x_0_vec_poison(
-; CHECK-NEXT:    [[X:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[X1:%.*]], <2 x float> <float poison, float 0.000000e+00>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[X]]
+; CHECK-NEXT:    ret <2 x float> [[X:%.*]]
 ;
   %r = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> %x, <2 x float><float poison, float 0.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %r
@@ -158,8 +150,7 @@ define <2 x float> @fsub_x_0_vec_poison(<2 x float> %x) #0 {
 ; fadd X, -0 ==> X
 define float @fadd_x_n0(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
@@ -167,8 +158,7 @@ define float @fadd_x_n0(float %a) #0 {
 
 define <2 x float> @fadd_x_n0_vec_poison_elt(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_vec_poison_elt(
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> <float -0.000000e+00, float poison>) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> <float -0.0, float poison>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <2 x float> %ret
@@ -196,8 +186,7 @@ define <2 x float> @fadd_x_p0_vec_poison_elt(<2 x float> %a) #0 {
 ; fmul X, 1.0 ==> X
 define double @fmul_X_1(double %a) #0 {
 ; CHECK-LABEL: @fmul_X_1(
-; CHECK-NEXT:    [[A:%.*]] = call double @llvm.fmul.f64(double 1.000000e+00, double [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret double [[A]]
+; CHECK-NEXT:    ret double [[A:%.*]]
 ;
   %b = call double @llvm.experimental.constrained.fmul.f64(double 1.0, double %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret double %b
@@ -206,8 +195,7 @@ define double @fmul_X_1(double %a) #0 {
 ; Originally PR2642
 define <4 x float> @fmul_X_1_vec(<4 x float> %x) #0 {
 ; CHECK-LABEL: @fmul_X_1_vec(
-; CHECK-NEXT:    [[X:%.*]] = call <4 x float> @llvm.fmul.v4f32(<4 x float> [[X1:%.*]], <4 x float> splat (float 1.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <4 x float> [[X]]
+; CHECK-NEXT:    ret <4 x float> [[X:%.*]]
 ;
   %m = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> %x, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret <4 x float> %m
@@ -216,8 +204,7 @@ define <4 x float> @fmul_X_1_vec(<4 x float> %x) #0 {
 ; fdiv X, 1.0 ==> X
 define float @fdiv_x_1(float %a) #0 {
 ; CHECK-LABEL: @fdiv_x_1(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fdiv.f32(float [[A1:%.*]], float 1.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fdiv.f32(float %a, float 1.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
@@ -574,8 +561,7 @@ define float @new_fadd_neg0_rtz(float %a) #0 {
 
 define float @new_fadd_neg0_rtz_ignore(float %a) #0 {
 ; CHECK-LABEL: @new_fadd_neg0_rtz_ignore(
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[A:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
+; CHECK-NEXT:    ret float [[R:%.*]]
 ;
   %r = call float @llvm.fadd.f32(float %a, float -0.0)
   [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
diff --git a/llvm/test/Transforms/InstSimplify/ldexp.ll b/llvm/test/Transforms/InstSimplify/ldexp.ll
index ad67cbbbb49d7..2c87532f908d3 100644
--- a/llvm/test/Transforms/InstSimplify/ldexp.ll
+++ b/llvm/test/Transforms/InstSimplify/ldexp.ll
@@ -476,13 +476,10 @@ define void @constant_fold_ldexp_f32_val_strictfp(i32 %y) #0 {
 ; CHECK-LABEL: @constant_fold_ldexp_f32_val_strictfp(
 ; CHECK-NEXT:    [[SNAN_MAY_TRAP1:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 3) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[SNAN_MAY_NOT_TRAP2:%.*]] = call float @llvm.ldexp.f32.i32(float 0x7FF0000020000000, i32 3) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[UNKNOWN_ROUNDING3:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NORMAL4:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[NORMAL_DOWN5:%.*]] = call float @llvm.ldexp.f32.i32(float 2.500000e+00, i32 42) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    store volatile float 0x42A4000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll b/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
index d471cfad4bf96..fdbacddcd84ce 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
@@ -11,8 +11,7 @@
 
 define float @fadd_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -20,8 +19,7 @@ define float @fadd_x_n0_defaultenv(float %a) #0 {
 
 define <2 x float> @fadd_vec_x_n0_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -103,8 +101,7 @@ define <2 x float> @fadd_vec_x_n0_dynamic(<2 x float> %a) #0 {
 ; Test one of the remaining rounding modes and the rest will be fine.
 define float @fadd_x_n0_towardzero(float %a) #0 {
 ; CHECK-LABEL: @fadd_x_n0_towardzero(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -114,8 +111,7 @@ define float @fadd_x_n0_towardzero(float %a) #0 {
 ; Test one of the remaining rounding modes and the rest will be fine.
 define <2 x float> @fadd_vec_x_n0_towardzero(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_x_n0_towardzero(
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -124,7 +120,7 @@ define <2 x float> @fadd_vec_x_n0_towardzero(<2 x float> %a) #0 {
 define float @fadd_nnan_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fadd_nnan_x_n0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %ret
@@ -133,7 +129,7 @@ define float @fadd_nnan_x_n0_ebmaytrap(float %a) #0 {
 define <2 x float> @fadd_vec_nnan_x_n0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_nnan_x_n0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A1]]
 ;
   %ret = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %ret
@@ -142,7 +138,7 @@ define <2 x float> @fadd_vec_nnan_x_n0_ebmaytrap(<2 x float> %a) #0 {
 define float @fadd_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fadd_nnan_x_n0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fadd.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fadd.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %ret
@@ -151,7 +147,7 @@ define float @fadd_nnan_x_n0_ebstrict(float %a) #0 {
 define <2 x float> @fadd_vec_nnan_x_n0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_nnan_x_n0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> splat (float -0.000000e+00)) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A1]]
 ;
   %ret = call nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float><float -0.0, float -0.0>, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret <2 x float> %ret
@@ -179,8 +175,7 @@ define <2 x float> @fadd_vec_ninf_x_n0_ebstrict(<2 x float> %a) #0 {
 
 define float @fadd_n0_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fadd_n0_x_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fadd.f32(float -0.000000e+00, float [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fadd.f32(float -0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %ret
@@ -188,8 +183,7 @@ define float @fadd_n0_x_defaultenv(float %a) #0 {
 
 define <2 x float> @fadd_vec_n0_x_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fadd_vec_n0_x_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> splat (float -0.000000e+00), <2 x float> [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %ret = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float -0.0, float -0.0>, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %ret
@@ -221,8 +215,7 @@ define <2 x float> @fadd_vec_n0_x_ebmaytrap(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -230,8 +223,7 @@ define float @fold_fadd_nsz_x_0_defaultenv(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -239,8 +231,7 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_defaultenv(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_x_0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_x_0_neginf(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -248,8 +239,7 @@ define float @fold_fadd_nsz_x_0_neginf(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_x_0_neginf(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_x_0_neginf(
-; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -276,7 +266,7 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 define float @fold_fadd_nnan_nsz_x_0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nnan_nsz_x_0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %add = call nnan nsz float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %add
@@ -285,7 +275,7 @@ define float @fold_fadd_nnan_nsz_x_0_ebmaytrap(float %a) #0 {
 define <2 x float> @fold_fadd_vec_nnan_nsz_x_0_ebmaytrap(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nnan_nsz_x_0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A1]]
 ;
   %add = call nnan nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
@@ -312,7 +302,7 @@ define <2 x float> @fold_fadd_vec_nsz_x_0_ebstrict(<2 x float> %a) #0 {
 define float @fold_fadd_nsz_nnan_x_0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_nnan_x_0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fadd.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %add = call nsz nnan float @llvm.experimental.constrained.fadd.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret float %add
@@ -321,7 +311,7 @@ define float @fold_fadd_nsz_nnan_x_0_ebstrict(float %a) #0 {
 define <2 x float> @fold_fadd_vec_nsz_nnan_x_0_ebstrict(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_nnan_x_0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz <2 x float> @llvm.fadd.v2f32(<2 x float> [[A1:%.*]], <2 x float> zeroinitializer) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A1]]
 ;
   %add = call nsz nnan <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> %a, <2 x float> zeroinitializer, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   ret <2 x float> %add
@@ -329,8 +319,7 @@ define <2 x float> @fold_fadd_vec_nsz_nnan_x_0_ebstrict(<2 x float> %a) #0 {
 
 define float @fold_fadd_nsz_0_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fadd_nsz_0_x_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fadd.f32(float 0.000000e+00, float [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %add = call nsz float @llvm.experimental.constrained.fadd.f32(float 0.0, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -338,8 +327,7 @@ define float @fold_fadd_nsz_0_x_defaultenv(float %a) #0 {
 
 define <2 x float> @fold_fadd_vec_nsz_0_x_defaultenv(<2 x float> %a) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_nsz_0_x_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call nsz <2 x float> @llvm.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> [[A1:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[A]]
+; CHECK-NEXT:    ret <2 x float> [[A:%.*]]
 ;
   %add = call nsz <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> zeroinitializer, <2 x float> %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -391,8 +379,7 @@ define float @fold_fadd_qnan_qnan_ebstrict() #0 {
 
 define float @fold_fadd_snan_variable_ebignore(float %x) #0 {
 ; CHECK-LABEL: @fold_fadd_snan_variable_ebignore(
-; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[ADD1]]
+; CHECK-NEXT:    ret float 0x7FFC000000000000
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret float %add
@@ -403,7 +390,7 @@ define float @fold_fadd_snan_variable_ebignore(float %x) #0 {
 define float @fold_fadd_snan_variable_ebmaytrap(float %x) #0 {
 ; CHECK-LABEL: @fold_fadd_snan_variable_ebmaytrap(
 ; CHECK-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float 0x7FF4000000000000, float [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[ADD1]]
+; CHECK-NEXT:    ret float 0x7FFC000000000000
 ;
   %add = call float @llvm.experimental.constrained.fadd.f32(float 0x7ff4000000000000, float %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret float %add
@@ -413,8 +400,7 @@ define float @fold_fadd_snan_variable_ebmaytrap(float %x) #0 {
 
 define <2 x float> @fold_fadd_vec_snan_variable_ebignore(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_snan_variable_ebignore(
-; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0x7FF4000000000000, float 0xFFF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[ADD1]]
+; CHECK-NEXT:    ret <2 x float> <float 0x7FFC000000000000, float 0xFFFC000000000000>
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0x7ff4000000000000, float 0xfff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -425,7 +411,7 @@ define <2 x float> @fold_fadd_vec_snan_variable_ebignore(<2 x float> %x) #0 {
 define <2 x float> @fold_fadd_vec_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_snan_variable_ebmaytrap(
 ; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0xFFF4000000000000, float 0x7FF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret <2 x float> [[ADD1]]
+; CHECK-NEXT:    ret <2 x float> <float 0xFFFC000000000000, float 0x7FFC000000000000>
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0xfff4000000000000, float 0x7ff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
@@ -435,8 +421,7 @@ define <2 x float> @fold_fadd_vec_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 
 define <2 x float> @fold_fadd_vec_partial_snan_variable_ebignore(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_partial_snan_variable_ebignore(
-; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0x7FF4000000000000, float 0xFFFF000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret <2 x float> [[ADD1]]
+; CHECK-NEXT:    ret <2 x float> <float 0x7FFC000000000000, float 0xFFFF000000000000>
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0x7ff4000000000000, float 0xffff000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   ret <2 x float> %add
@@ -447,7 +432,7 @@ define <2 x float> @fold_fadd_vec_partial_snan_variable_ebignore(<2 x float> %x)
 define <2 x float> @fold_fadd_vec_partial_snan_variable_ebmaytrap(<2 x float> %x) #0 {
 ; CHECK-LABEL: @fold_fadd_vec_partial_snan_variable_ebmaytrap(
 ; CHECK-NEXT:    [[ADD1:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> <float 0xFFF8000000000000, float 0x7FF4000000000000>, <2 x float> [[X:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret <2 x float> [[ADD1]]
+; CHECK-NEXT:    ret <2 x float> <float 0xFFF8000000000000, float 0x7FFC000000000000>
 ;
   %add = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float><float 0xfff8000000000000, float 0x7ff4000000000000>, <2 x float> %x, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   ret <2 x float> %add
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll b/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
index 73e205361e938..ae45e570097c6 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
@@ -11,8 +11,7 @@
 
 define float @fsub_x_p0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_x_p0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
@@ -31,7 +30,7 @@ define float @fsub_x_p0_ebmaytrap(float %a) #0 {
 define float @fsub_nnan_x_p0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_p0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
   ret float %ret
@@ -51,7 +50,7 @@ define float @fsub_x_p0_ebstrict(float %a) #0 {
 define float @fsub_nnan_x_p0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_p0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
   ret float %ret
@@ -91,8 +90,7 @@ define float @fsub_x_p0_dynamic(float %a) #0 {
 ; With nsz we don't have to worry about -0.0 so the transform is valid.
 define float @fsub_nsz_x_p0_neginf(float %a) #0 {
 ; CHECK-LABEL: @fsub_nsz_x_p0_neginf(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.downward", metadata !"fpexcept.ignore")
   ret float %ret
@@ -101,7 +99,7 @@ define float @fsub_nsz_x_p0_neginf(float %a) #0 {
 ; With nsz we don't have to worry about -0.0 so the transform is valid.
 define float @fsub_nsz_x_p0_dynamic(float %a) #0 {
 ; CHECK-LABEL: @fsub_nsz_x_p0_dynamic(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float 0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[RET1:%.*]] = call nsz float @llvm.fsub.f32(float [[A:%.*]], float 0.000000e+00) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    ret float [[A]]
 ;
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float 0.0, metadata !"round.dynamic", metadata !"fpexcept.ignore")
@@ -115,8 +113,7 @@ define float @fsub_nsz_x_p0_dynamic(float %a) #0 {
 
 define float @fold_fsub_nsz_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_x_n0_defaultenv(
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %sub = call nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %sub
@@ -135,7 +132,7 @@ define float @fold_fsub_nsz_x_n0_ebmaytrap(float %a) #0 {
 define float @fold_fsub_nnan_nsz_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nnan_nsz_x_n0_ebmaytrap(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %sub = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
   ret float %sub
@@ -155,7 +152,7 @@ define float @fold_fsub_nsz_x_n0_ebstrict(float %a) #0 {
 define float @fold_fsub_nsz_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_nsz_nnan_x_n0_ebstrict(
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float [[A1:%.*]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %sub = call nsz nnan float @llvm.experimental.constrained.fsub.f32(float %a, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
   ret float %sub
@@ -168,8 +165,7 @@ define float @fold_fsub_nsz_nnan_x_n0_ebstrict(float %a) #0 {
 
 define float @fold_fsub_fabs_x_n0_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_x_n0_defaultenv(
-; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
-; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    [[ABSA:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0:[0-9]+]]
 ; CHECK-NEXT:    ret float [[ABSA]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
@@ -193,7 +189,7 @@ define float @fold_fsub_fabs_nnan_x_n0_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_nnan_x_n0_ebmaytrap(
 ; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
 ; CHECK-NEXT:    [[ABSA:%.*]] = call nnan float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[ABSA]]
+; CHECK-NEXT:    ret float [[ABSA1]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
   %sub = call nnan float @llvm.experimental.constrained.fsub.f32(float %absa, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -217,7 +213,7 @@ define float @fold_fsub_fabs_nnan_x_n0_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fold_fsub_fabs_nnan_x_n0_ebstrict(
 ; CHECK-NEXT:    [[ABSA1:%.*]] = call float @llvm.fabs.f32(float [[A:%.*]]) #[[ATTR0]]
 ; CHECK-NEXT:    [[ABSA:%.*]] = call nnan float @llvm.fsub.f32(float [[ABSA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[ABSA]]
+; CHECK-NEXT:    ret float [[ABSA1]]
 ;
   %absa = call float @llvm.fabs.f32(float %a) #0
   %sub = call nnan float @llvm.experimental.constrained.fsub.f32(float %absa, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -227,8 +223,7 @@ define float @fold_fsub_fabs_nnan_x_n0_ebstrict(float %a) #0 {
 define float @fold_fsub_sitofp_x_n0_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @fold_fsub_sitofp_x_n0_defaultenv(
 ; CHECK-NEXT:    [[FPA2:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[FPA:%.*]] = call float @llvm.fsub.f32(float [[FPA2]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[FPA]]
+; CHECK-NEXT:    ret float [[FPA2]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   %sub = call float @llvm.experimental.constrained.fsub.f32(float %fpa, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -241,9 +236,7 @@ define float @fold_fsub_sitofp_x_n0_defaultenv(i32 %a) #0 {
 
 define float @fsub_fneg_n0_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_n0_fnX_defaultenv(
-; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %nega = fneg float %a
   %ret = call float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -266,7 +259,7 @@ define float @fsub_fneg_nnan_n0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_n0_fnX_ebmaytrap(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
 ; CHECK-NEXT:    [[A:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %nega = fneg float %a
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -290,7 +283,7 @@ define float @fsub_fneg_nnan_n0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_n0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
 ; CHECK-NEXT:    [[RET1:%.*]] = call nnan float @llvm.fsub.f32(float -0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[RET1]]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float -0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -367,9 +360,7 @@ define float @fsub_fsub_nnan_n0_fnX_ebstrict(float %a) #0 {
 
 define float @fsub_fneg_nsz_p0_fnX_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_p0_fnX_defaultenv(
-; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
-; CHECK-NEXT:    [[A:%.*]] = call nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A:%.*]]
 ;
   %nega = fneg float %a
   %ret = call nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.ignore")
@@ -392,7 +383,7 @@ define float @fsub_fneg_nsz_nnan_p0_fnX_ebmaytrap(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nsz_nnan_p0_fnX_ebmaytrap(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A1:%.*]]
 ; CHECK-NEXT:    [[A:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[A]]
+; CHECK-NEXT:    ret float [[A1]]
 ;
   %nega = fneg float %a
   %ret = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
@@ -416,7 +407,7 @@ define float @fsub_fneg_nnan_nsz_p0_fnX_ebstrict(float %a) #0 {
 ; CHECK-LABEL: @fsub_fneg_nnan_nsz_p0_fnX_ebstrict(
 ; CHECK-NEXT:    [[NEGA:%.*]] = fneg float [[A:%.*]]
 ; CHECK-NEXT:    [[RET1:%.*]] = call nnan nsz float @llvm.fsub.f32(float 0.000000e+00, float [[NEGA]]) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[RET1]]
+; CHECK-NEXT:    ret float [[A]]
 ;
   %nega = fneg float %a
   %ret = call nnan nsz float @llvm.experimental.constrained.fsub.f32(float 0.0, float %nega, metadata !"round.tonearest", metadata !"fpexcept.strict")
@@ -503,8 +494,7 @@ define float @fsub_x_x_defaultenv(float %a) #0 {
 
 define float @fsub_nnan_x_x_defaultenv(float %a) #0 {
 ; CHECK-LABEL: @fsub_nnan_x_x_defaultenv(
-; CHECK-NEXT:    [[RET1:%.*]] = call nnan float @llvm.fsub.f32(float [[A:%.*]], float [[A]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[RET1]]
+; CHECK-NEXT:    ret float 0.000000e+00
 ;
   %ret = call nnan float @llvm.experimental.constrained.fsub.f32(float %a, float %a, metadata !"round.tonearest", metadata !"fpexcept.ignore")
   ret float %ret
diff --git a/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll b/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
index b40973afaeabd..798737e311899 100644
--- a/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
+++ b/llvm/test/Transforms/InstSimplify/strictfp-sqrt-nonneg.ll
@@ -9,8 +9,7 @@ define float @nonneg_u_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_defaultenv(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -22,8 +21,7 @@ define float @nonneg_s_defaultenv(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_defaultenv(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
@@ -36,7 +34,7 @@ define float @nonneg_u_maytrap(i32 %a) #0 {
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -49,7 +47,7 @@ define float @nonneg_s_maytrap(i32 %a) #0 {
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.maytrap") #0
@@ -63,7 +61,7 @@ define float @nonneg_u_ebstrict(i32 %a) #0 {
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -77,7 +75,7 @@ define float @nonneg_s_ebstrict(i32 %a) #0 {
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rte") ]
 ; CHECK-NEXT:    [[SQRA:%.*]] = call nnan float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rte") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
@@ -92,8 +90,7 @@ define float @nonneg_u_downward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_downward(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -105,8 +102,7 @@ define float @nonneg_s_downward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_downward(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.downward", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.downward", metadata !"fpexcept.ignore") #0
@@ -118,8 +114,7 @@ define float @nonneg_u_upward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_upward(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.upward", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.upward", metadata !"fpexcept.ignore") #0
@@ -131,8 +126,7 @@ define float @nonneg_s_upward(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_upward(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.upward", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.upward", metadata !"fpexcept.ignore") #0
@@ -144,8 +138,7 @@ define float @nonneg_u_towardzero(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_towardzero(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
@@ -157,8 +150,7 @@ define float @nonneg_s_towardzero(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_towardzero(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.towardzero", metadata !"fpexcept.ignore") #0
@@ -170,8 +162,7 @@ define float @nonneg_u_tonearestaway(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_tonearestaway(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
@@ -183,8 +174,7 @@ define float @nonneg_s_tonearestaway(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_tonearestaway(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.control"(metadata !"rmm"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearestaway", metadata !"fpexcept.ignore") #0
@@ -196,8 +186,8 @@ define float @nonneg_u_dynamic(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_u_dynamic(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    [[SUB2:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
@@ -209,8 +199,8 @@ define float @nonneg_s_dynamic(i32 %a) #0 {
 ; CHECK-LABEL: @nonneg_s_dynamic(
 ; CHECK-NEXT:    [[FPA3:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A:%.*]]) [ "fp.except"(metadata !"ignore") ]
 ; CHECK-NEXT:    [[SQRA1:%.*]] = call float @llvm.sqrt.f32(float [[FPA3]]) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    [[SQRA:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[SQRA]]
+; CHECK-NEXT:    [[SUB2:%.*]] = call float @llvm.fsub.f32(float [[SQRA1]], float -0.000000e+00) [ "fp.except"(metadata !"ignore") ]
+; CHECK-NEXT:    ret float [[SQRA1]]
 ;
   %fpa = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %a, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
   %sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0
diff --git a/llvm/unittests/IR/IRBuilderTest.cpp b/llvm/unittests/IR/IRBuilderTest.cpp
index 51eb59db95759..47956c533f598 100644
--- a/llvm/unittests/IR/IRBuilderTest.cpp
+++ b/llvm/unittests/IR/IRBuilderTest.cpp
@@ -623,11 +623,12 @@ TEST_F(IRBuilderTest, FPBundlesStrict) {
     EXPECT_EQ(RoundingMode::TowardZero, I->getRoundingMode());
     EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
     MemoryEffects ME = I->getMemoryEffects();
-    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+    // fp.except=ignore + explicit non-dynamic rounding mode: no FP env access.
+    EXPECT_TRUE(ME.doesNotAccessMemory());
   }
 
   // Check call with both FP bundles.
-  // nearbyint(%x) [ "fp.control" (metadata !"rtz"),
+  // nearbyint(%x) [ "fp.control" (metadata !"rte"),
   //                 "fp.except" (metadata !"ignore") ]
   {
     SmallVector<OperandBundleDef, 1> Bundles;
@@ -641,7 +642,8 @@ TEST_F(IRBuilderTest, FPBundlesStrict) {
     EXPECT_EQ(RoundingMode::NearestTiesToEven, I->getRoundingMode());
     EXPECT_EQ(fp::ebIgnore, I->getExceptionBehavior());
     MemoryEffects ME = I->getMemoryEffects();
-    EXPECT_TRUE(ME.doesAccessInaccessibleMem());
+    // fp.except=ignore + explicit non-dynamic rounding mode: no FP env access.
+    EXPECT_TRUE(ME.doesNotAccessMemory());
   }
 
   // Integer intrinsics never receive FP operand bundles and have no FP

>From baa4446cbe179682de098314386644b00cc212e2 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Tue, 14 Apr 2026 00:16:39 -0700
Subject: [PATCH 09/12] [Clang] Update constrained FP test CHECK patterns for
 operand bundle IR

Update CHECK lines in Clang CodeGen tests to match the new IR format
produced by PR #191613, which replaces @llvm.experimental.constrained.*
intrinsics with plain instructions/intrinsics carrying fp.control and
fp.except operand bundles.

Key pattern changes:
- @llvm.experimental.constrained.fadd/fsub/fmul/fdiv/etc. ->
  call @llvm.fadd/etc. with ["fp.control"(metadata !"rte")] bundle
- Dynamic rounding mode + strict exception behavior -> plain
  fadd/fsub/etc. instructions
- @llvm.experimental.constrained.fcmp/fcmps -> unified @llvm.fcmp
- @llvm.experimental.constrained.roundeven/floor/ceil/trunc ->
  @llvm.roundeven/etc. with ["fp.except"(metadata !"ignore")] bundle
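
The pattern changes above can be sketched as a small string mapping. This is
an illustration only, not code from the patch: the helper names fpControlFor,
fpExceptFor and bundleSuffix are hypothetical, and the table is inferred from
the updated CHECK lines in this series (for example round.tonearest -> rte,
round.dynamic -> no fp.control bundle, fpexcept.strict -> no fp.except
bundle). It does not model operations such as roundeven/floor/ceil/trunc,
whose updated CHECK lines carry only the fp.except bundle.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (assumption, not an LLVM API): maps a legacy
// rounding-mode metadata string to the "fp.control" bundle value seen in the
// updated CHECK lines. std::nullopt means no "fp.control" bundle is printed,
// which is what the tests show for round.dynamic; unrecognized strings are
// treated the same way in this sketch.
std::optional<std::string> fpControlFor(const std::string &Round) {
  if (Round == "round.tonearest")     return std::string("rte");
  if (Round == "round.downward")      return std::string("rtn");
  if (Round == "round.upward")        return std::string("rtp");
  if (Round == "round.towardzero")    return std::string("rtz");
  if (Round == "round.tonearestaway") return std::string("rmm");
  return std::nullopt;
}

// Hypothetical helper: maps a legacy exception-behavior metadata string to
// the "fp.except" bundle value. fpexcept.strict prints no bundle in the
// updated tests, so it (and anything unrecognized) maps to std::nullopt.
std::optional<std::string> fpExceptFor(const std::string &Except) {
  if (Except == "fpexcept.ignore")  return std::string("ignore");
  if (Except == "fpexcept.maytrap") return std::string("maytrap");
  return std::nullopt;
}

// Builds the operand-bundle suffix in the textual form used by the CHECK
// lines. Returns an empty string when neither bundle is needed, matching the
// "dynamic rounding mode + strict" case that becomes a plain instruction.
std::string bundleSuffix(const std::string &Round, const std::string &Except) {
  std::string Out;
  auto Append = [&Out](const std::string &Tag, const std::string &Value) {
    Out += Out.empty() ? "[ " : ", ";
    Out += "\"" + Tag + "\"(metadata !\"" + Value + "\")";
  };
  if (auto Control = fpControlFor(Round))
    Append("fp.control", *Control);
  if (auto Exc = fpExceptFor(Except))
    Append("fp.except", *Exc);
  if (!Out.empty())
    Out += " ]";
  return Out;
}
```

Under these assumptions, bundleSuffix("round.tonearest", "fpexcept.maytrap")
yields the suffix that appears on the MAYTRAP lines, while
bundleSuffix("round.dynamic", "fpexcept.strict") yields an empty string. The
authoritative behavior is whatever the auto-upgrade and IRBuilder code in this
PR implements.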

Auto-generated tests regenerated with update_cc_test_checks.py using
the updated clang and opt binaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 clang/test/CodeGen/RISCV/fpconstrained.c      |    6 +-
 .../X86/avx-builtins-constrained-cmp.c        |  224 +-
 .../X86/sse-builtins-constrained-cmp.c        |   24 +-
 .../X86/sse2-builtins-constrained-cmp.c       |   24 +-
 .../CodeGen/X86/sse41-builtins-constrained.c  |    8 +-
 clang/test/CodeGen/X86/strictfp_builtins.c    |    6 +-
 clang/test/CodeGen/builtin_float_strictfp.c   |  161 +-
 clang/test/CodeGen/complex-strictfp.c         |   40 +-
 .../test/CodeGen/constrained-math-builtins.c  |  814 +++--
 clang/test/CodeGen/cx-complex-range-real.c    |   84 +-
 clang/test/CodeGen/cx-complex-range.c         |  836 ++---
 clang/test/CodeGen/exprs-strictfp.c           |   15 +-
 .../test/CodeGen/fp-contract-fast-pragma.cpp  |  147 +-
 clang/test/CodeGen/fp-floatcontrol-class.cpp  |   23 +-
 clang/test/CodeGen/fp-floatcontrol-pragma.cpp | 3029 ++++++++++++++++-
 clang/test/CodeGen/fp-floatcontrol-stack.cpp  |  802 ++++-
 clang/test/CodeGen/fp-strictfp-exp.cpp        |    6 +-
 clang/test/CodeGen/fp-template.cpp            |   53 +-
 clang/test/CodeGen/fpconstrained-cmp-double.c |  365 +-
 clang/test/CodeGen/fpconstrained-cmp-float.c  |  365 +-
 clang/test/CodeGen/fpconstrained.c            |   67 +-
 clang/test/CodeGen/fpconstrained.cpp          |   73 +-
 clang/test/CodeGen/math-libcalls.c            | 2410 ++++++++++---
 clang/test/CodeGen/pragma-fenv_access.c       | 1485 +++++++-
 .../CodeGen/strictfp-elementwise-builtins.cpp |   72 +-
 clang/test/CodeGen/strictfp_builtins.c        |   60 +-
 26 files changed, 9210 insertions(+), 1989 deletions(-)

diff --git a/clang/test/CodeGen/RISCV/fpconstrained.c b/clang/test/CodeGen/RISCV/fpconstrained.c
index d5a7a4aab1556..f520db504b92a 100644
--- a/clang/test/CodeGen/RISCV/fpconstrained.c
+++ b/clang/test/CodeGen/RISCV/fpconstrained.c
@@ -13,9 +13,9 @@ float f0, f1, f2;
 void foo(void) {
   // CHECK-LABEL: define {{.*}}void @foo()
 
-  // MAYTRAP: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-  // EXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // FPMODELSTRICT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
+  // MAYTRAP: call fast float @llvm.fadd.f32(float %{{.*}}, float %{{.*}}) {{.*}}[ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+  // EXCEPT: call fast float @llvm.fadd.f32(float %{{.*}}, float %{{.*}}) {{.*}}[ "fp.control"(metadata !"rte") ]
+  // FPMODELSTRICT: fadd float %{{.*}}, %{{.*}}
   // STRICTEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
   // STRICTNOEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.ignore")
   // PRECISE: fadd contract float %{{.*}}, %{{.*}}
diff --git a/clang/test/CodeGen/X86/avx-builtins-constrained-cmp.c b/clang/test/CodeGen/X86/avx-builtins-constrained-cmp.c
index c81282b0de8e7..ef653425b61c2 100644
--- a/clang/test/CodeGen/X86/avx-builtins-constrained-cmp.c
+++ b/clang/test/CodeGen/X86/avx-builtins-constrained-cmp.c
@@ -13,67 +13,67 @@
 
 __m256d test_mm256_cmp_pd_eq_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_eq_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oeq")
   return _mm256_cmp_pd(a, b, _CMP_EQ_OQ);
 }
 
 __m256d test_mm256_cmp_pd_lt_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_lt_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"olt")
   return _mm256_cmp_pd(a, b, _CMP_LT_OS);
 }
 
 __m256d test_mm256_cmp_pd_le_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_le_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ole")
   return _mm256_cmp_pd(a, b, _CMP_LE_OS);
 }
 
 __m256d test_mm256_cmp_pd_unord_q(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_unord_q
-  // CHECK: all <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uno")
   return _mm256_cmp_pd(a, b, _CMP_UNORD_Q);
 }
 
 __m256d test_mm256_cmp_pd_neq_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_neq_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"une")
   return _mm256_cmp_pd(a, b, _CMP_NEQ_UQ);
 }
 
 __m256d test_mm256_cmp_pd_nlt_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nlt_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uge")
   return _mm256_cmp_pd(a, b, _CMP_NLT_US);
 }
 
 __m256d test_mm256_cmp_pd_nle_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nle_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ugt")
   return _mm256_cmp_pd(a, b, _CMP_NLE_US);
 }
 
 __m256d test_mm256_cmp_pd_ord_q(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ord_q
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ord")
   return _mm256_cmp_pd(a, b, _CMP_ORD_Q);
 }
 
 __m256d test_mm256_cmp_pd_eq_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_eq_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ueq")
   return _mm256_cmp_pd(a, b, _CMP_EQ_UQ);
 }
 
 __m256d test_mm256_cmp_pd_nge_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nge_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ult")
   return _mm256_cmp_pd(a, b, _CMP_NGE_US);
 }
 
 __m256d test_mm256_cmp_pd_ngt_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ngt_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ule")
   return _mm256_cmp_pd(a, b, _CMP_NGT_US);
 }
 
@@ -85,19 +85,19 @@ __m256d test_mm256_cmp_pd_false_oq(__m256d a, __m256d b) {
 
 __m256d test_mm256_cmp_pd_neq_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_neq_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"one")
   return _mm256_cmp_pd(a, b, _CMP_NEQ_OQ);
 }
 
 __m256d test_mm256_cmp_pd_ge_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ge_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oge")
   return _mm256_cmp_pd(a, b, _CMP_GE_OS);
 }
 
 __m256d test_mm256_cmp_pd_gt_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_gt_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ogt")
   return _mm256_cmp_pd(a, b, _CMP_GT_OS);
 }
 
@@ -109,67 +109,67 @@ __m256d test_mm256_cmp_pd_true_uq(__m256d a, __m256d b) {
 
 __m256d test_mm256_cmp_pd_eq_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_eq_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oeq")
   return _mm256_cmp_pd(a, b, _CMP_EQ_OS);
 }
 
 __m256d test_mm256_cmp_pd_lt_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_lt_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"olt")
   return _mm256_cmp_pd(a, b, _CMP_LT_OQ);
 }
 
 __m256d test_mm256_cmp_pd_le_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_le_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ole")
   return _mm256_cmp_pd(a, b, _CMP_LE_OQ);
 }
 
 __m256d test_mm256_cmp_pd_unord_s(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_unord_s
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uno")
   return _mm256_cmp_pd(a, b, _CMP_UNORD_S);
 }
 
 __m256d test_mm256_cmp_pd_neq_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_neq_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"une")
   return _mm256_cmp_pd(a, b, _CMP_NEQ_US);
 }
 
 __m256d test_mm256_cmp_pd_nlt_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nlt_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"uge")
   return _mm256_cmp_pd(a, b, _CMP_NLT_UQ);
 }
 
 __m256d test_mm256_cmp_pd_nle_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nle_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ugt")
   return _mm256_cmp_pd(a, b, _CMP_NLE_UQ);
 }
 
 __m256d test_mm256_cmp_pd_ord_s(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ord_s
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ord")
   return _mm256_cmp_pd(a, b, _CMP_ORD_S);
 }
 
 __m256d test_mm256_cmp_pd_eq_us(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_eq_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ueq")
   return _mm256_cmp_pd(a, b, _CMP_EQ_US);
 }
 
 __m256d test_mm256_cmp_pd_nge_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_nge_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ult")
   return _mm256_cmp_pd(a, b, _CMP_NGE_UQ);
 }
 
 __m256d test_mm256_cmp_pd_ngt_uq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ngt_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ule")
   return _mm256_cmp_pd(a, b, _CMP_NGT_UQ);
 }
 
@@ -181,19 +181,19 @@ __m256d test_mm256_cmp_pd_false_os(__m256d a, __m256d b) {
 
 __m256d test_mm256_cmp_pd_neq_os(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_neq_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"one")
   return _mm256_cmp_pd(a, b, _CMP_NEQ_OS);
 }
 
 __m256d test_mm256_cmp_pd_ge_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_ge_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"oge")
   return _mm256_cmp_pd(a, b, _CMP_GE_OQ);
 }
 
 __m256d test_mm256_cmp_pd_gt_oq(__m256d a, __m256d b) {
   // CHECK-LABEL: test_mm256_cmp_pd_gt_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !"ogt")
   return _mm256_cmp_pd(a, b, _CMP_GT_OQ);
 }
 
@@ -205,67 +205,67 @@ __m256d test_mm256_cmp_pd_true_us(__m256d a, __m256d b) {
 
 __m256 test_mm256_cmp_ps_eq_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_eq_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oeq")
   return _mm256_cmp_ps(a, b, _CMP_EQ_OQ);
 }
 
 __m256 test_mm256_cmp_ps_lt_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_lt_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"olt")
   return _mm256_cmp_ps(a, b, _CMP_LT_OS);
 }
 
 __m256 test_mm256_cmp_ps_le_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_le_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ole")
   return _mm256_cmp_ps(a, b, _CMP_LE_OS);
 }
 
 __m256 test_mm256_cmp_ps_unord_q(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_unord_q
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uno")
   return _mm256_cmp_ps(a, b, _CMP_UNORD_Q);
 }
 
 __m256 test_mm256_cmp_ps_neq_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_neq_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"une")
   return _mm256_cmp_ps(a, b, _CMP_NEQ_UQ);
 }
 
 __m256 test_mm256_cmp_ps_nlt_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nlt_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uge")
   return _mm256_cmp_ps(a, b, _CMP_NLT_US);
 }
 
 __m256 test_mm256_cmp_ps_nle_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nle_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ugt")
   return _mm256_cmp_ps(a, b, _CMP_NLE_US);
 }
 
 __m256 test_mm256_cmp_ps_ord_q(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ord_q
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ord")
   return _mm256_cmp_ps(a, b, _CMP_ORD_Q);
 }
 
 __m256 test_mm256_cmp_ps_eq_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_eq_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ueq")
   return _mm256_cmp_ps(a, b, _CMP_EQ_UQ);
 }
 
 __m256 test_mm256_cmp_ps_nge_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nge_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ult")
   return _mm256_cmp_ps(a, b, _CMP_NGE_US);
 }
 
 __m256 test_mm256_cmp_ps_ngt_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ngt_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ule")
   return _mm256_cmp_ps(a, b, _CMP_NGT_US);
 }
 
@@ -277,19 +277,19 @@ __m256 test_mm256_cmp_ps_false_oq(__m256 a, __m256 b) {
 
 __m256 test_mm256_cmp_ps_neq_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_neq_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"one")
   return _mm256_cmp_ps(a, b, _CMP_NEQ_OQ);
 }
 
 __m256 test_mm256_cmp_ps_ge_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ge_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oge")
   return _mm256_cmp_ps(a, b, _CMP_GE_OS);
 }
 
 __m256 test_mm256_cmp_ps_gt_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_gt_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ogt")
   return _mm256_cmp_ps(a, b, _CMP_GT_OS);
 }
 
@@ -301,67 +301,67 @@ __m256 test_mm256_cmp_ps_true_uq(__m256 a, __m256 b) {
 
 __m256 test_mm256_cmp_ps_eq_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_eq_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oeq")
   return _mm256_cmp_ps(a, b, _CMP_EQ_OS);
 }
 
 __m256 test_mm256_cmp_ps_lt_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_lt_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"olt")
   return _mm256_cmp_ps(a, b, _CMP_LT_OQ);
 }
 
 __m256 test_mm256_cmp_ps_le_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_le_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ole")
   return _mm256_cmp_ps(a, b, _CMP_LE_OQ);
 }
 
 __m256 test_mm256_cmp_ps_unord_s(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_unord_s
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uno")
   return _mm256_cmp_ps(a, b, _CMP_UNORD_S);
 }
 
 __m256 test_mm256_cmp_ps_neq_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_neq_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"une")
   return _mm256_cmp_ps(a, b, _CMP_NEQ_US);
 }
 
 __m256 test_mm256_cmp_ps_nlt_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nlt_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"uge")
   return _mm256_cmp_ps(a, b, _CMP_NLT_UQ);
 }
 
 __m256 test_mm256_cmp_ps_nle_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nle_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ugt")
   return _mm256_cmp_ps(a, b, _CMP_NLE_UQ);
 }
 
 __m256 test_mm256_cmp_ps_ord_s(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ord_s
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ord")
   return _mm256_cmp_ps(a, b, _CMP_ORD_S);
 }
 
 __m256 test_mm256_cmp_ps_eq_us(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_eq_us
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ueq")
   return _mm256_cmp_ps(a, b, _CMP_EQ_US);
 }
 
 __m256 test_mm256_cmp_ps_nge_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_nge_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ult")
   return _mm256_cmp_ps(a, b, _CMP_NGE_UQ);
 }
 
 __m256 test_mm256_cmp_ps_ngt_uq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ngt_uq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ule")
   return _mm256_cmp_ps(a, b, _CMP_NGT_UQ);
 }
 
@@ -373,19 +373,19 @@ __m256 test_mm256_cmp_ps_false_os(__m256 a, __m256 b) {
 
 __m256 test_mm256_cmp_ps_neq_os(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_neq_os
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmps.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"one")
   return _mm256_cmp_ps(a, b, _CMP_NEQ_OS);
 }
 
 __m256 test_mm256_cmp_ps_ge_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_ge_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"oge")
   return _mm256_cmp_ps(a, b, _CMP_GE_OQ);
 }
 
 __m256 test_mm256_cmp_ps_gt_oq(__m256 a, __m256 b) {
   // CHECK-LABEL: test_mm256_cmp_ps_gt_oq
-  // CHECK: call <8 x i1> @llvm.experimental.constrained.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <8 x i1> @llvm.fcmp.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !"ogt")
   return _mm256_cmp_ps(a, b, _CMP_GT_OQ);
 }
 
@@ -397,67 +397,67 @@ __m256 test_mm256_cmp_ps_true_us(__m256 a, __m256 b) {
 
 __m128d test_mm_cmp_pd_eq_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_eq_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq")
   return _mm_cmp_pd(a, b, _CMP_EQ_OQ);
 }
 
 __m128d test_mm_cmp_pd_lt_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_lt_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt")
   return _mm_cmp_pd(a, b, _CMP_LT_OS);
 }
 
 __m128d test_mm_cmp_pd_le_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_le_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole")
   return _mm_cmp_pd(a, b, _CMP_LE_OS);
 }
 
 __m128d test_mm_cmp_pd_unord_q(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_unord_q
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno")
   return _mm_cmp_pd(a, b, _CMP_UNORD_Q);
 }
 
 __m128d test_mm_cmp_pd_neq_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_neq_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une")
   return _mm_cmp_pd(a, b, _CMP_NEQ_UQ);
 }
 
 __m128d test_mm_cmp_pd_nlt_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nlt_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge")
   return _mm_cmp_pd(a, b, _CMP_NLT_US);
 }
 
 __m128d test_mm_cmp_pd_nle_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nle_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt")
   return _mm_cmp_pd(a, b, _CMP_NLE_US);
 }
 
 __m128d test_mm_cmp_pd_ord_q(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ord_q
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord")
   return _mm_cmp_pd(a, b, _CMP_ORD_Q);
 }
 
 __m128d test_mm_cmp_pd_eq_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_eq_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ueq")
   return _mm_cmp_pd(a, b, _CMP_EQ_UQ);
 }
 
 __m128d test_mm_cmp_pd_nge_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nge_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ult")
   return _mm_cmp_pd(a, b, _CMP_NGE_US);
 }
 
 __m128d test_mm_cmp_pd_ngt_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ngt_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ule")
   return _mm_cmp_pd(a, b, _CMP_NGT_US);
 }
 
@@ -469,19 +469,19 @@ __m128d test_mm_cmp_pd_false_oq(__m128d a, __m128d b) {
 
 __m128d test_mm_cmp_pd_neq_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_neq_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"one")
   return _mm_cmp_pd(a, b, _CMP_NEQ_OQ);
 }
 
 __m128d test_mm_cmp_pd_ge_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ge_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge")
   return _mm_cmp_pd(a, b, _CMP_GE_OS);
 }
 
 __m128d test_mm_cmp_pd_gt_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_gt_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt")
   return _mm_cmp_pd(a, b, _CMP_GT_OS);
 }
 
@@ -493,67 +493,67 @@ __m128d test_mm_cmp_pd_true_uq(__m128d a, __m128d b) {
 
 __m128d test_mm_cmp_pd_eq_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_eq_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq")
   return _mm_cmp_pd(a, b, _CMP_EQ_OS);
 }
 
 __m128d test_mm_cmp_pd_lt_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_lt_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt")
   return _mm_cmp_pd(a, b, _CMP_LT_OQ);
 }
 
 __m128d test_mm_cmp_pd_le_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_le_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole")
   return _mm_cmp_pd(a, b, _CMP_LE_OQ);
 }
 
 __m128d test_mm_cmp_pd_unord_s(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_unord_s
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno")
   return _mm_cmp_pd(a, b, _CMP_UNORD_S);
 }
 
 __m128d test_mm_cmp_pd_neq_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_neq_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une")
   return _mm_cmp_pd(a, b, _CMP_NEQ_US);
 }
 
 __m128d test_mm_cmp_pd_nlt_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nlt_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge")
   return _mm_cmp_pd(a, b, _CMP_NLT_UQ);
 }
 
 __m128d test_mm_cmp_pd_nle_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nle_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt")
   return _mm_cmp_pd(a, b, _CMP_NLE_UQ);
 }
 
 __m128d test_mm_cmp_pd_ord_s(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ord_s
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord")
   return _mm_cmp_pd(a, b, _CMP_ORD_S);
 }
 
 __m128d test_mm_cmp_pd_eq_us(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_eq_us
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ueq")
   return _mm_cmp_pd(a, b, _CMP_EQ_US);
 }
 
 __m128d test_mm_cmp_pd_nge_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_nge_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ult")
   return _mm_cmp_pd(a, b, _CMP_NGE_UQ);
 }
 
 __m128d test_mm_cmp_pd_ngt_uq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ngt_uq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ule")
   return _mm_cmp_pd(a, b, _CMP_NGT_UQ);
 }
 
@@ -565,19 +565,19 @@ __m128d test_mm_cmp_pd_false_os(__m128d a, __m128d b) {
 
 __m128d test_mm_cmp_pd_neq_os(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_neq_os
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"one")
   return _mm_cmp_pd(a, b, _CMP_NEQ_OS);
 }
 
 __m128d test_mm_cmp_pd_ge_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_ge_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge")
   return _mm_cmp_pd(a, b, _CMP_GE_OQ);
 }
 
 __m128d test_mm_cmp_pd_gt_oq(__m128d a, __m128d b) {
   // CHECK-LABEL: test_mm_cmp_pd_gt_oq
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt")
   return _mm_cmp_pd(a, b, _CMP_GT_OQ);
 }
 
@@ -589,67 +589,67 @@ __m128d test_mm_cmp_pd_true_us(__m128d a, __m128d b) {
 
 __m128 test_mm_cmp_ps_eq_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_eq_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq")
   return _mm_cmp_ps(a, b, _CMP_EQ_OQ);
 }
 
 __m128 test_mm_cmp_ps_lt_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_lt_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt")
   return _mm_cmp_ps(a, b, _CMP_LT_OS);
 }
 
 __m128 test_mm_cmp_ps_le_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_le_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole")
   return _mm_cmp_ps(a, b, _CMP_LE_OS);
 }
 
 __m128 test_mm_cmp_ps_unord_q(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_unord_q
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno")
   return _mm_cmp_ps(a, b, _CMP_UNORD_Q);
 }
 
 __m128 test_mm_cmp_ps_neq_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_neq_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une")
   return _mm_cmp_ps(a, b, _CMP_NEQ_UQ);
 }
 
 __m128 test_mm_cmp_ps_nlt_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nlt_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge")
   return _mm_cmp_ps(a, b, _CMP_NLT_US);
 }
 
 __m128 test_mm_cmp_ps_nle_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nle_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt")
   return _mm_cmp_ps(a, b, _CMP_NLE_US);
 }
 
 __m128 test_mm_cmp_ps_ord_q(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ord_q
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord")
   return _mm_cmp_ps(a, b, _CMP_ORD_Q);
 }
 
 __m128 test_mm_cmp_ps_eq_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_eq_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ueq")
   return _mm_cmp_ps(a, b, _CMP_EQ_UQ);
 }
 
 __m128 test_mm_cmp_ps_nge_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nge_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ult")
   return _mm_cmp_ps(a, b, _CMP_NGE_US);
 }
 
 __m128 test_mm_cmp_ps_ngt_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ngt_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ule")
   return _mm_cmp_ps(a, b, _CMP_NGT_US);
 }
 
@@ -661,19 +661,19 @@ __m128 test_mm_cmp_ps_false_oq(__m128 a, __m128 b) {
 
 __m128 test_mm_cmp_ps_neq_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_neq_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"one")
   return _mm_cmp_ps(a, b, _CMP_NEQ_OQ);
 }
 
 __m128 test_mm_cmp_ps_ge_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ge_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oge")
   return _mm_cmp_ps(a, b, _CMP_GE_OS);
 }
 
 __m128 test_mm_cmp_ps_gt_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_gt_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ogt")
   return _mm_cmp_ps(a, b, _CMP_GT_OS);
 }
 
@@ -685,67 +685,67 @@ __m128 test_mm_cmp_ps_true_uq(__m128 a, __m128 b) {
 
 __m128 test_mm_cmp_ps_eq_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_eq_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq")
   return _mm_cmp_ps(a, b, _CMP_EQ_OS);
 }
 
 __m128 test_mm_cmp_ps_lt_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_lt_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt")
   return _mm_cmp_ps(a, b, _CMP_LT_OQ);
 }
 
 __m128 test_mm_cmp_ps_le_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_le_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole")
   return _mm_cmp_ps(a, b, _CMP_LE_OQ);
 }
 
 __m128 test_mm_cmp_ps_unord_s(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_unord_s
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno")
   return _mm_cmp_ps(a, b, _CMP_UNORD_S);
 }
 
 __m128 test_mm_cmp_ps_neq_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_neq_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une")
   return _mm_cmp_ps(a, b, _CMP_NEQ_US);
 }
 
 __m128 test_mm_cmp_ps_nlt_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nlt_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge")
   return _mm_cmp_ps(a, b, _CMP_NLT_UQ);
 }
 
 __m128 test_mm_cmp_ps_nle_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nle_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt")
   return _mm_cmp_ps(a, b, _CMP_NLE_UQ);
 }
 
 __m128 test_mm_cmp_ps_ord_s(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ord_s
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord")
   return _mm_cmp_ps(a, b, _CMP_ORD_S);
 }
 
 __m128 test_mm_cmp_ps_eq_us(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_eq_us
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ueq", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ueq")
   return _mm_cmp_ps(a, b, _CMP_EQ_US);
 }
 
 __m128 test_mm_cmp_ps_nge_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_nge_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ult", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ult")
   return _mm_cmp_ps(a, b, _CMP_NGE_UQ);
 }
 
 __m128 test_mm_cmp_ps_ngt_uq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ngt_uq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ule", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ule")
   return _mm_cmp_ps(a, b, _CMP_NGT_UQ);
 }
 
@@ -757,19 +757,19 @@ __m128 test_mm_cmp_ps_false_os(__m128 a, __m128 b) {
 
 __m128 test_mm_cmp_ps_neq_os(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_neq_os
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"one")
   return _mm_cmp_ps(a, b, _CMP_NEQ_OS);
 }
 
 __m128 test_mm_cmp_ps_ge_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_ge_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oge")
   return _mm_cmp_ps(a, b, _CMP_GE_OQ);
 }
 
 __m128 test_mm_cmp_ps_gt_oq(__m128 a, __m128 b) {
   // CHECK-LABEL: test_mm_cmp_ps_gt_oq
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ogt")
   return _mm_cmp_ps(a, b, _CMP_GT_OQ);
 }
 
diff --git a/clang/test/CodeGen/X86/sse-builtins-constrained-cmp.c b/clang/test/CodeGen/X86/sse-builtins-constrained-cmp.c
index 563fe3d86d821..5f89ccb0dfa32 100644
--- a/clang/test/CodeGen/X86/sse-builtins-constrained-cmp.c
+++ b/clang/test/CodeGen/X86/sse-builtins-constrained-cmp.c
@@ -6,7 +6,7 @@
 
 __m128 test_mm_cmpeq_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpeq_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -15,7 +15,7 @@ __m128 test_mm_cmpeq_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpge_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpge_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -24,7 +24,7 @@ __m128 test_mm_cmpge_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpgt_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpgt_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -33,7 +33,7 @@ __m128 test_mm_cmpgt_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmple_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmple_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -42,7 +42,7 @@ __m128 test_mm_cmple_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmplt_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmplt_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -51,7 +51,7 @@ __m128 test_mm_cmplt_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpneq_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpneq_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"une")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -60,7 +60,7 @@ __m128 test_mm_cmpneq_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpnge_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpnge_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -69,7 +69,7 @@ __m128 test_mm_cmpnge_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpngt_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpngt_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -78,7 +78,7 @@ __m128 test_mm_cmpngt_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpnle_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpnle_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ugt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -87,7 +87,7 @@ __m128 test_mm_cmpnle_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpnlt_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpnlt_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uge")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -96,7 +96,7 @@ __m128 test_mm_cmpnlt_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpord_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpord_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ord")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
@@ -105,7 +105,7 @@ __m128 test_mm_cmpord_ps(__m128 __a, __m128 __b) {
 
 __m128 test_mm_cmpunord_ps(__m128 __a, __m128 __b) {
   // CHECK-LABEL: test_mm_cmpunord_ps
-  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"uno")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
   // CHECK-NEXT:    ret <4 x float> [[BC]]
diff --git a/clang/test/CodeGen/X86/sse2-builtins-constrained-cmp.c b/clang/test/CodeGen/X86/sse2-builtins-constrained-cmp.c
index 732d4d53e4cf8..06a3734f2529d 100644
--- a/clang/test/CodeGen/X86/sse2-builtins-constrained-cmp.c
+++ b/clang/test/CodeGen/X86/sse2-builtins-constrained-cmp.c
@@ -8,7 +8,7 @@
 
 __m128d test_mm_cmpeq_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpeq_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpeq_pd(A, B);
@@ -16,7 +16,7 @@ __m128d test_mm_cmpeq_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpge_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpge_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpge_pd(A, B);
@@ -24,7 +24,7 @@ __m128d test_mm_cmpge_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpgt_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpgt_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpgt_pd(A, B);
@@ -32,7 +32,7 @@ __m128d test_mm_cmpgt_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmple_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmple_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmple_pd(A, B);
@@ -40,7 +40,7 @@ __m128d test_mm_cmple_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmplt_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmplt_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmplt_pd(A, B);
@@ -48,7 +48,7 @@ __m128d test_mm_cmplt_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpneq_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpneq_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"une")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpneq_pd(A, B);
@@ -56,7 +56,7 @@ __m128d test_mm_cmpneq_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpnge_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpnge_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpnge_pd(A, B);
@@ -64,7 +64,7 @@ __m128d test_mm_cmpnge_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpngt_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpngt_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpngt_pd(A, B);
@@ -72,7 +72,7 @@ __m128d test_mm_cmpngt_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpnle_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpnle_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ugt")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpnle_pd(A, B);
@@ -80,7 +80,7 @@ __m128d test_mm_cmpnle_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpnlt_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpnlt_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uge")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpnlt_pd(A, B);
@@ -88,7 +88,7 @@ __m128d test_mm_cmpnlt_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpord_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpord_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ord")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpord_pd(A, B);
@@ -96,7 +96,7 @@ __m128d test_mm_cmpord_pd(__m128d A, __m128d B) {
 
 __m128d test_mm_cmpunord_pd(__m128d A, __m128d B) {
   // CHECK-LABEL: test_mm_cmpunord_pd
-  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
+  // CHECK:         [[CMP:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"uno")
   // CHECK-NEXT:    [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT:    [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
   return _mm_cmpunord_pd(A, B);
diff --git a/clang/test/CodeGen/X86/sse41-builtins-constrained.c b/clang/test/CodeGen/X86/sse41-builtins-constrained.c
index 6b25bd27af7e0..a0b98776870cc 100644
--- a/clang/test/CodeGen/X86/sse41-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/sse41-builtins-constrained.c
@@ -21,7 +21,7 @@
 
 __m128d test_mm_round_pd_roundeven(__m128d x) {
   // CHECK-LABEL: test_mm_round_pd_roundeven
-  // CHECK: %{{.*}} = call <2 x double> @llvm.experimental.constrained.roundeven.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.ignore")
+  // CHECK: %{{.*}} = call <2 x double> @llvm.roundeven.v2f64(<2 x double> %{{.*}}) {{.*}}[ "fp.except"(metadata !"ignore") ]
   return _mm_round_pd(x, 0b1000);
 }
 
@@ -39,7 +39,7 @@ __m128d test_mm_round_pd_fround_no_exc(__m128d x) {
 
 __m128 test_mm_round_ps_floor(__m128 x) {
   // CHECK-LABEL: test_mm_round_ps_floor
-  // CHECK: %{{.*}} = call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.ignore")
+  // CHECK: %{{.*}} = call <4 x float> @llvm.floor.v4f32(<4 x float> %{{.*}}) {{.*}}[ "fp.except"(metadata !"ignore") ]
   return _mm_round_ps(x, 0b1001);
 }
 
@@ -58,7 +58,7 @@ __m128 test_mm_round_ps_fround_no_exc(__m128 x) {
 __m128d test_mm_round_sd_ceil(__m128d x, __m128d y) {
   // CHECK-LABEL: test_mm_round_sd_ceil
   // CHECK: %[[A:.*]] = extractelement <2 x double> %{{.*}}, i32 0
-  // CHECK: %[[B:.*]] = call double @llvm.experimental.constrained.ceil.f64(double %[[A:.*]], metadata !"fpexcept.ignore")
+  // CHECK: %[[B:.*]] = call double @llvm.ceil.f64(double %[[A:.*]]) {{.*}}[ "fp.except"(metadata !"ignore") ]
   // CHECK: %{{.*}} = insertelement <2 x double> %0, double %[[B:.*]], i32 0
   return _mm_round_sd(x, y, 0b1010);
 }
@@ -78,7 +78,7 @@ __m128d test_mm_round_sd_fround_no_exc(__m128d x, __m128d y) {
 __m128 test_mm_round_ss_trunc(__m128 x, __m128 y) {
   // CHECK-LABEL: test_mm_round_ss_trunc
   // CHECK: %[[A:.*]] = extractelement <4 x float> %{{.*}}, i32 0
-  // CHECK: %[[B:.*]] = call float @llvm.experimental.constrained.trunc.f32(float %[[A:.*]], metadata !"fpexcept.ignore") 
+  // CHECK: %[[B:.*]] = call float @llvm.trunc.f32(float %[[A:.*]]) {{.*}}[ "fp.except"(metadata !"ignore") ]
   // CHECK: %{{.*}} = insertelement <4 x float> %0, float %[[B:.*]], i32 0
   return _mm_round_ss(x, y, 0b1011);
 }
diff --git a/clang/test/CodeGen/X86/strictfp_builtins.c b/clang/test/CodeGen/X86/strictfp_builtins.c
index 43e4060bef259..aad076f154cea 100644
--- a/clang/test/CodeGen/X86/strictfp_builtins.c
+++ b/clang/test/CodeGen/X86/strictfp_builtins.c
@@ -27,7 +27,7 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -43,7 +43,7 @@ void test_long_double_isinf(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -59,7 +59,7 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGen/builtin_float_strictfp.c b/clang/test/CodeGen/builtin_float_strictfp.c
index 81bf89228f59c..532c0d0f9fe09 100644
--- a/clang/test/CodeGen/builtin_float_strictfp.c
+++ b/clang/test/CodeGen/builtin_float_strictfp.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -emit-llvm -triple x86_64-windows-pc -ffp-exception-behavior=maytrap -o - %s | FileCheck %s --check-prefixes=CHECK,FP16
 // RUN: %clang_cc1 -emit-llvm -triple ppc64-be -ffp-exception-behavior=maytrap -o - %s | FileCheck %s --check-prefixes=CHECK,NOFP16
 
@@ -8,50 +9,142 @@
 
 #pragma float_control(except, on)
 
-// CHECK-LABEL: @test_half
+// FP16-LABEL: define dso_local void @test_half(
+// FP16-SAME: ptr noundef [[H:%.*]], ptr noundef [[H2:%.*]]) #[[ATTR0:[0-9]+]] {
+// FP16-NEXT:  [[ENTRY:.*:]]
+// FP16-NEXT:    [[H2_ADDR:%.*]] = alloca ptr, align 8
+// FP16-NEXT:    [[H_ADDR:%.*]] = alloca ptr, align 8
+// FP16-NEXT:    store ptr [[H2]], ptr [[H2_ADDR]], align 8
+// FP16-NEXT:    store ptr [[H]], ptr [[H_ADDR]], align 8
+// FP16-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[H_ADDR]], align 8
+// FP16-NEXT:    [[TMP1:%.*]] = load half, ptr [[TMP0]], align 2
+// FP16-NEXT:    [[CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP1]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[H2_ADDR]], align 8
+// FP16-NEXT:    [[TMP3:%.*]] = load half, ptr [[TMP2]], align 2
+// FP16-NEXT:    [[CONV1:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP3]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV]], float [[CONV1]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP4:%.*]] = zext i1 [[CMP]] to i32
+// FP16-NEXT:    [[TMP5:%.*]] = load ptr, ptr [[H_ADDR]], align 8
+// FP16-NEXT:    [[TMP6:%.*]] = load half, ptr [[TMP5]], align 2
+// FP16-NEXT:    [[TMP7:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP6]], i32 516) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP8:%.*]] = zext i1 [[TMP7]] to i32
+// FP16-NEXT:    ret void
+//
+// NOFP16-LABEL: define dso_local void @test_half(
+// NOFP16-SAME: ptr noundef [[H:%.*]], ptr noundef [[H2:%.*]]) #[[ATTR0:[0-9]+]] {
+// NOFP16-NEXT:  [[ENTRY:.*:]]
+// NOFP16-NEXT:    [[H_ADDR:%.*]] = alloca ptr, align 8
+// NOFP16-NEXT:    [[H2_ADDR:%.*]] = alloca ptr, align 8
+// NOFP16-NEXT:    store ptr [[H]], ptr [[H_ADDR]], align 8
+// NOFP16-NEXT:    store ptr [[H2]], ptr [[H2_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[H_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// NOFP16-NEXT:    [[TMP2:%.*]] = bitcast i16 [[TMP1]] to half
+// NOFP16-NEXT:    [[CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP2]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[H2_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP4:%.*]] = load i16, ptr [[TMP3]], align 2
+// NOFP16-NEXT:    [[TMP5:%.*]] = bitcast i16 [[TMP4]] to half
+// NOFP16-NEXT:    [[CONV1:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP5]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV]], float [[CONV1]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP6:%.*]] = zext i1 [[CMP]] to i32
+// NOFP16-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[H_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP8:%.*]] = load i16, ptr [[TMP7]], align 2
+// NOFP16-NEXT:    [[TMP9:%.*]] = bitcast i16 [[TMP8]] to half
+// NOFP16-NEXT:    [[CONV2:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP9]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP10:%.*]] = call i1 @llvm.is.fpclass.f32(float [[CONV2]], i32 516) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP11:%.*]] = zext i1 [[TMP10]] to i32
+// NOFP16-NEXT:    ret void
+//
 void test_half(__fp16 *H, __fp16 *H2) {
   (void)__builtin_isgreater(*H, *H2);
-  // FP16: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // FP16: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_isinf(*H);
-  // NOFP16:       [[LDADDR:%.*]] = load ptr, ptr %{{.*}}, align 8
-  // NOFP16-NEXT:  [[IHALF:%.*]]  = load i16, ptr [[LDADDR]], align 2
-  // NOFP16-NEXT:  [[BITCAST:%.*]] = bitcast i16 [[IHALF]] to half
-  // NOFP16-NEXT:  [[CONV:%.*]]   = call float @llvm.experimental.constrained.fpext.f32.f16(half [[BITCAST]], metadata !"fpexcept.strict")
-  // NOFP16-NEXT:  [[RES1:%.*]]   = call i1 @llvm.is.fpclass.f32(float [[CONV]], i32 516)
-  // NOFP16-NEXT:                   zext i1 [[RES1]] to i32
-  // FP16:         [[LDADDR:%.*]] = load ptr, ptr %{{.*}}, align 8
-  // FP16-NEXT:    [[HALF:%.*]]   = load half, ptr [[LDADDR]], align 2
-  // FP16-NEXT:    [[RES1:%.*]]   = call i1 @llvm.is.fpclass.f16(half [[HALF]], i32 516)
-  // FP16-NEXT:                     zext i1 [[RES1]] to i32
 }
 
-// CHECK-LABEL: @test_mixed
+// FP16-LABEL: define dso_local void @test_mixed(
+// FP16-SAME: double noundef [[D1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FP16-NEXT:  [[ENTRY:.*:]]
+// FP16-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FP16-NEXT:    [[D1_ADDR:%.*]] = alloca double, align 8
+// FP16-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    store double [[D1]], ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP0:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[CONV]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP2:%.*]] = zext i1 [[CMP]] to i32
+// FP16-NEXT:    [[TMP3:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP4:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP4]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP3]], double [[CONV1]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP5:%.*]] = zext i1 [[CMP2]] to i32
+// FP16-NEXT:    [[TMP6:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP7:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV3:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP7]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP4:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP6]], double [[CONV3]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP8:%.*]] = zext i1 [[CMP4]] to i32
+// FP16-NEXT:    [[TMP9:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP10:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV5:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP10]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP6:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP9]], double [[CONV5]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP11:%.*]] = zext i1 [[CMP6]] to i32
+// FP16-NEXT:    [[TMP12:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP13:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV7:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP13]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP8:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP12]], double [[CONV7]], metadata !"one") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP14:%.*]] = zext i1 [[CMP8]] to i32
+// FP16-NEXT:    [[TMP15:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// FP16-NEXT:    [[TMP16:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FP16-NEXT:    [[CONV9:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP16]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[CMP10:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP15]], double [[CONV9]], metadata !"uno") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// FP16-NEXT:    [[TMP17:%.*]] = zext i1 [[CMP10]] to i32
+// FP16-NEXT:    ret void
+//
+// NOFP16-LABEL: define dso_local void @test_mixed(
+// NOFP16-SAME: double noundef [[D1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// NOFP16-NEXT:  [[ENTRY:.*:]]
+// NOFP16-NEXT:    [[D1_ADDR:%.*]] = alloca double, align 8
+// NOFP16-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// NOFP16-NEXT:    store double [[D1]], ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[TMP0:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[CONV]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP2:%.*]] = zext i1 [[CMP]] to i32
+// NOFP16-NEXT:    [[TMP3:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP4:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP4]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP2:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP3]], double [[CONV1]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP5:%.*]] = zext i1 [[CMP2]] to i32
+// NOFP16-NEXT:    [[TMP6:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP7:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV3:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP7]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP4:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP6]], double [[CONV3]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP8:%.*]] = zext i1 [[CMP4]] to i32
+// NOFP16-NEXT:    [[TMP9:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP10:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV5:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP10]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP6:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP9]], double [[CONV5]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP11:%.*]] = zext i1 [[CMP6]] to i32
+// NOFP16-NEXT:    [[TMP12:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP13:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV7:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP13]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP8:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP12]], double [[CONV7]], metadata !"one") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP14:%.*]] = zext i1 [[CMP8]] to i32
+// NOFP16-NEXT:    [[TMP15:%.*]] = load double, ptr [[D1_ADDR]], align 8
+// NOFP16-NEXT:    [[TMP16:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// NOFP16-NEXT:    [[CONV9:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP16]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[CMP10:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP15]], double [[CONV9]], metadata !"uno") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOFP16-NEXT:    [[TMP17:%.*]] = zext i1 [[CMP10]] to i32
+// NOFP16-NEXT:    ret void
+//
 void test_mixed(double d1, float f2) {
   (void)__builtin_isgreater(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_isgreaterequal(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_isless(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_islessequal(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_islessgreater(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"one", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
   (void)__builtin_isunordered(d1, f2);
-  // CHECK: [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK-NEXT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double [[CONV]], metadata !"uno", metadata !"fpexcept.strict")
-  // CHECK-NEXT: zext i1
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/complex-strictfp.c b/clang/test/CodeGen/complex-strictfp.c
index e89c8f3bbdd63..0c9611d714ffb 100644
--- a/clang/test/CodeGen/complex-strictfp.c
+++ b/clang/test/CodeGen/complex-strictfp.c
@@ -20,11 +20,11 @@ double D;
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr @D, align 8
 // CHECK-NEXT:    [[CF_REAL:%.*]] = load float, ptr @cf, align 4
 // CHECK-NEXT:    [[CF_IMAG:%.*]] = load float, ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
-// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_REAL]], metadata !"fpexcept.strict") #[[ATTR2:[0-9]+]]
-// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_IMAG]], metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[CONV]], double [[TMP0]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[CONV2:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[ADD_R]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[CONV1]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_REAL]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_IMAG]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.fadd.f64(double [[CONV]], double [[TMP0]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV2:%.*]] = call float @llvm.fptrunc.f32.f64(double [[ADD_R]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV3:%.*]] = call float @llvm.fptrunc.f32.f64(double [[CONV1]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store float [[CONV2]], ptr @cf, align 4
 // CHECK-NEXT:    store float [[CONV3]], ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
 // CHECK-NEXT:    ret void
@@ -37,10 +37,10 @@ void test3a(void) {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[CF_REAL:%.*]] = load float, ptr @cf, align 4
 // CHECK-NEXT:    [[CF_IMAG:%.*]] = load float, ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
-// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_REAL]], metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_IMAG]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_REAL]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_IMAG]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr @D, align 8
-// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP0]], double [[CONV]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.fadd.f64(double [[TMP0]], double [[CONV]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store double [[ADD_R]], ptr @D, align 8
 // CHECK-NEXT:    ret void
 //
@@ -54,13 +54,13 @@ void test3b(void) {
 // CHECK-NEXT:    [[G1_IMAG:%.*]] = load double, ptr getelementptr inbounds nuw (i8, ptr @g1, i64 8), align 8
 // CHECK-NEXT:    [[CF_REAL:%.*]] = load float, ptr @cf, align 4
 // CHECK-NEXT:    [[CF_IMAG:%.*]] = load float, ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
-// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_REAL]], metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CF_IMAG]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_REAL]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[CF_IMAG]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    [[CALL:%.*]] = call { double, double } @__divdc3(double noundef [[CONV]], double noundef [[CONV1]], double noundef [[G1_REAL]], double noundef [[G1_IMAG]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    [[TMP0:%.*]] = extractvalue { double, double } [[CALL]], 0
 // CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { double, double } [[CALL]], 1
-// CHECK-NEXT:    [[CONV2:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP0]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
-// CHECK-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP1]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV2:%.*]] = call float @llvm.fptrunc.f32.f64(double [[TMP0]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
+// CHECK-NEXT:    [[CONV3:%.*]] = call float @llvm.fptrunc.f32.f64(double [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store float [[CONV2]], ptr @cf, align 4
 // CHECK-NEXT:    store float [[CONV3]], ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
 // CHECK-NEXT:    ret void
@@ -74,7 +74,7 @@ void test3c(void) {
 // CHECK-NEXT:    [[G1_REAL:%.*]] = load double, ptr @g1, align 8
 // CHECK-NEXT:    [[G1_IMAG:%.*]] = load double, ptr getelementptr inbounds nuw (i8, ptr @g1, i64 8), align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr @D, align 8
-// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[G1_REAL]], double [[TMP0]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.fadd.f64(double [[G1_REAL]], double [[TMP0]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store double [[ADD_R]], ptr @g1, align 8
 // CHECK-NEXT:    store double [[G1_IMAG]], ptr getelementptr inbounds nuw (i8, ptr @g1, i64 8), align 8
 // CHECK-NEXT:    ret void
@@ -88,7 +88,7 @@ void test3d(void) {
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr @D, align 8
 // CHECK-NEXT:    [[G1_REAL:%.*]] = load double, ptr @g1, align 8
 // CHECK-NEXT:    [[G1_IMAG:%.*]] = load double, ptr getelementptr inbounds nuw (i8, ptr @g1, i64 8), align 8
-// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP0]], double [[G1_REAL]], metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[ADD_R:%.*]] = call double @llvm.fadd.f64(double [[TMP0]], double [[G1_REAL]]) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store double [[ADD_R]], ptr @g1, align 8
 // CHECK-NEXT:    store double [[G1_IMAG]], ptr getelementptr inbounds nuw (i8, ptr @g1, i64 8), align 8
 // CHECK-NEXT:    ret void
@@ -99,7 +99,7 @@ void test3e(void) {
 
 // CHECK-LABEL: @t1(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CONV:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double 4.000000e+00, metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call float @llvm.fptrunc.f32.f64(double 4.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store float [[CONV]], ptr @cf, align 4
 // CHECK-NEXT:    ret void
 //
@@ -109,7 +109,7 @@ void t1(void) {
 
 // CHECK-LABEL: @t2(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[CONV:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double 4.000000e+00, metadata !"round.upward", metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call float @llvm.fptrunc.f32.f64(double 4.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    store float [[CONV]], ptr getelementptr inbounds nuw (i8, ptr @cf, i64 4), align 4
 // CHECK-NEXT:    ret void
 //
@@ -122,10 +122,10 @@ void t2(void) {
 // CHECK-NEXT:    [[C:%.*]] = alloca [0 x i8], align 1
 // CHECK-NEXT:    br i1 false, label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
 // CHECK:       cond.true:
-// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float 2.000000e+00, metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float 2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    br label [[COND_END:%.*]]
 // CHECK:       cond.false:
-// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float 2.000000e+00, metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float 2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    br label [[COND_END]]
 // CHECK:       cond.end:
 // CHECK-NEXT:    [[COND_R:%.*]] = phi double [ [[CONV]], [[COND_TRUE]] ], [ [[CONV1]], [[COND_FALSE]] ]
@@ -144,10 +144,10 @@ void t91(void) {
 // CHECK-NEXT:    [[C:%.*]] = alloca [0 x i8], align 1
 // CHECK-NEXT:    br i1 false, label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
 // CHECK:       cond.true:
-// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float 2.000000e+00, metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float 2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    br label [[COND_END:%.*]]
 // CHECK:       cond.false:
-// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float 2.000000e+00, metadata !"fpexcept.strict") #[[ATTR2]]
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float 2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rtp") ]
 // CHECK-NEXT:    br label [[COND_END]]
 // CHECK:       cond.end:
 // CHECK-NEXT:    [[COND_R:%.*]] = phi double [ [[CONV]], [[COND_TRUE]] ], [ [[CONV1]], [[COND_FALSE]] ]
diff --git a/clang/test/CodeGen/constrained-math-builtins.c b/clang/test/CodeGen/constrained-math-builtins.c
index 68b9e75283c54..c3c48873be68c 100644
--- a/clang/test/CodeGen/constrained-math-builtins.c
+++ b/clang/test/CodeGen/constrained-math-builtins.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-linux -ffp-exception-behavior=maytrap -w -o - -emit-llvm %s | FileCheck %s
 
 // Test codegen of constrained math builtins.
@@ -7,382 +8,647 @@
 
 #pragma float_control(except, on)
 
+// CHECK-LABEL: define dso_local void @foo(
+// CHECK-SAME: ptr noundef [[D:%.*]], float noundef [[F:%.*]], ptr noundef [[FP:%.*]], ptr noundef [[L:%.*]], ptr noundef [[I:%.*]], ptr noundef [[C:%.*]], half noundef [[H:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[D_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[FP_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[L_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[C_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
+// CHECK-NEXT:    store ptr [[D]], ptr [[D_ADDR]], align 8
+// CHECK-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    store ptr [[FP]], ptr [[FP_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[L]], ptr [[L_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[C]], ptr [[C_ADDR]], align 8
+// CHECK-NEXT:    store half [[H]], ptr [[H_ADDR]], align 2
+// CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP0]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP1]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[FMOD:%.*]] = call double @llvm.frem.f64(double [[CONV]], double [[CONV1]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CONV2:%.*]] = call float @llvm.fptrunc.f32.f64(double [[FMOD]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store float [[CONV2]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[FMOD3:%.*]] = call float @llvm.frem.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store float [[FMOD3]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV4:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP4]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP5:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV5:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP5]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[FMOD6:%.*]] = call x86_fp80 @llvm.frem.f80(x86_fp80 [[CONV4]], x86_fp80 [[CONV5]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CONV7:%.*]] = call float @llvm.fptrunc.f32.f80(x86_fp80 [[FMOD6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store float [[CONV7]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP6:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV8:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP7:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV9:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP7]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[FMOD10:%.*]] = call fp128 @llvm.frem.f128(fp128 [[CONV8]], fp128 [[CONV9]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CONV11:%.*]] = call float @llvm.fptrunc.f32.f128(fp128 [[FMOD10]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store float [[CONV11]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP8:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV12:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP9:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV13:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP9]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP10:%.*]] = call double @llvm.pow.f64(double [[CONV12]], double [[CONV13]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP11:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP12:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP13:%.*]] = call float @llvm.pow.f32(float [[TMP11]], float [[TMP12]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP14:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV14:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP14]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP15:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV15:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP15]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP16:%.*]] = call x86_fp80 @llvm.pow.f80(x86_fp80 [[CONV14]], x86_fp80 [[CONV15]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP17:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV16:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP17]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP18:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV17:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP18]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP19:%.*]] = call fp128 @llvm.pow.f128(fp128 [[CONV16]], fp128 [[CONV17]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP20:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV18:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP20]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP21:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV19:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP21]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP22:%.*]] = call double @llvm.powi.f64.i32(double [[CONV18]], i32 [[CONV19]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CHECK-NEXT:    [[TMP23:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP24:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV20:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP24]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP25:%.*]] = call float @llvm.powi.f32.i32(float [[TMP23]], i32 [[CONV20]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CHECK-NEXT:    [[TMP26:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV21:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP26]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP27:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV22:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP27]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP28:%.*]] = call x86_fp80 @llvm.powi.f80.i32(x86_fp80 [[CONV21]], i32 [[CONV22]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CHECK-NEXT:    [[TMP29:%.*]] = load half, ptr [[H_ADDR]], align 2
+// CHECK-NEXT:    [[TMP30:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK-NEXT:    [[TMP31:%.*]] = load i32, ptr [[TMP30]], align 4
+// CHECK-NEXT:    [[TMP32:%.*]] = call half @llvm.ldexp.f16.i32(half [[TMP29]], i32 [[TMP31]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store half [[TMP32]], ptr [[H_ADDR]], align 2
+// CHECK-NEXT:    [[TMP33:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// CHECK-NEXT:    [[TMP34:%.*]] = load double, ptr [[TMP33]], align 8
+// CHECK-NEXT:    [[TMP35:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK-NEXT:    [[TMP36:%.*]] = load i32, ptr [[TMP35]], align 4
+// CHECK-NEXT:    [[TMP37:%.*]] = call double @llvm.ldexp.f64.i32(double [[TMP34]], i32 [[TMP36]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP38:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// CHECK-NEXT:    store double [[TMP37]], ptr [[TMP38]], align 8
+// CHECK-NEXT:    [[TMP39:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP40:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK-NEXT:    [[TMP41:%.*]] = load i32, ptr [[TMP40]], align 4
+// CHECK-NEXT:    [[TMP42:%.*]] = call float @llvm.ldexp.f32.i32(float [[TMP39]], i32 [[TMP41]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store float [[TMP42]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP43:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// CHECK-NEXT:    [[TMP44:%.*]] = load x86_fp80, ptr [[TMP43]], align 16
+// CHECK-NEXT:    [[TMP45:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// CHECK-NEXT:    [[TMP46:%.*]] = load i32, ptr [[TMP45]], align 4
+// CHECK-NEXT:    [[TMP47:%.*]] = call x86_fp80 @llvm.ldexp.f80.i32(x86_fp80 [[TMP44]], i32 [[TMP46]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP48:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV23:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP48]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP49:%.*]] = call double @llvm.acos.f64(double [[CONV23]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP50:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP51:%.*]] = call float @llvm.acos.f32(float [[TMP50]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP52:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV24:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP52]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP53:%.*]] = call x86_fp80 @llvm.acos.f80(x86_fp80 [[CONV24]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP54:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV25:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP54]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP55:%.*]] = call fp128 @llvm.acos.f128(fp128 [[CONV25]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP56:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV26:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP56]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP57:%.*]] = call double @llvm.asin.f64(double [[CONV26]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP58:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP59:%.*]] = call float @llvm.asin.f32(float [[TMP58]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP60:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV27:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP60]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP61:%.*]] = call x86_fp80 @llvm.asin.f80(x86_fp80 [[CONV27]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP62:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV28:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP62]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP63:%.*]] = call fp128 @llvm.asin.f128(fp128 [[CONV28]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP64:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV29:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP64]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP65:%.*]] = call double @llvm.atan.f64(double [[CONV29]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP66:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP67:%.*]] = call float @llvm.atan.f32(float [[TMP66]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP68:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV30:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP68]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP69:%.*]] = call x86_fp80 @llvm.atan.f80(x86_fp80 [[CONV30]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP70:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV31:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP70]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP71:%.*]] = call fp128 @llvm.atan.f128(fp128 [[CONV31]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP72:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV32:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP72]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP73:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV33:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP73]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP74:%.*]] = call double @llvm.atan2.f64(double [[CONV32]], double [[CONV33]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP75:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP76:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP77:%.*]] = call float @llvm.atan2.f32(float [[TMP75]], float [[TMP76]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP78:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV34:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP78]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP79:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV35:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP79]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP80:%.*]] = call x86_fp80 @llvm.atan2.f80(x86_fp80 [[CONV34]], x86_fp80 [[CONV35]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP81:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV36:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP81]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP82:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV37:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP82]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP83:%.*]] = call fp128 @llvm.atan2.f128(fp128 [[CONV36]], fp128 [[CONV37]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP84:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV38:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP84]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP85:%.*]] = call double @llvm.ceil.f64(double [[CONV38]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP86:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP87:%.*]] = call float @llvm.ceil.f32(float [[TMP86]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP88:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV39:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP88]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP89:%.*]] = call x86_fp80 @llvm.ceil.f80(x86_fp80 [[CONV39]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP90:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV40:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP90]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP91:%.*]] = call fp128 @llvm.ceil.f128(fp128 [[CONV40]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP92:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV41:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP92]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP93:%.*]] = call double @llvm.cos.f64(double [[CONV41]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP94:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP95:%.*]] = call float @llvm.cos.f32(float [[TMP94]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP96:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV42:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP96]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP97:%.*]] = call x86_fp80 @llvm.cos.f80(x86_fp80 [[CONV42]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP98:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV43:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP98]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP99:%.*]] = call fp128 @llvm.cos.f128(fp128 [[CONV43]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP100:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV44:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP100]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP101:%.*]] = call double @llvm.cosh.f64(double [[CONV44]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP102:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP103:%.*]] = call float @llvm.cosh.f32(float [[TMP102]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP104:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV45:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP104]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP105:%.*]] = call x86_fp80 @llvm.cosh.f80(x86_fp80 [[CONV45]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP106:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV46:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP106]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP107:%.*]] = call fp128 @llvm.cosh.f128(fp128 [[CONV46]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP108:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV47:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP108]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP109:%.*]] = call double @llvm.exp.f64(double [[CONV47]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP110:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP111:%.*]] = call float @llvm.exp.f32(float [[TMP110]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP112:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV48:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP112]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP113:%.*]] = call x86_fp80 @llvm.exp.f80(x86_fp80 [[CONV48]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP114:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV49:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP114]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP115:%.*]] = call fp128 @llvm.exp.f128(fp128 [[CONV49]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP116:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV50:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP116]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP117:%.*]] = call double @llvm.exp2.f64(double [[CONV50]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP118:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP119:%.*]] = call float @llvm.exp2.f32(float [[TMP118]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP120:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV51:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP120]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP121:%.*]] = call x86_fp80 @llvm.exp2.f80(x86_fp80 [[CONV51]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP122:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV52:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP122]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP123:%.*]] = call fp128 @llvm.exp2.f128(fp128 [[CONV52]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP124:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV53:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP124]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CALL:%.*]] = call double @exp10(double noundef [[CONV53]]) #[[ATTR5:[0-9]+]]
+// CHECK-NEXT:    [[TMP125:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CALL54:%.*]] = call float @exp10f(float noundef [[TMP125]]) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP126:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV55:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP126]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CALL56:%.*]] = call x86_fp80 @exp10l(x86_fp80 noundef [[CONV55]]) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP127:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV57:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP127]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[CALL58:%.*]] = call fp128 @exp10f128(fp128 noundef [[CONV57]]) #[[ATTR5]]
+// CHECK-NEXT:    [[TMP128:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV59:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP128]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP129:%.*]] = call double @llvm.floor.f64(double [[CONV59]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP130:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP131:%.*]] = call float @llvm.floor.f32(float [[TMP130]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP132:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV60:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP132]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP133:%.*]] = call x86_fp80 @llvm.floor.f80(x86_fp80 [[CONV60]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP134:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV61:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP134]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP135:%.*]] = call fp128 @llvm.floor.f128(fp128 [[CONV61]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP136:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV62:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP136]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP137:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV63:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP137]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP138:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV64:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP138]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP139:%.*]] = call double @llvm.fma.f64(double [[CONV62]], double [[CONV63]], double [[CONV64]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP140:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP141:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP142:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP143:%.*]] = call float @llvm.fma.f32(float [[TMP140]], float [[TMP141]], float [[TMP142]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP144:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV65:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP144]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP145:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV66:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP145]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP146:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV67:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP146]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP147:%.*]] = call x86_fp80 @llvm.fma.f80(x86_fp80 [[CONV65]], x86_fp80 [[CONV66]], x86_fp80 [[CONV67]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP148:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV68:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP148]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP149:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV69:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP149]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP150:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV70:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP150]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP151:%.*]] = call fp128 @llvm.fma.f128(fp128 [[CONV68]], fp128 [[CONV69]], fp128 [[CONV70]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP152:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV71:%.*]] = call half @llvm.fptrunc.f16.f32(float [[TMP152]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP153:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV72:%.*]] = call half @llvm.fptrunc.f16.f32(float [[TMP153]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP154:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV73:%.*]] = call half @llvm.fptrunc.f16.f32(float [[TMP154]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP155:%.*]] = call half @llvm.fma.f16(half [[CONV71]], half [[CONV72]], half [[CONV73]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP156:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV74:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP156]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP157:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV75:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP157]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP158:%.*]] = call double @llvm.maxnum.f64(double [[CONV74]], double [[CONV75]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP159:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP160:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP161:%.*]] = call float @llvm.maxnum.f32(float [[TMP159]], float [[TMP160]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP162:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV76:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP162]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP163:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV77:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP163]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP164:%.*]] = call x86_fp80 @llvm.maxnum.f80(x86_fp80 [[CONV76]], x86_fp80 [[CONV77]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP165:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV78:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP165]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP166:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV79:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP166]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP167:%.*]] = call fp128 @llvm.maxnum.f128(fp128 [[CONV78]], fp128 [[CONV79]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP168:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV80:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP168]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP169:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV81:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP169]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP170:%.*]] = call double @llvm.minnum.f64(double [[CONV80]], double [[CONV81]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP171:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP172:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP173:%.*]] = call float @llvm.minnum.f32(float [[TMP171]], float [[TMP172]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP174:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV82:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP174]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP175:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV83:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP175]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP176:%.*]] = call x86_fp80 @llvm.minnum.f80(x86_fp80 [[CONV82]], x86_fp80 [[CONV83]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP177:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV84:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP177]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP178:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV85:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP178]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP179:%.*]] = call fp128 @llvm.minnum.f128(fp128 [[CONV84]], fp128 [[CONV85]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP180:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV86:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP180]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP181:%.*]] = call i64 @llvm.llrint.i64.f64(double [[CONV86]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP182:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP183:%.*]] = call i64 @llvm.llrint.i64.f32(float [[TMP182]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP184:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV87:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP184]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP185:%.*]] = call i64 @llvm.llrint.i64.f80(x86_fp80 [[CONV87]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP186:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV88:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP186]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP187:%.*]] = call i64 @llvm.llrint.i64.f128(fp128 [[CONV88]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP188:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV89:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP188]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP189:%.*]] = call i64 @llvm.llround.i64.f64(double [[CONV89]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP190:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP191:%.*]] = call i64 @llvm.llround.i64.f32(float [[TMP190]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP192:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV90:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP192]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP193:%.*]] = call i64 @llvm.llround.i64.f80(x86_fp80 [[CONV90]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP194:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV91:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP194]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP195:%.*]] = call i64 @llvm.llround.i64.f128(fp128 [[CONV91]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP196:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV92:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP196]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP197:%.*]] = call double @llvm.log.f64(double [[CONV92]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP198:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP199:%.*]] = call float @llvm.log.f32(float [[TMP198]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP200:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV93:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP200]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP201:%.*]] = call x86_fp80 @llvm.log.f80(x86_fp80 [[CONV93]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP202:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV94:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP202]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP203:%.*]] = call fp128 @llvm.log.f128(fp128 [[CONV94]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP204:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV95:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP204]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP205:%.*]] = call double @llvm.log10.f64(double [[CONV95]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP206:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP207:%.*]] = call float @llvm.log10.f32(float [[TMP206]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP208:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV96:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP208]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP209:%.*]] = call x86_fp80 @llvm.log10.f80(x86_fp80 [[CONV96]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP210:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV97:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP210]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP211:%.*]] = call fp128 @llvm.log10.f128(fp128 [[CONV97]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP212:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV98:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP212]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP213:%.*]] = call double @llvm.log2.f64(double [[CONV98]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP214:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP215:%.*]] = call float @llvm.log2.f32(float [[TMP214]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP216:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV99:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP216]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP217:%.*]] = call x86_fp80 @llvm.log2.f80(x86_fp80 [[CONV99]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP218:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV100:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP218]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP219:%.*]] = call fp128 @llvm.log2.f128(fp128 [[CONV100]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP220:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV101:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP220]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP221:%.*]] = call i64 @llvm.lrint.i64.f64(double [[CONV101]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP222:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP223:%.*]] = call i64 @llvm.lrint.i64.f32(float [[TMP222]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP224:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV102:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP224]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP225:%.*]] = call i64 @llvm.lrint.i64.f80(x86_fp80 [[CONV102]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP226:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV103:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP226]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP227:%.*]] = call i64 @llvm.lrint.i64.f128(fp128 [[CONV103]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP228:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV104:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP228]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP229:%.*]] = call i64 @llvm.lround.i64.f64(double [[CONV104]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP230:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP231:%.*]] = call i64 @llvm.lround.i64.f32(float [[TMP230]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP232:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV105:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP232]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP233:%.*]] = call i64 @llvm.lround.i64.f80(x86_fp80 [[CONV105]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP234:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV106:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP234]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP235:%.*]] = call i64 @llvm.lround.i64.f128(fp128 [[CONV106]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP236:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV107:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP236]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP237:%.*]] = call double @llvm.nearbyint.f64(double [[CONV107]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP238:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP239:%.*]] = call float @llvm.nearbyint.f32(float [[TMP238]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP240:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV108:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP240]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP241:%.*]] = call x86_fp80 @llvm.nearbyint.f80(x86_fp80 [[CONV108]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP242:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV109:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP242]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP243:%.*]] = call fp128 @llvm.nearbyint.f128(fp128 [[CONV109]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP244:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV110:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP244]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP245:%.*]] = call double @llvm.rint.f64(double [[CONV110]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP246:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP247:%.*]] = call float @llvm.rint.f32(float [[TMP246]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP248:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV111:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP248]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP249:%.*]] = call x86_fp80 @llvm.rint.f80(x86_fp80 [[CONV111]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP250:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV112:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP250]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP251:%.*]] = call fp128 @llvm.rint.f128(fp128 [[CONV112]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP252:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV113:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP252]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP253:%.*]] = call double @llvm.round.f64(double [[CONV113]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP254:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP255:%.*]] = call float @llvm.round.f32(float [[TMP254]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP256:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV114:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP256]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP257:%.*]] = call x86_fp80 @llvm.round.f80(x86_fp80 [[CONV114]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP258:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV115:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP258]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP259:%.*]] = call fp128 @llvm.round.f128(fp128 [[CONV115]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP260:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV116:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP260]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP261:%.*]] = call double @llvm.sin.f64(double [[CONV116]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP262:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP263:%.*]] = call float @llvm.sin.f32(float [[TMP262]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP264:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV117:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP264]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP265:%.*]] = call x86_fp80 @llvm.sin.f80(x86_fp80 [[CONV117]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP266:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV118:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP266]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP267:%.*]] = call fp128 @llvm.sin.f128(fp128 [[CONV118]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP268:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV119:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP268]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP269:%.*]] = call double @llvm.sinh.f64(double [[CONV119]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP270:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP271:%.*]] = call float @llvm.sinh.f32(float [[TMP270]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP272:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV120:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP272]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP273:%.*]] = call x86_fp80 @llvm.sinh.f80(x86_fp80 [[CONV120]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP274:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV121:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP274]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP275:%.*]] = call fp128 @llvm.sinh.f128(fp128 [[CONV121]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP276:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV122:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP276]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP277:%.*]] = call double @llvm.sqrt.f64(double [[CONV122]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP278:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP279:%.*]] = call float @llvm.sqrt.f32(float [[TMP278]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP280:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV123:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP280]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP281:%.*]] = call x86_fp80 @llvm.sqrt.f80(x86_fp80 [[CONV123]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP282:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV124:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP282]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP283:%.*]] = call fp128 @llvm.sqrt.f128(fp128 [[CONV124]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP284:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV125:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP284]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP285:%.*]] = call double @llvm.tan.f64(double [[CONV125]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP286:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP287:%.*]] = call float @llvm.tan.f32(float [[TMP286]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP288:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV126:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP288]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP289:%.*]] = call x86_fp80 @llvm.tan.f80(x86_fp80 [[CONV126]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP290:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV127:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP290]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP291:%.*]] = call fp128 @llvm.tan.f128(fp128 [[CONV127]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP292:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV128:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP292]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP293:%.*]] = call double @llvm.tanh.f64(double [[CONV128]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP294:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP295:%.*]] = call float @llvm.tanh.f32(float [[TMP294]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP296:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV129:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP296]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP297:%.*]] = call x86_fp80 @llvm.tanh.f80(x86_fp80 [[CONV129]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP298:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV130:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP298]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP299:%.*]] = call fp128 @llvm.tanh.f128(fp128 [[CONV130]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP300:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV131:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP300]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP301:%.*]] = call double @llvm.trunc.f64(double [[CONV131]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP302:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP303:%.*]] = call float @llvm.trunc.f32(float [[TMP302]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP304:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV132:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP304]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP305:%.*]] = call x86_fp80 @llvm.trunc.f80(x86_fp80 [[CONV132]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP306:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV133:%.*]] = call fp128 @llvm.fpext.f128.f32(float [[TMP306]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP307:%.*]] = call fp128 @llvm.trunc.f128(fp128 [[CONV133]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    ret void
+//
 void foo(double *d, float f, float *fp, long double *l, int *i, const char *c, _Float16 h) {
   f = __builtin_fmod(f,f);    f = __builtin_fmodf(f,f);   f =  __builtin_fmodl(f,f); f = __builtin_fmodf128(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.frem.f64(double %{{.*}}, double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.frem.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.frem.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.frem.f128(fp128 %{{.*}}, fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_pow(f,f);        __builtin_powf(f,f);       __builtin_powl(f,f); __builtin_powf128(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.pow.f64(double %{{.*}}, double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.pow.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.pow.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.pow.f128(fp128 %{{.*}}, fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_powi(f,f);        __builtin_powif(f,f);       __builtin_powil(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.powi.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.powi.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.powi.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 
   h = __builtin_ldexpf16(h, *i);  *d = __builtin_ldexp(*d, *i);        f = __builtin_ldexpf(f, *i);       __builtin_ldexpl(*l, *i);
 
-// CHECK: call half @llvm.experimental.constrained.ldexp.f16.i32(half %{{.*}}, i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call double @llvm.experimental.constrained.ldexp.f64.i32(double %{{.*}}, i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.ldexp.f32.i32(float %{{.*}}, i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.ldexp.f80.i32(x86_fp80 %{{.*}}, i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_acos(f);        __builtin_acosf(f);       __builtin_acosl(f); __builtin_acosf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.acos.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.acos.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.acos.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.acos.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 __builtin_asin(f);        __builtin_asinf(f);       __builtin_asinl(f); __builtin_asinf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.asin.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.asin.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.asin.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.asin.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 __builtin_atan(f);        __builtin_atanf(f);       __builtin_atanl(f); __builtin_atanf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.atan.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.atan.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.atan.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.atan.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 __builtin_atan2(f,f);        __builtin_atan2f(f,f);       __builtin_atan2l(f,f); __builtin_atan2f128(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.atan2.f64(double %{{.*}}, double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.atan2.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.atan2.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.atan2.f128(fp128 %{{.*}}, fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_ceil(f);       __builtin_ceilf(f);      __builtin_ceill(f); __builtin_ceilf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.ceil.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.ceil.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.ceil.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.ceil.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_cos(f);        __builtin_cosf(f);       __builtin_cosl(f); __builtin_cosf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.cos.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.cos.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.cos.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.cos.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_cosh(f);        __builtin_coshf(f);       __builtin_coshl(f); __builtin_coshf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.cosh.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.cosh.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.cosh.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.cosh.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_exp(f);        __builtin_expf(f);       __builtin_expl(f); __builtin_expf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.exp.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.exp.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.exp.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.exp.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_exp2(f);       __builtin_exp2f(f);      __builtin_exp2l(f); __builtin_exp2f128(f);
 
-// CHECK: call double @llvm.experimental.constrained.exp2.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.exp2.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.exp2.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_exp10(f);       __builtin_exp10f(f);      __builtin_exp10l(f); __builtin_exp10f128(f);
 
-// CHECK: call double @exp10(double noundef %{{.*}})
-// CHECK: call float @exp10f(float noundef %{{.*}})
-// CHECK: call x86_fp80 @exp10l(x86_fp80 noundef %{{.*}})
-// CHECK: call fp128 @exp10f128(fp128 noundef %{{.*}})
 
   __builtin_floor(f);      __builtin_floorf(f);     __builtin_floorl(f); __builtin_floorf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.floor.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.floor.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.floor.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.floor.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_fma(f,f,f);        __builtin_fmaf(f,f,f);       __builtin_fmal(f,f,f);  __builtin_fmaf128(f,f,f); __builtin_fmaf16(f,f,f);
 
-// CHECK: call double @llvm.experimental.constrained.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.fma.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.fma.f128(fp128 %{{.*}}, fp128 %{{.*}}, fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call half @llvm.experimental.constrained.fma.f16(half %{{.*}}, half %{{.*}}, half %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_fmax(f,f);       __builtin_fmaxf(f,f);      __builtin_fmaxl(f,f); __builtin_fmaxf128(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.maxnum.f64(double %{{.*}}, double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.maxnum.f32(float %{{.*}}, float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.maxnum.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.maxnum.f128(fp128 %{{.*}}, fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_fmin(f,f);       __builtin_fminf(f,f);      __builtin_fminl(f,f); __builtin_fminf128(f,f);
 
-// CHECK: call double @llvm.experimental.constrained.minnum.f64(double %{{.*}}, double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.minnum.f32(float %{{.*}}, float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.minnum.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.minnum.f128(fp128 %{{.*}}, fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_llrint(f);     __builtin_llrintf(f);    __builtin_llrintl(f); __builtin_llrintf128(f);
 
-// CHECK: call i64 @llvm.experimental.constrained.llrint.i64.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llrint.i64.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llrint.i64.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llrint.i64.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_llround(f);    __builtin_llroundf(f);   __builtin_llroundl(f); __builtin_llroundf128(f);
 
-// CHECK: call i64 @llvm.experimental.constrained.llround.i64.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llround.i64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llround.i64.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.llround.i64.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_log(f);        __builtin_logf(f);       __builtin_logl(f); __builtin_logf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.log.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.log.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.log.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.log.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_log10(f);      __builtin_log10f(f);     __builtin_log10l(f); __builtin_log10f128(f);
 
-// CHECK: call double @llvm.experimental.constrained.log10.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.log10.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.log10.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.log10.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_log2(f);       __builtin_log2f(f);      __builtin_log2l(f); __builtin_log2f128(f);
 
-// CHECK: call double @llvm.experimental.constrained.log2.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.log2.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.log2.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.log2.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_lrint(f);      __builtin_lrintf(f);     __builtin_lrintl(f); __builtin_lrintf128(f);
 
-// CHECK: call i64 @llvm.experimental.constrained.lrint.i64.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lrint.i64.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lrint.i64.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lrint.i64.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_lround(f);     __builtin_lroundf(f);    __builtin_lroundl(f); __builtin_lroundf128(f);
 
-// CHECK: call i64 @llvm.experimental.constrained.lround.i64.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lround.i64.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lround.i64.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call i64 @llvm.experimental.constrained.lround.i64.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_nearbyint(f);  __builtin_nearbyintf(f); __builtin_nearbyintl(f); __builtin_nearbyintf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.nearbyint.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.nearbyint.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.nearbyint.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.nearbyint.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_rint(f);       __builtin_rintf(f);      __builtin_rintl(f); __builtin_rintf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.rint.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.rint.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.rint.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.rint.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_round(f);      __builtin_roundf(f);     __builtin_roundl(f); __builtin_roundf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.round.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.round.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.round.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.round.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 
   __builtin_sin(f);        __builtin_sinf(f);       __builtin_sinl(f); __builtin_sinf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.sin.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.sin.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.sin.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.sin.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_sinh(f);        __builtin_sinhf(f);       __builtin_sinhl(f); __builtin_sinhf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.sinh.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.sinh.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.sinh.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.sinh.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_sqrt(f);       __builtin_sqrtf(f);      __builtin_sqrtl(f); __builtin_sqrtf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.sqrt.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.sqrt.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.sqrt.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_tan(f);        __builtin_tanf(f);       __builtin_tanl(f); __builtin_tanf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.tan.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.tan.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.tan.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_tanh(f);        __builtin_tanhf(f);       __builtin_tanhl(f); __builtin_tanhf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.tanh.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.tanh.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.tanh.f80(x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.tanh.f128(fp128 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
   __builtin_trunc(f);      __builtin_truncf(f);     __builtin_truncl(f); __builtin_truncf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.trunc.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.trunc.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.trunc.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
 };
 
-// CHECK: declare double @llvm.experimental.constrained.frem.f64(double, double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.frem.f32(float, float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.frem.f80(x86_fp80, x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.frem.f128(fp128, fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.pow.f32(float, float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.pow.f80(x86_fp80, x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.pow.f128(fp128, fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.powi.f32(float, i32, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.powi.f80(x86_fp80, i32, metadata, metadata)
-
-// CHECK: declare half @llvm.experimental.constrained.ldexp.f16.i32(half, i32, metadata, metadata)
-// CHECK: declare double @llvm.experimental.constrained.ldexp.f64.i32(double, i32, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.ldexp.f32.i32(float, i32, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.ldexp.f80.i32(x86_fp80, i32, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.ceil.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.ceil.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.ceil.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.ceil.f128(fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.cos.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.cos.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.cos.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.exp.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.exp.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.exp.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.exp2.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.exp2.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.exp2.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @exp10(double noundef)
-// CHECK: declare float @exp10f(float noundef)
-// CHECK: declare x86_fp80 @exp10l(x86_fp80 noundef)
-// CHECK: declare fp128 @exp10f128(fp128 noundef)
-
-// CHECK: declare double @llvm.experimental.constrained.floor.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.floor.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.floor.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.floor.f128(fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.fma.f80(x86_fp80, x86_fp80, x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.fma.f128(fp128, fp128, fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.maxnum.f64(double, double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.maxnum.f32(float, float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.maxnum.f80(x86_fp80, x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.maxnum.f128(fp128, fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.minnum.f64(double, double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.minnum.f32(float, float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.minnum.f80(x86_fp80, x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.minnum.f128(fp128, fp128, metadata)
-
-// CHECK: declare i64 @llvm.experimental.constrained.llrint.i64.f64(double, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llrint.i64.f32(float, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llrint.i64.f80(x86_fp80, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llrint.i64.f128(fp128, metadata, metadata)
-
-// CHECK: declare i64 @llvm.experimental.constrained.llround.i64.f64(double, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llround.i64.f32(float, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llround.i64.f80(x86_fp80, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.llround.i64.f128(fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.log.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.log.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.log.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.log10.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.log10.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.log10.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.log2.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.log2.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.log2.f128(fp128, metadata, metadata)
-
-// CHECK: declare i64 @llvm.experimental.constrained.lrint.i64.f64(double, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lrint.i64.f32(float, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lrint.i64.f80(x86_fp80, metadata, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lrint.i64.f128(fp128, metadata, metadata)
-
-// CHECK: declare i64 @llvm.experimental.constrained.lround.i64.f64(double, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lround.i64.f32(float, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lround.i64.f80(x86_fp80, metadata)
-// CHECK: declare i64 @llvm.experimental.constrained.lround.i64.f128(fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.nearbyint.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.nearbyint.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.nearbyint.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.rint.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.rint.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.rint.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.round.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.round.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.round.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.round.f128(fp128, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.sin.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.sin.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.sin.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.sqrt.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.sqrt.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.tan.f64(double, metadata, metadata)
-// CHECK: declare float @llvm.experimental.constrained.tan.f32(float, metadata, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80, metadata, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.tan.f128(fp128, metadata, metadata)
-
-// CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.trunc.f128(fp128, metadata)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 #pragma STDC FP_CONTRACT ON
+// CHECK-LABEL: define dso_local void @bar(
+// CHECK-SAME: float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP4:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP4]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP5:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP5]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP6:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV2:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[NEG:%.*]] = fneg double [[CONV2]]
+// CHECK-NEXT:    [[TMP7:%.*]] = call double @llvm.fmuladd.f64(double [[CONV]], double [[CONV1]], double [[NEG]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP8:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[FNEG:%.*]] = fneg float [[TMP8]]
+// CHECK-NEXT:    [[CONV3:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[FNEG]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP9:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV4:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP9]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP10:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[CONV5:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP10]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP11:%.*]] = call x86_fp80 @llvm.fmuladd.f80(x86_fp80 [[CONV3]], x86_fp80 [[CONV4]], x86_fp80 [[CONV5]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP12:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP13:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP14:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[NEG7:%.*]] = fneg float [[TMP12]]
+// CHECK-NEXT:    [[NEG8:%.*]] = fneg float [[TMP14]]
+// CHECK-NEXT:    [[TMP15:%.*]] = call float @llvm.fmuladd.f32(float [[NEG7]], float [[TMP13]], float [[NEG8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[TMP16:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP17:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[TMP18:%.*]] = load float, ptr [[F_ADDR]], align 4
+// CHECK-NEXT:    [[NEG10:%.*]] = fneg float [[TMP17]]
+// CHECK-NEXT:    [[TMP19:%.*]] = call float @llvm.fmuladd.f32(float [[NEG10]], float [[TMP18]], float [[TMP16]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    ret void
+//
 void bar(float f) {
   f * f + f;
   (double)f * f - f;
@@ -390,14 +656,4 @@ void bar(float f) {
   -(f * f) - f;
   f + -(f * f);
 
-  // CHECK: call float @llvm.experimental.constrained.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: fneg
-  // CHECK: call double @llvm.experimental.constrained.fmuladd.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: fneg
-  // CHECK: call x86_fp80 @llvm.experimental.constrained.fmuladd.f80(x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, x86_fp80 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: fneg
-  // CHECK: fneg
-  // CHECK: call float @llvm.experimental.constrained.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: fneg
-  // CHECK: call float @llvm.experimental.constrained.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 };
diff --git a/clang/test/CodeGen/cx-complex-range-real.c b/clang/test/CodeGen/cx-complex-range-real.c
index 06786d376f0fb..5d0541e99ecf3 100644
--- a/clang/test/CodeGen/cx-complex-range-real.c
+++ b/clang/test/CodeGen/cx-complex-range-real.c
@@ -50,8 +50,8 @@
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_0_VEC_EXTRACT]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3:[0-9]+]]
-// PRMTD_STRICT-NEXT:    [[MUL_IL:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_4_VEC_EXTRACT]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = fmul float [[A_SROA_0_0_VEC_EXTRACT]], [[B]]
+// PRMTD_STRICT-NEXT:    [[MUL_IL:%.*]] = fmul float [[A_SROA_0_4_VEC_EXTRACT]], [[B]]
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[MUL_RL]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[MUL_IL]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -106,14 +106,14 @@ _Complex float mulaf(_Complex float a, float b) {
 // PRMTD-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @mulassignf(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR2:[0-9]+]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR1:[0-9]+]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTREAL]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[MUL_IL:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTIMAG]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = fmul float [[DOTREAL]], [[B]]
+// PRMTD_STRICT-NEXT:    [[MUL_IL:%.*]] = fmul float [[DOTIMAG]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store float [[MUL_RL]], ptr [[DOTREALP1]], align 4
@@ -162,8 +162,8 @@ void mulassignf(_Complex float *a, float b) {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A]], float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[MUL_IR:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A]], float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[MUL_RL:%.*]] = fmul float [[A]], [[B_SROA_0_0_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_IR:%.*]] = fmul float [[A]], [[B_SROA_0_4_VEC_EXTRACT]]
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[MUL_RL]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[MUL_IR]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -210,8 +210,8 @@ _Complex float mulbf(float a, _Complex float b) {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[A_SROA_0_0_VEC_EXTRACT]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[A_SROA_0_4_VEC_EXTRACT]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv float [[A_SROA_0_0_VEC_EXTRACT]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv float [[A_SROA_0_4_VEC_EXTRACT]], [[B]]
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[TMP0]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[TMP1]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -266,14 +266,14 @@ _Complex float divf(_Complex float a, float b) {
 // PRMTD-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @divassignf(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[DOTREAL]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.experimental.constrained.fdiv.f32(float [[DOTIMAG]], float [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv float [[DOTREAL]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv float [[DOTIMAG]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store float [[TMP0]], ptr [[DOTREALP1]], align 4
@@ -312,10 +312,10 @@ void divassignf(_Complex float *a, float b) {
 // PRMTD-NEXT:    ret { double, double } [[DOTFCA_1_INSERT]]
 //
 // PRMTD_STRICT-LABEL: define dso_local { double, double } @divd(
-// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A_COERCE0]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[A_COERCE1]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv double [[A_COERCE0]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv double [[A_COERCE1]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTFCA_0_INSERT:%.*]] = insertvalue { double, double } poison, double [[TMP0]], 0
 // PRMTD_STRICT-NEXT:    [[DOTFCA_1_INSERT:%.*]] = insertvalue { double, double } [[DOTFCA_0_INSERT]], double [[TMP1]], 1
 // PRMTD_STRICT-NEXT:    ret { double, double } [[DOTFCA_1_INSERT]]
@@ -402,24 +402,24 @@ _Complex double divd(_Complex double a, double b) {
 // PRMTD_STRICT-LABEL: define dso_local <2 x float> @divbd(
 // PRMTD_STRICT-SAME: double noundef [[A:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR0]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[A]], metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE0]], metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE1]], metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT]], x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 0xK00000000000000000000, x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT1]], x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT2]], x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP3]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 0xK00000000000000000000, x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT]], x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP2]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP8]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION3:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[UNPROMOTION]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[UNPROMOTION3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext double [[A]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext double [[B_COERCE0]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = fpext double [[B_COERCE1]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fmul x86_fp80 [[EXT]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fmul x86_fp80 0xK00000000000000000000, [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fadd x86_fp80 [[TMP0]], [[TMP1]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[EXT1]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul x86_fp80 [[EXT2]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fadd x86_fp80 [[TMP3]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul x86_fp80 0xK00000000000000000000, [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul x86_fp80 [[EXT]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fsub x86_fp80 [[TMP6]], [[TMP7]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fdiv x86_fp80 [[TMP2]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP8]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc x86_fp80 [[TMP9]] to double
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION3:%.*]] = fptrunc x86_fp80 [[TMP10]] to double
+// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = fptrunc double [[UNPROMOTION]] to float
+// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = fptrunc double [[UNPROMOTION3]] to float
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[CONV]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[CONV4]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -474,14 +474,14 @@ _Complex float divbd(double a, _Complex double b) {
 // PRMTD-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @divassignd(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[DOTREAL]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[DOTIMAG]], double [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv double [[DOTREAL]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv double [[DOTIMAG]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store double [[TMP0]], ptr [[DOTREALP1]], align 8
@@ -532,14 +532,14 @@ void divassignd(_Complex double *a, double b) {
 // PRMTD-NEXT:    ret { x86_fp80, x86_fp80 } [[DOTFCA_1_INSERT]]
 //
 // PRMTD_STRICT-LABEL: define dso_local { x86_fp80, x86_fp80 } @divld(
-// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], x86_fp80 noundef [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], x86_fp80 noundef [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[A_REALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[A_REAL:%.*]] = load x86_fp80, ptr [[A_REALP]], align 16
 // PRMTD_STRICT-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[A_IMAG:%.*]] = load x86_fp80, ptr [[A_IMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[A_REAL]], x86_fp80 [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv x86_fp80 [[A_REAL]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv x86_fp80 [[A_IMAG]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTFCA_0_INSERT:%.*]] = insertvalue { x86_fp80, x86_fp80 } poison, x86_fp80 [[TMP0]], 0
 // PRMTD_STRICT-NEXT:    [[DOTFCA_1_INSERT:%.*]] = insertvalue { x86_fp80, x86_fp80 } [[DOTFCA_0_INSERT]], x86_fp80 [[TMP1]], 1
 // PRMTD_STRICT-NEXT:    ret { x86_fp80, x86_fp80 } [[DOTFCA_1_INSERT]]
@@ -594,14 +594,14 @@ _Complex long double divld(_Complex long double a, long double b) {
 // PRMTD-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @divassignld(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], x86_fp80 noundef [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], x86_fp80 noundef [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  [[ENTRY:.*:]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load x86_fp80, ptr [[DOTREALP]], align 16
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load x86_fp80, ptr [[DOTIMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fdiv x86_fp80 [[DOTREAL]], [[B]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fdiv x86_fp80 [[DOTIMAG]], [[B]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store x86_fp80 [[TMP0]], ptr [[DOTREALP1]], align 16
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 71000e6d9112b..7652d7dba32c4 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -379,23 +379,23 @@
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i64 [[B_COERCE]], 32
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i64 [[B_SROA_2_0_EXTRACT_SHIFT]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = bitcast i32 [[B_SROA_2_0_EXTRACT_TRUNC]] to float
-// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP0]], metadata !"fpexcept.strict") #[[ATTR3:[0-9]+]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP1]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT2:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP2]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT3:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP3]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP4]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT2]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT3]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP7]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP10]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP6]], double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP12]], double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP14]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[TMP0]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext float [[TMP1]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT2:%.*]] = fpext float [[TMP2]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT3:%.*]] = fpext float [[TMP3]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul double [[EXT]], [[EXT2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[EXT1]], [[EXT3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd double [[TMP4]], [[TMP5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul double [[EXT2]], [[EXT2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[EXT3]], [[EXT3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fadd double [[TMP7]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fmul double [[EXT1]], [[EXT2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fmul double [[EXT]], [[EXT3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fsub double [[TMP10]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fdiv double [[TMP6]], [[TMP9]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fdiv double [[TMP12]], [[TMP9]]
+// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc double [[TMP13]] to float
+// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc double [[TMP14]] to float
 // X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = bitcast float [[UNPROMOTION]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = bitcast float [[UNPROMOTION4]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i32 [[TMP16]] to i64
@@ -412,25 +412,25 @@
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[A_SROA_0_0_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4:[0-9]+]]
-// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[A_SROA_0_4_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[A_SROA_0_0_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext float [[A_SROA_0_4_VEC_EXTRACT]] to double
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT3:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP0]], double [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT2]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT3]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP3]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP6]], double [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP2]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP8]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = fpext float [[B_SROA_0_0_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[EXT3:%.*]] = fpext float [[B_SROA_0_4_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fmul double [[EXT]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fmul double [[EXT1]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fadd double [[TMP0]], [[TMP1]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[EXT2]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul double [[EXT3]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fadd double [[TMP3]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul double [[EXT1]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul double [[EXT]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fsub double [[TMP6]], [[TMP7]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fdiv double [[TMP2]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP8]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc double [[TMP9]] to float
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc double [[TMP10]] to float
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[UNPROMOTION]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[UNPROMOTION4]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -794,27 +794,27 @@ _Complex float divf(_Complex float a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i64 [[B_COERCE]], 32
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i64 [[B_SROA_2_0_EXTRACT_SHIFT]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = bitcast i32 [[B_SROA_2_0_EXTRACT_TRUNC]] to float
-// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP0]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP1]], metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[TMP0]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext float [[TMP1]] to double
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// X86WINPRMTD_STRICT-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[DOTREAL]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV2:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[DOTIMAG]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV2]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP2]], double [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP5]], double [[TMP6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV2]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP8]], double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP4]], double [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP10]], double [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV4:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[CONV:%.*]] = fpext float [[DOTREAL]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[CONV2:%.*]] = fpext float [[DOTIMAG]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fmul double [[CONV]], [[EXT]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[CONV2]], [[EXT1]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd double [[TMP2]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[EXT]], [[EXT]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul double [[EXT1]], [[EXT1]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fadd double [[TMP5]], [[TMP6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[CONV2]], [[EXT]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fmul double [[CONV]], [[EXT1]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fsub double [[TMP8]], [[TMP9]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[TMP4]], [[TMP7]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fdiv double [[TMP10]], [[TMP7]]
+// X86WINPRMTD_STRICT-NEXT:    [[CONV3:%.*]] = fptrunc double [[TMP11]] to float
+// X86WINPRMTD_STRICT-NEXT:    [[CONV4:%.*]] = fptrunc double [[TMP12]] to float
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREALP5:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP6:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store float [[CONV3]], ptr [[DOTREALP5]], align 4
@@ -826,27 +826,27 @@ _Complex float divf(_Complex float a, _Complex float b) {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[B_SROA_0_0_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext float [[B_SROA_0_4_VEC_EXTRACT]] to double
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[DOTREAL]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[DOTIMAG]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV2]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP0]], double [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT1]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP3]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV2]], double [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[CONV]], double [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP6]], double [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP2]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP8]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = fpext float [[DOTREAL]] to double
+// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = fpext float [[DOTIMAG]] to double
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fmul double [[CONV]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fmul double [[CONV2]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fadd double [[TMP0]], [[TMP1]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[EXT]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul double [[EXT1]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fadd double [[TMP3]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul double [[CONV2]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul double [[CONV]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fsub double [[TMP6]], [[TMP7]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fdiv double [[TMP2]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP8]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = fptrunc double [[TMP9]] to float
+// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = fptrunc double [[TMP10]] to float
 // PRMTD_STRICT-NEXT:    [[DOTREALP5:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP6:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store float [[CONV3]], ptr [[DOTREALP5]], align 4
@@ -871,10 +871,10 @@ void divassignf(_Complex float *a, _Complex float b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno float [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2:![0-9]+]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1:![0-9]+]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno float [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call <2 x float> @__mulsc3(float noundef [[A_SROA_0_0_VEC_EXTRACT]], float noundef [[A_SROA_0_4_VEC_EXTRACT]], float noundef [[B_SROA_0_0_VEC_EXTRACT]], float noundef [[B_SROA_0_4_VEC_EXTRACT]]) #[[ATTR2]]
 // FULL-NEXT:    [[COERCE_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[CALL]], i32 0
@@ -1025,10 +1025,10 @@ void divassignf(_Complex float *a, _Complex float b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno float [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2:![0-9]+]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1:![0-9]+]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno float [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) <2 x float> @__mulsc3(float noundef nofpclass(nan inf) [[A_SROA_0_0_VEC_EXTRACT]], float noundef nofpclass(nan inf) [[A_SROA_0_4_VEC_EXTRACT]], float noundef nofpclass(nan inf) [[B_SROA_0_0_VEC_EXTRACT]], float noundef nofpclass(nan inf) [[B_SROA_0_4_VEC_EXTRACT]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[COERCE_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[CALL]], i32 0
@@ -1088,12 +1088,12 @@ void divassignf(_Complex float *a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_SHIFT:%.*]] = lshr i64 [[B_COERCE]], 32
 // X86WINPRMTD_STRICT-NEXT:    [[B_SROA_2_0_EXTRACT_TRUNC:%.*]] = trunc i64 [[B_SROA_2_0_EXTRACT_SHIFT]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = bitcast i32 [[B_SROA_2_0_EXTRACT_TRUNC]] to float
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[TMP0]], float [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[TMP1]], float [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[TMP0]], float [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[TMP1]], float [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[MUL_AC]], float [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[MUL_AD]], float [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul float [[TMP0]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul float [[TMP1]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul float [[TMP0]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul float [[TMP1]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = bitcast float [[MUL_R]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = bitcast float [[MUL_I]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i32 [[TMP5]] to i64
@@ -1112,12 +1112,12 @@ void divassignf(_Complex float *a, _Complex float b) {
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 1
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[B_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[B_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_0_VEC_EXTRACT]], float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_4_VEC_EXTRACT]], float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_0_VEC_EXTRACT]], float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[A_SROA_0_4_VEC_EXTRACT]], float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[MUL_AC]], float [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[MUL_AD]], float [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul float [[A_SROA_0_0_VEC_EXTRACT]], [[B_SROA_0_0_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul float [[A_SROA_0_4_VEC_EXTRACT]], [[B_SROA_0_4_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul float [[A_SROA_0_0_VEC_EXTRACT]], [[B_SROA_0_4_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul float [[A_SROA_0_4_VEC_EXTRACT]], [[B_SROA_0_0_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[MUL_R]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[MUL_I]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -1142,10 +1142,10 @@ _Complex float mulf(_Complex float a, _Complex float b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno float [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno float [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call <2 x float> @__mulsc3(float noundef [[DOTREAL]], float noundef [[DOTIMAG]], float noundef [[B_SROA_0_0_VEC_EXTRACT]], float noundef [[B_SROA_0_4_VEC_EXTRACT]]) #[[ATTR2]]
 // FULL-NEXT:    [[COERCE_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[CALL]], i32 0
@@ -1322,10 +1322,10 @@ _Complex float mulf(_Complex float a, _Complex float b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno float [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno float [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) <2 x float> @__mulsc3(float noundef nofpclass(nan inf) [[DOTREAL]], float noundef nofpclass(nan inf) [[DOTIMAG]], float noundef nofpclass(nan inf) [[B_SROA_0_0_VEC_EXTRACT]], float noundef nofpclass(nan inf) [[B_SROA_0_4_VEC_EXTRACT]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[COERCE_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[CALL]], i32 0
@@ -1394,12 +1394,12 @@ _Complex float mulf(_Complex float a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTREAL]], float [[TMP0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTIMAG]], float [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTREAL]], float [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTIMAG]], float [[TMP0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[MUL_AC]], float [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[MUL_AD]], float [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul float [[DOTREAL]], [[TMP0]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul float [[DOTIMAG]], [[TMP1]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul float [[DOTREAL]], [[TMP1]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul float [[DOTIMAG]], [[TMP0]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store float [[MUL_R]], ptr [[DOTREALP1]], align 4
@@ -1415,12 +1415,12 @@ _Complex float mulf(_Complex float a, _Complex float b) {
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTREAL]], float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTIMAG]], float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTREAL]], float [[B_SROA_0_4_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call float @llvm.experimental.constrained.fmul.f32(float [[DOTIMAG]], float [[B_SROA_0_0_VEC_EXTRACT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call float @llvm.experimental.constrained.fsub.f32(float [[MUL_AC]], float [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[MUL_AD]], float [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul float [[DOTREAL]], [[B_SROA_0_0_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul float [[DOTIMAG]], [[B_SROA_0_4_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul float [[DOTREAL]], [[B_SROA_0_4_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul float [[DOTIMAG]], [[B_SROA_0_0_VEC_EXTRACT]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub float [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd float [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store float [[MUL_R]], ptr [[DOTREALP1]], align 4
@@ -1754,31 +1754,31 @@ void mulassignf(_Complex float *a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR2:[0-9]+]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt double [[TMP0]], [[TMP1]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP2]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_REAL]], double [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A_REAL]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP6]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A_IMAG]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP9]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv double [[B_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[TMP2]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd double [[B_REAL]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[A_IMAG]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd double [[A_REAL]], [[TMP5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv double [[TMP6]], [[TMP4]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[A_REAL]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub double [[A_IMAG]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP9]], [[TMP4]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP11]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_IMAG]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[A_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP15]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP17]], double [[A_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP18]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[B_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul double [[TMP11]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd double [[B_IMAG]], [[TMP12]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul double [[A_REAL]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[A_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv double [[TMP15]], [[TMP13]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul double [[A_IMAG]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub double [[TMP17]], [[A_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv double [[TMP18]], [[TMP13]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // X86WINPRMTD_STRICT:       complex_div:
 // X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi double [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -1798,25 +1798,25 @@ void mulassignf(_Complex float *a, _Complex float b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local { double, double } @divd(
-// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR2:[0-9]+]] {
+// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR1:[0-9]+]] {
 // PRMTD_STRICT-NEXT:  entry:
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[A_COERCE0]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[A_COERCE1]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE0]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE1]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT]], x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT1]], x86_fp80 [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT2]], x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT3]], x86_fp80 [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP3]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT1]], x86_fp80 [[EXT2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT]], x86_fp80 [[EXT3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP2]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP8]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext double [[A_COERCE0]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext double [[A_COERCE1]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT2:%.*]] = fpext double [[B_COERCE0]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT3:%.*]] = fpext double [[B_COERCE1]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fmul x86_fp80 [[EXT]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fmul x86_fp80 [[EXT1]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fadd x86_fp80 [[TMP0]], [[TMP1]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[EXT2]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul x86_fp80 [[EXT3]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fadd x86_fp80 [[TMP3]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul x86_fp80 [[EXT1]], [[EXT2]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul x86_fp80 [[EXT]], [[EXT3]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fsub x86_fp80 [[TMP6]], [[TMP7]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fdiv x86_fp80 [[TMP2]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP8]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc x86_fp80 [[TMP9]] to double
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc x86_fp80 [[TMP10]] to double
 // PRMTD_STRICT-NEXT:    [[DOTFCA_0_INSERT:%.*]] = insertvalue { double, double } poison, double [[UNPROMOTION]], 0
 // PRMTD_STRICT-NEXT:    [[DOTFCA_1_INSERT:%.*]] = insertvalue { double, double } [[DOTFCA_0_INSERT]], double [[UNPROMOTION4]], 1
 // PRMTD_STRICT-NEXT:    ret { double, double } [[DOTFCA_1_INSERT]]
@@ -2190,31 +2190,31 @@ _Complex double divd(_Complex double a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt double [[TMP0]], [[TMP1]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP2]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_REAL]], double [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[DOTREAL]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP6]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[DOTIMAG]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP9]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv double [[B_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[TMP2]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd double [[B_REAL]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[DOTIMAG]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd double [[DOTREAL]], [[TMP5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv double [[TMP6]], [[TMP4]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[DOTREAL]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub double [[DOTIMAG]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP9]], [[TMP4]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP11]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_IMAG]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[DOTIMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP15]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP17]], double [[DOTREAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP18]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[B_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul double [[TMP11]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd double [[B_IMAG]], [[TMP12]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul double [[DOTREAL]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[DOTIMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv double [[TMP15]], [[TMP13]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul double [[DOTIMAG]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub double [[TMP17]], [[DOTREAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv double [[TMP18]], [[TMP13]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // X86WINPRMTD_STRICT:       complex_div:
 // X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi double [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -2226,29 +2226,29 @@ _Complex double divd(_Complex double a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @divassignd(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE0]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[B_COERCE1]], metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext double [[B_COERCE0]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[EXT1:%.*]] = fpext double [[B_COERCE1]] to x86_fp80
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[DOTREAL]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f64(double [[DOTIMAG]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[CONV]], x86_fp80 [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[CONV2]], x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT]], x86_fp80 [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[EXT1]], x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP3]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[CONV2]], x86_fp80 [[EXT]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[CONV]], x86_fp80 [[EXT1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP2]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP8]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = call double @llvm.experimental.constrained.fptrunc.f64.f80(x86_fp80 [[TMP10]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = fpext double [[DOTREAL]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = fpext double [[DOTIMAG]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = fmul x86_fp80 [[CONV]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = fmul x86_fp80 [[CONV2]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fadd x86_fp80 [[TMP0]], [[TMP1]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[EXT]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fmul x86_fp80 [[EXT1]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fadd x86_fp80 [[TMP3]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fmul x86_fp80 [[CONV2]], [[EXT]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul x86_fp80 [[CONV]], [[EXT1]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fsub x86_fp80 [[TMP6]], [[TMP7]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fdiv x86_fp80 [[TMP2]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP8]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = fptrunc x86_fp80 [[TMP9]] to double
+// PRMTD_STRICT-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[TMP10]] to double
 // PRMTD_STRICT-NEXT:    [[DOTREALP5:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP6:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store double [[CONV3]], ptr [[DOTREALP5]], align 8
@@ -2269,10 +2269,10 @@ void divassignd(_Complex double *a, _Complex double b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno double [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno double [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call { double, double } @__muldc3(double noundef [[A_COERCE0]], double noundef [[A_COERCE1]], double noundef [[B_COERCE0]], double noundef [[B_COERCE1]]) #[[ATTR2]]
 // FULL-NEXT:    [[TMP0:%.*]] = extractvalue { double, double } [[CALL]], 0
@@ -2414,10 +2414,10 @@ void divassignd(_Complex double *a, _Complex double b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn double [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn double [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno double [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno double [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) { double, double } @__muldc3(double noundef nofpclass(nan inf) [[A_COERCE0]], double noundef nofpclass(nan inf) [[A_COERCE1]], double noundef nofpclass(nan inf) [[B_COERCE0]], double noundef nofpclass(nan inf) [[B_COERCE1]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[TMP0:%.*]] = extractvalue { double, double } [[CALL]], 0
@@ -2467,12 +2467,12 @@ void divassignd(_Complex double *a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[A_REAL]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[A_IMAG]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[A_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[A_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[AGG_RESULT_REALP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[AGG_RESULT]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[AGG_RESULT_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[AGG_RESULT]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store double [[MUL_R]], ptr [[AGG_RESULT_REALP]], align 8
@@ -2488,14 +2488,14 @@ void divassignd(_Complex double *a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local { double, double } @muld(
-// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: double noundef [[A_COERCE0:%.*]], double noundef [[A_COERCE1:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_COERCE0]], double [[B_COERCE0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_COERCE1]], double [[B_COERCE1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_COERCE0]], double [[B_COERCE1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_COERCE1]], double [[B_COERCE0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[A_COERCE0]], [[B_COERCE0]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[A_COERCE1]], [[B_COERCE1]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[A_COERCE0]], [[B_COERCE1]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[A_COERCE1]], [[B_COERCE0]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[DOTFCA_0_INSERT:%.*]] = insertvalue { double, double } poison, double [[MUL_R]], 0
 // PRMTD_STRICT-NEXT:    [[DOTFCA_1_INSERT:%.*]] = insertvalue { double, double } [[DOTFCA_0_INSERT]], double [[MUL_I]], 1
 // PRMTD_STRICT-NEXT:    ret { double, double } [[DOTFCA_1_INSERT]]
@@ -2518,10 +2518,10 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno double [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno double [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call { double, double } @__muldc3(double noundef [[DOTREAL]], double noundef [[DOTIMAG]], double noundef [[B_COERCE0]], double noundef [[B_COERCE1]]) #[[ATTR2]]
 // FULL-NEXT:    [[TMP0:%.*]] = extractvalue { double, double } [[CALL]], 0
@@ -2687,10 +2687,10 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn double [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn double [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno double [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno double [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) { double, double } @__muldc3(double noundef nofpclass(nan inf) [[DOTREAL]], double noundef nofpclass(nan inf) [[DOTIMAG]], double noundef nofpclass(nan inf) [[B_COERCE0]], double noundef nofpclass(nan inf) [[B_COERCE1]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[TMP0:%.*]] = extractvalue { double, double } [[CALL]], 0
@@ -2754,12 +2754,12 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[DOTREAL]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[DOTIMAG]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[DOTREAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[DOTIMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store double [[MUL_R]], ptr [[DOTREALP1]], align 8
@@ -2767,18 +2767,18 @@ _Complex double muld(_Complex double a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @mulassignd(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], double noundef [[B_COERCE0:%.*]], double noundef [[B_COERCE1:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[DOTREALP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_COERCE0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_COERCE1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_COERCE1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_COERCE0]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[DOTREAL]], [[B_COERCE0]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[DOTIMAG]], [[B_COERCE1]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[DOTREAL]], [[B_COERCE1]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[DOTIMAG]], [[B_COERCE0]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store double [[MUL_R]], ptr [[DOTREALP1]], align 8
@@ -3200,31 +3200,31 @@ void mulassignd(_Complex double *a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt double [[TMP0]], [[TMP1]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP2]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_REAL]], double [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[A_REAL]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP6]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[A_IMAG]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP9]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv double [[B_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[TMP2]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd double [[B_REAL]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[A_IMAG]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd double [[A_REAL]], [[TMP5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv double [[TMP6]], [[TMP4]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[A_REAL]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub double [[A_IMAG]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP9]], [[TMP4]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP11]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_IMAG]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[A_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP15]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP17]], double [[A_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP18]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[B_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul double [[TMP11]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd double [[B_IMAG]], [[TMP12]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul double [[A_REAL]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[A_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv double [[TMP15]], [[TMP13]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul double [[A_IMAG]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub double [[TMP17]], [[A_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv double [[TMP18]], [[TMP13]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // X86WINPRMTD_STRICT:       complex_div:
 // X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi double [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -3244,7 +3244,7 @@ void mulassignd(_Complex double *a, _Complex double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local { x86_fp80, x86_fp80 } @divld(
-// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[A_REALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[A_REAL:%.*]] = load x86_fp80, ptr [[A_REALP]], align 16
@@ -3254,31 +3254,31 @@ void mulassignd(_Complex double *a, _Complex double b) {
 // PRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load x86_fp80, ptr [[B_REALP]], align 16
 // PRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[B]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load x86_fp80, ptr [[B_IMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]]) #[[ATTR3:[0-9]+]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]]) #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt x86_fp80 [[TMP0]], [[TMP1]]
 // PRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // PRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP2]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[B_REAL]], x86_fp80 [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[A_REAL]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_REAL]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP9]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv x86_fp80 [[B_IMAG]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[TMP2]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd x86_fp80 [[B_REAL]], [[TMP3]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul x86_fp80 [[A_IMAG]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd x86_fp80 [[A_REAL]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv x86_fp80 [[TMP6]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul x86_fp80 [[A_REAL]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub x86_fp80 [[A_IMAG]], [[TMP8]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP9]], [[TMP4]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // PRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[B_REAL]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP11]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_REAL]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP14]], x86_fp80 [[A_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP15]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP17]], x86_fp80 [[A_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP18]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv x86_fp80 [[B_REAL]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul x86_fp80 [[TMP11]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd x86_fp80 [[B_IMAG]], [[TMP12]]
+// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul x86_fp80 [[A_REAL]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd x86_fp80 [[TMP14]], [[A_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv x86_fp80 [[TMP15]], [[TMP13]]
+// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul x86_fp80 [[A_IMAG]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub x86_fp80 [[TMP17]], [[A_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv x86_fp80 [[TMP18]], [[TMP13]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // PRMTD_STRICT:       complex_div:
 // PRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi x86_fp80 [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -3712,31 +3712,31 @@ _Complex long double divld(_Complex long double a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call double @llvm.fabs.f64(double [[B_REAL]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[B_IMAG]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt double [[TMP0]], [[TMP1]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP2]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_REAL]], double [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[DOTREAL]], double [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP6]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[DOTIMAG]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP9]], double [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv double [[B_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul double [[TMP2]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd double [[B_REAL]], [[TMP3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul double [[DOTIMAG]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd double [[DOTREAL]], [[TMP5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv double [[TMP6]], [[TMP4]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul double [[DOTREAL]], [[TMP2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub double [[DOTIMAG]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv double [[TMP9]], [[TMP4]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[B_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP11]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_IMAG]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP14]], double [[DOTIMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP15]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP17]], double [[DOTREAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP18]], double [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[B_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul double [[TMP11]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd double [[B_IMAG]], [[TMP12]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul double [[DOTREAL]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd double [[TMP14]], [[DOTIMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv double [[TMP15]], [[TMP13]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul double [[DOTIMAG]], [[TMP11]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub double [[TMP17]], [[DOTREAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv double [[TMP18]], [[TMP13]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // X86WINPRMTD_STRICT:       complex_div:
 // X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi double [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -3748,7 +3748,7 @@ _Complex long double divld(_Complex long double a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @divassignld(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[B]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load x86_fp80, ptr [[B_REALP]], align 16
@@ -3758,31 +3758,31 @@ _Complex long double divld(_Complex long double a, _Complex long double b) {
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load x86_fp80, ptr [[DOTREALP]], align 16
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load x86_fp80, ptr [[DOTIMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_REAL]]) #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[B_IMAG]]) #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt x86_fp80 [[TMP0]], [[TMP1]]
 // PRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // PRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP2]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[B_REAL]], x86_fp80 [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP9]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv x86_fp80 [[B_IMAG]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[TMP2]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd x86_fp80 [[B_REAL]], [[TMP3]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul x86_fp80 [[DOTIMAG]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd x86_fp80 [[DOTREAL]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv x86_fp80 [[TMP6]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul x86_fp80 [[DOTREAL]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub x86_fp80 [[DOTIMAG]], [[TMP8]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP9]], [[TMP4]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // PRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[B_REAL]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP11]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP14]], x86_fp80 [[DOTIMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP15]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP17]], x86_fp80 [[DOTREAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP18]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv x86_fp80 [[B_REAL]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul x86_fp80 [[TMP11]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd x86_fp80 [[B_IMAG]], [[TMP12]]
+// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul x86_fp80 [[DOTREAL]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd x86_fp80 [[TMP14]], [[DOTIMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv x86_fp80 [[TMP15]], [[TMP13]]
+// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul x86_fp80 [[DOTIMAG]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub x86_fp80 [[TMP17]], [[DOTREAL]]
+// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv x86_fp80 [[TMP18]], [[TMP13]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // PRMTD_STRICT:       complex_div:
 // PRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi x86_fp80 [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
@@ -3815,10 +3815,10 @@ void divassignld(_Complex long double *a, _Complex long double b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub x86_fp80 [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno x86_fp80 [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno x86_fp80 [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call { x86_fp80, x86_fp80 } @__mulxc3(x86_fp80 noundef [[A_REAL]], x86_fp80 noundef [[A_IMAG]], x86_fp80 noundef [[B_REAL]], x86_fp80 noundef [[B_IMAG]]) #[[ATTR2]]
 // FULL-NEXT:    [[TMP0:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[CALL]], 0
@@ -4000,10 +4000,10 @@ void divassignld(_Complex long double *a, _Complex long double b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn x86_fp80 [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno x86_fp80 [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno x86_fp80 [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) { x86_fp80, x86_fp80 } @__mulxc3(x86_fp80 noundef nofpclass(nan inf) [[A_REAL]], x86_fp80 noundef nofpclass(nan inf) [[A_IMAG]], x86_fp80 noundef nofpclass(nan inf) [[B_REAL]], x86_fp80 noundef nofpclass(nan inf) [[B_IMAG]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[TMP0:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[CALL]], 0
@@ -4069,12 +4069,12 @@ void divassignld(_Complex long double *a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_REAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[A_IMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[A_REAL]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[A_IMAG]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[A_REAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[A_IMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[AGG_RESULT_REALP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[AGG_RESULT]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[AGG_RESULT_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[AGG_RESULT]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store double [[MUL_R]], ptr [[AGG_RESULT_REALP]], align 8
@@ -4090,7 +4090,7 @@ void divassignld(_Complex long double *a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local { x86_fp80, x86_fp80 } @mulld(
-// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[A_REALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[A_REAL:%.*]] = load x86_fp80, ptr [[A_REALP]], align 16
@@ -4100,12 +4100,12 @@ void divassignld(_Complex long double *a, _Complex long double b) {
 // PRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load x86_fp80, ptr [[B_REALP]], align 16
 // PRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[B]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load x86_fp80, ptr [[B_IMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_REAL]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_REAL]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[A_IMAG]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[MUL_AC]], x86_fp80 [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[MUL_AD]], x86_fp80 [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul x86_fp80 [[A_REAL]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul x86_fp80 [[A_IMAG]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul x86_fp80 [[A_REAL]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul x86_fp80 [[A_IMAG]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub x86_fp80 [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[DOTFCA_0_INSERT:%.*]] = insertvalue { x86_fp80, x86_fp80 } poison, x86_fp80 [[MUL_R]], 0
 // PRMTD_STRICT-NEXT:    [[DOTFCA_1_INSERT:%.*]] = insertvalue { x86_fp80, x86_fp80 } [[DOTFCA_0_INSERT]], x86_fp80 [[MUL_I]], 1
 // PRMTD_STRICT-NEXT:    ret { x86_fp80, x86_fp80 } [[DOTFCA_1_INSERT]]
@@ -4132,10 +4132,10 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // FULL-NEXT:    [[MUL_R:%.*]] = fsub x86_fp80 [[MUL_AC]], [[MUL_BD]]
 // FULL-NEXT:    [[MUL_I:%.*]] = fadd x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // FULL-NEXT:    [[ISNAN_CMP:%.*]] = fcmp uno x86_fp80 [[MUL_R]], [[MUL_R]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL:       complex_mul_imag_nan:
 // FULL-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp uno x86_fp80 [[MUL_I]], [[MUL_I]]
-// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL:       complex_mul_libcall:
 // FULL-NEXT:    [[CALL:%.*]] = call { x86_fp80, x86_fp80 } @__mulxc3(x86_fp80 noundef [[DOTREAL]], x86_fp80 noundef [[DOTIMAG]], x86_fp80 noundef [[B_REAL]], x86_fp80 noundef [[B_IMAG]]) #[[ATTR2]]
 // FULL-NEXT:    [[TMP0:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[CALL]], 0
@@ -4321,10 +4321,10 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // FULL_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn x86_fp80 [[MUL_AC]], [[MUL_BD]]
 // FULL_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // FULL_FAST-NEXT:    [[ISNAN_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno x86_fp80 [[MUL_R]], [[MUL_R]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP]], label [[COMPLEX_MUL_IMAG_NAN:%.*]], label [[COMPLEX_MUL_CONT:%.*]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_imag_nan:
 // FULL_FAST-NEXT:    [[ISNAN_CMP1:%.*]] = fcmp reassoc nnan ninf nsz arcp afn uno x86_fp80 [[MUL_I]], [[MUL_I]]
-// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF2]]
+// FULL_FAST-NEXT:    br i1 [[ISNAN_CMP1]], label [[COMPLEX_MUL_LIBCALL:%.*]], label [[COMPLEX_MUL_CONT]], !prof [[PROF1]]
 // FULL_FAST:       complex_mul_libcall:
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) { x86_fp80, x86_fp80 } @__mulxc3(x86_fp80 noundef nofpclass(nan inf) [[DOTREAL]], x86_fp80 noundef nofpclass(nan inf) [[DOTIMAG]], x86_fp80 noundef nofpclass(nan inf) [[B_REAL]], x86_fp80 noundef nofpclass(nan inf) [[B_IMAG]]) #[[ATTR2]]
 // FULL_FAST-NEXT:    [[TMP0:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[CALL]], 0
@@ -4396,12 +4396,12 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load double, ptr [[DOTREALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load double, ptr [[DOTIMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTREAL]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[DOTIMAG]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[MUL_AC]], double [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[MUL_AD]], double [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul double [[DOTREAL]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul double [[DOTIMAG]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul double [[DOTREAL]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul double [[DOTIMAG]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub double [[MUL_AC]], [[MUL_BD]]
+// X86WINPRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd double [[MUL_AD]], [[MUL_BC]]
 // X86WINPRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 0
 // X86WINPRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    store double [[MUL_R]], ptr [[DOTREALP1]], align 8
@@ -4409,7 +4409,7 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    ret void
 //
 // PRMTD_STRICT-LABEL: define dso_local void @mulassignld(
-// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR2]] {
+// PRMTD_STRICT-SAME: ptr noundef [[A:%.*]], ptr noundef byval({ x86_fp80, x86_fp80 }) align 16 [[B:%.*]]) #[[ATTR1]] {
 // PRMTD_STRICT-NEXT:  entry:
 // PRMTD_STRICT-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[B]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load x86_fp80, ptr [[B_REALP]], align 16
@@ -4419,12 +4419,12 @@ _Complex long double mulld(_Complex long double a, _Complex long double b) {
 // PRMTD_STRICT-NEXT:    [[DOTREAL:%.*]] = load x86_fp80, ptr [[DOTREALP]], align 16
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    [[DOTIMAG:%.*]] = load x86_fp80, ptr [[DOTIMAGP]], align 16
-// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTREAL]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[DOTIMAG]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[MUL_AC]], x86_fp80 [[MUL_BD]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[MUL_AD]], x86_fp80 [[MUL_BC]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[MUL_AC:%.*]] = fmul x86_fp80 [[DOTREAL]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[MUL_BD:%.*]] = fmul x86_fp80 [[DOTIMAG]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[MUL_AD:%.*]] = fmul x86_fp80 [[DOTREAL]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[MUL_BC:%.*]] = fmul x86_fp80 [[DOTIMAG]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[MUL_R:%.*]] = fsub x86_fp80 [[MUL_AC]], [[MUL_BD]]
+// PRMTD_STRICT-NEXT:    [[MUL_I:%.*]] = fadd x86_fp80 [[MUL_AD]], [[MUL_BC]]
 // PRMTD_STRICT-NEXT:    [[DOTREALP1:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 0
 // PRMTD_STRICT-NEXT:    [[DOTIMAGP2:%.*]] = getelementptr inbounds nuw { x86_fp80, x86_fp80 }, ptr [[A]], i32 0, i32 1
 // PRMTD_STRICT-NEXT:    store x86_fp80 [[MUL_R]], ptr [[DOTREALP1]], align 16
@@ -5068,56 +5068,56 @@ void mulassignld(_Complex long double *a, _Complex long double b) {
 // X86WINPRMTD_STRICT-NEXT:    [[B_REAL:%.*]] = load double, ptr [[B_REALP]], align 8
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[B]], i32 0, i32 1
 // X86WINPRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load double, ptr [[B_IMAGP]], align 8
-// X86WINPRMTD_STRICT-NEXT:    [[CONV:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP2]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV1:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP3]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.fabs.f64(double [[CONV]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.fabs.f64(double [[CONV1]]) #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP4]], double [[TMP5]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[CONV:%.*]] = fpext float [[TMP2]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[CONV1:%.*]] = fpext float [[TMP3]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call double @llvm.fabs.f64(double [[CONV]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call double @llvm.fabs.f64(double [[CONV1]]) #[[ATTR2]]
+// X86WINPRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt double [[TMP4]], [[TMP5]]
 // X86WINPRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[CONV1]], double [[CONV]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP6]], double [[CONV1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[CONV]], double [[TMP7]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[B_IMAG]], double [[TMP6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[B_REAL]], double [[TMP9]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP10]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[B_REAL]], double [[TMP6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[B_IMAG]], double [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP13]], double [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fdiv double [[CONV1]], [[CONV]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fmul double [[TMP6]], [[CONV1]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fadd double [[CONV]], [[TMP7]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fmul double [[B_IMAG]], [[TMP6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fadd double [[B_REAL]], [[TMP9]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv double [[TMP10]], [[TMP8]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul double [[B_REAL]], [[TMP6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fsub double [[B_IMAG]], [[TMP12]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fdiv double [[TMP13]], [[TMP8]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // X86WINPRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[CONV]], double [[CONV1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[TMP15]], double [[CONV]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[CONV1]], double [[TMP16]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[B_REAL]], double [[TMP15]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP18]], double [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP19]], double [[TMP17]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP21:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[B_IMAG]], double [[TMP15]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP22:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP21]], double [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP23:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP22]], double [[TMP17]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fdiv double [[CONV]], [[CONV1]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fmul double [[TMP15]], [[CONV]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fadd double [[CONV1]], [[TMP16]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fmul double [[B_REAL]], [[TMP15]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fadd double [[TMP18]], [[B_IMAG]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP20:%.*]] = fdiv double [[TMP19]], [[TMP17]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP21:%.*]] = fmul double [[B_IMAG]], [[TMP15]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP22:%.*]] = fsub double [[TMP21]], [[B_REAL]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP23:%.*]] = fdiv double [[TMP22]], [[TMP17]]
 // X86WINPRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // X86WINPRMTD_STRICT:       complex_div:
 // X86WINPRMTD_STRICT-NEXT:    [[TMP24:%.*]] = phi double [ [[TMP11]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP20]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
 // X86WINPRMTD_STRICT-NEXT:    [[TMP25:%.*]] = phi double [ [[TMP14]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP23]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV2:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP24]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP25]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CONV2]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT4:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CONV3]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT5:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP0]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[EXT6:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[TMP1]], metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP26:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP27:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT4]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP28:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP26]], double [[TMP27]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP29:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT5]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP30:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT6]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP31:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP29]], double [[TMP30]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT4]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP33:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP34:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP32]], double [[TMP33]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP35:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP28]], double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[TMP36:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP34]], double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP35]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
-// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION7:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP36]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR3]]
+// X86WINPRMTD_STRICT-NEXT:    [[CONV2:%.*]] = fptrunc double [[TMP24]] to float
+// X86WINPRMTD_STRICT-NEXT:    [[CONV3:%.*]] = fptrunc double [[TMP25]] to float
+// X86WINPRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[CONV2]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT4:%.*]] = fpext float [[CONV3]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT5:%.*]] = fpext float [[TMP0]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[EXT6:%.*]] = fpext float [[TMP1]] to double
+// X86WINPRMTD_STRICT-NEXT:    [[TMP26:%.*]] = fmul double [[EXT]], [[EXT5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP27:%.*]] = fmul double [[EXT4]], [[EXT6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP28:%.*]] = fadd double [[TMP26]], [[TMP27]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP29:%.*]] = fmul double [[EXT5]], [[EXT5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP30:%.*]] = fmul double [[EXT6]], [[EXT6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP31:%.*]] = fadd double [[TMP29]], [[TMP30]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP32:%.*]] = fmul double [[EXT4]], [[EXT5]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP33:%.*]] = fmul double [[EXT]], [[EXT6]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP34:%.*]] = fsub double [[TMP32]], [[TMP33]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP35:%.*]] = fdiv double [[TMP28]], [[TMP31]]
+// X86WINPRMTD_STRICT-NEXT:    [[TMP36:%.*]] = fdiv double [[TMP34]], [[TMP31]]
+// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc double [[TMP35]] to float
+// X86WINPRMTD_STRICT-NEXT:    [[UNPROMOTION7:%.*]] = fptrunc double [[TMP36]] to float
 // X86WINPRMTD_STRICT-NEXT:    [[TMP37:%.*]] = bitcast float [[UNPROMOTION]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[TMP38:%.*]] = bitcast float [[UNPROMOTION7]] to i32
 // X86WINPRMTD_STRICT-NEXT:    [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i32 [[TMP38]] to i64
@@ -5138,58 +5138,58 @@ void mulassignld(_Complex long double *a, _Complex long double b) {
 // PRMTD_STRICT-NEXT:    [[B_IMAG:%.*]] = load x86_fp80, ptr [[B_IMAGP]], align 16
 // PRMTD_STRICT-NEXT:    [[C_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[C_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[C_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[C_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f32(float [[C_SROA_0_0_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV1:%.*]] = call x86_fp80 @llvm.experimental.constrained.fpext.f80.f32(float [[C_SROA_0_4_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV1]]) #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f80(x86_fp80 [[TMP0]], x86_fp80 [[TMP1]], metadata !"ugt", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[CONV:%.*]] = fpext float [[C_SROA_0_0_VEC_EXTRACT]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[CONV1:%.*]] = fpext float [[C_SROA_0_4_VEC_EXTRACT]] to x86_fp80
+// PRMTD_STRICT-NEXT:    [[TMP0:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV]]) #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[TMP1:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV1]]) #[[ATTR3]]
+// PRMTD_STRICT-NEXT:    [[ABS_CMP:%.*]] = fcmp ugt x86_fp80 [[TMP0]], [[TMP1]]
 // PRMTD_STRICT-NEXT:    br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
 // PRMTD_STRICT:       abs_rhsr_greater_or_equal_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[CONV1]], x86_fp80 [[CONV]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP2]], x86_fp80 [[CONV1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[CONV]], x86_fp80 [[TMP3]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[B_REAL]], x86_fp80 [[TMP5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP6]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[B_REAL]], x86_fp80 [[TMP2]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[TMP8]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP9]], x86_fp80 [[TMP4]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP2:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV]]
+// PRMTD_STRICT-NEXT:    [[TMP3:%.*]] = fmul x86_fp80 [[TMP2]], [[CONV1]]
+// PRMTD_STRICT-NEXT:    [[TMP4:%.*]] = fadd x86_fp80 [[CONV]], [[TMP3]]
+// PRMTD_STRICT-NEXT:    [[TMP5:%.*]] = fmul x86_fp80 [[B_IMAG]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP6:%.*]] = fadd x86_fp80 [[B_REAL]], [[TMP5]]
+// PRMTD_STRICT-NEXT:    [[TMP7:%.*]] = fdiv x86_fp80 [[TMP6]], [[TMP4]]
+// PRMTD_STRICT-NEXT:    [[TMP8:%.*]] = fmul x86_fp80 [[B_REAL]], [[TMP2]]
+// PRMTD_STRICT-NEXT:    [[TMP9:%.*]] = fsub x86_fp80 [[B_IMAG]], [[TMP8]]
+// PRMTD_STRICT-NEXT:    [[TMP10:%.*]] = fdiv x86_fp80 [[TMP9]], [[TMP4]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV:%.*]]
 // PRMTD_STRICT:       abs_rhsr_less_than_abs_rhsi:
-// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[CONV]], x86_fp80 [[CONV1]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[TMP11]], x86_fp80 [[CONV]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[CONV1]], x86_fp80 [[TMP12]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[B_REAL]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = call x86_fp80 @llvm.experimental.constrained.fadd.f80(x86_fp80 [[TMP14]], x86_fp80 [[B_IMAG]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP15]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = call x86_fp80 @llvm.experimental.constrained.fmul.f80(x86_fp80 [[B_IMAG]], x86_fp80 [[TMP11]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = call x86_fp80 @llvm.experimental.constrained.fsub.f80(x86_fp80 [[TMP17]], x86_fp80 [[B_REAL]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = call x86_fp80 @llvm.experimental.constrained.fdiv.f80(x86_fp80 [[TMP18]], x86_fp80 [[TMP13]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[TMP11:%.*]] = fdiv x86_fp80 [[CONV]], [[CONV1]]
+// PRMTD_STRICT-NEXT:    [[TMP12:%.*]] = fmul x86_fp80 [[TMP11]], [[CONV]]
+// PRMTD_STRICT-NEXT:    [[TMP13:%.*]] = fadd x86_fp80 [[CONV1]], [[TMP12]]
+// PRMTD_STRICT-NEXT:    [[TMP14:%.*]] = fmul x86_fp80 [[B_REAL]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP15:%.*]] = fadd x86_fp80 [[TMP14]], [[B_IMAG]]
+// PRMTD_STRICT-NEXT:    [[TMP16:%.*]] = fdiv x86_fp80 [[TMP15]], [[TMP13]]
+// PRMTD_STRICT-NEXT:    [[TMP17:%.*]] = fmul x86_fp80 [[B_IMAG]], [[TMP11]]
+// PRMTD_STRICT-NEXT:    [[TMP18:%.*]] = fsub x86_fp80 [[TMP17]], [[B_REAL]]
+// PRMTD_STRICT-NEXT:    [[TMP19:%.*]] = fdiv x86_fp80 [[TMP18]], [[TMP13]]
 // PRMTD_STRICT-NEXT:    br label [[COMPLEX_DIV]]
 // PRMTD_STRICT:       complex_div:
 // PRMTD_STRICT-NEXT:    [[TMP20:%.*]] = phi x86_fp80 [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
 // PRMTD_STRICT-NEXT:    [[TMP21:%.*]] = phi x86_fp80 [ [[TMP10]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP19]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
-// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f80(x86_fp80 [[TMP20]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f80(x86_fp80 [[TMP21]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CONV2]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT4:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[CONV3]], metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[TMP20]] to float
+// PRMTD_STRICT-NEXT:    [[CONV3:%.*]] = fptrunc x86_fp80 [[TMP21]] to float
+// PRMTD_STRICT-NEXT:    [[EXT:%.*]] = fpext float [[CONV2]] to double
+// PRMTD_STRICT-NEXT:    [[EXT4:%.*]] = fpext float [[CONV3]] to double
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 0
 // PRMTD_STRICT-NEXT:    [[A_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[A_COERCE]], i32 1
-// PRMTD_STRICT-NEXT:    [[EXT5:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[A_SROA_0_0_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[EXT6:%.*]] = call double @llvm.experimental.constrained.fpext.f64.f32(float [[A_SROA_0_4_VEC_EXTRACT]], metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP22:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP23:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT4]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP24:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP22]], double [[TMP23]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP25:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT5]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP26:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT6]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP27:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[TMP25]], double [[TMP26]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP28:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT4]], double [[EXT5]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP29:%.*]] = call double @llvm.experimental.constrained.fmul.f64(double [[EXT]], double [[EXT6]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP30:%.*]] = call double @llvm.experimental.constrained.fsub.f64(double [[TMP28]], double [[TMP29]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP31:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP24]], double [[TMP27]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[TMP32:%.*]] = call double @llvm.experimental.constrained.fdiv.f64(double [[TMP30]], double [[TMP27]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP31]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// PRMTD_STRICT-NEXT:    [[UNPROMOTION7:%.*]] = call float @llvm.experimental.constrained.fptrunc.f32.f64(double [[TMP32]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// PRMTD_STRICT-NEXT:    [[EXT5:%.*]] = fpext float [[A_SROA_0_0_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[EXT6:%.*]] = fpext float [[A_SROA_0_4_VEC_EXTRACT]] to double
+// PRMTD_STRICT-NEXT:    [[TMP22:%.*]] = fmul double [[EXT]], [[EXT5]]
+// PRMTD_STRICT-NEXT:    [[TMP23:%.*]] = fmul double [[EXT4]], [[EXT6]]
+// PRMTD_STRICT-NEXT:    [[TMP24:%.*]] = fadd double [[TMP22]], [[TMP23]]
+// PRMTD_STRICT-NEXT:    [[TMP25:%.*]] = fmul double [[EXT5]], [[EXT5]]
+// PRMTD_STRICT-NEXT:    [[TMP26:%.*]] = fmul double [[EXT6]], [[EXT6]]
+// PRMTD_STRICT-NEXT:    [[TMP27:%.*]] = fadd double [[TMP25]], [[TMP26]]
+// PRMTD_STRICT-NEXT:    [[TMP28:%.*]] = fmul double [[EXT4]], [[EXT5]]
+// PRMTD_STRICT-NEXT:    [[TMP29:%.*]] = fmul double [[EXT]], [[EXT6]]
+// PRMTD_STRICT-NEXT:    [[TMP30:%.*]] = fsub double [[TMP28]], [[TMP29]]
+// PRMTD_STRICT-NEXT:    [[TMP31:%.*]] = fdiv double [[TMP24]], [[TMP27]]
+// PRMTD_STRICT-NEXT:    [[TMP32:%.*]] = fdiv double [[TMP30]], [[TMP27]]
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION:%.*]] = fptrunc double [[TMP31]] to float
+// PRMTD_STRICT-NEXT:    [[UNPROMOTION7:%.*]] = fptrunc double [[TMP32]] to float
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x float> undef, float [[UNPROMOTION]], i32 0
 // PRMTD_STRICT-NEXT:    [[RETVAL_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x float> [[RETVAL_SROA_0_0_VEC_INSERT]], float [[UNPROMOTION7]], i32 1
 // PRMTD_STRICT-NEXT:    ret <2 x float> [[RETVAL_SROA_0_4_VEC_INSERT]]
@@ -5198,7 +5198,7 @@ _Complex float f1(_Complex float a, _Complex long double b, _Complex float c) {
   return (_Complex float)(b / c) / a;
 }
 //.
-// FULL: [[PROF2]] = !{!"branch_weights", i32 1, i32 1048575}
+// FULL: [[PROF1]] = !{!"branch_weights", i32 1, i32 1048575}
 //.
-// FULL_FAST: [[PROF2]] = !{!"branch_weights", i32 1, i32 1048575}
+// FULL_FAST: [[PROF1]] = !{!"branch_weights", i32 1, i32 1048575}
 //.
diff --git a/clang/test/CodeGen/exprs-strictfp.c b/clang/test/CodeGen/exprs-strictfp.c
index 3fb39ddba0c0c..7a4fa40cf11db 100644
--- a/clang/test/CodeGen/exprs-strictfp.c
+++ b/clang/test/CodeGen/exprs-strictfp.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown %s -emit-llvm -o - | FileCheck %s
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown %s -ffp-exception-behavior=maytrap -emit-llvm -o - | FileCheck %s
 
@@ -8,10 +9,22 @@
 
 #pragma float_control(except, on)
 
+// CHECK-LABEL: define dso_local void @eMaisUma(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[T:%.*]] = alloca [1 x double], align 8
+// CHECK-NEXT:    [[ARRAYDECAY:%.*]] = getelementptr inbounds [1 x double], ptr [[T]], i64 0, i64 0
+// CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[ARRAYDECAY]], align 8
+// CHECK-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"une") #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK:       [[IF_THEN]]:
+// CHECK-NEXT:    br label %[[IF_END]]
+// CHECK:       [[IF_END]]:
+// CHECK-NEXT:    ret void
+//
 void eMaisUma(void) {
   double t[1];
   if (*t)
     return;
-// CHECK: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double 0.000000e+00, metadata !"une", metadata !"fpexcept.strict")
 }
 
diff --git a/clang/test/CodeGen/fp-contract-fast-pragma.cpp b/clang/test/CodeGen/fp-contract-fast-pragma.cpp
index 0bb01d6e17a1d..08ddb94f7fff1 100644
--- a/clang/test/CodeGen/fp-contract-fast-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-fast-pragma.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -O3 -triple %itanium_abi_triple \
 // RUN:   -emit-llvm -o - %s \
 // RUN:   | FileCheck -check-prefixes=COMMON,CHECK %s
@@ -7,24 +8,42 @@
 // RUN:   | FileCheck -check-prefixes=COMMON,STRICT %s
 
 // Is FP_CONTRACT honored in a simple case?
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_1fff(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL:%.*]] = fmul contract float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd contract float [[MUL]], [[C]]
+// CHECK-NEXT:    ret float [[ADD]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_1fff(
+// STRICT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL:%.*]] = call contract float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD:%.*]] = call contract float @llvm.fadd.f32(float [[MUL]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
 float fp_contract_1(float a, float b, float c) {
-// COMMON: _Z13fp_contract_1fff
-// CHECK: %[[M:.+]] = fmul contract float %a, %b
-// CHECK-NEXT: fadd contract float %[[M]], %c
-// STRICT: %[[M:.+]] = tail call contract float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT-NEXT: tail call contract float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 #pragma clang fp contract(fast)
   return a * b + c;
 }
 
 // Is FP_CONTRACT state cleared on exiting compound statements?
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_2fff(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL:%.*]] = fmul float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[C]]
+// CHECK-NEXT:    ret float [[ADD]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_2fff(
+// STRICT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[MUL]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
 float fp_contract_2(float a, float b, float c) {
-  // COMMON: _Z13fp_contract_2fff
-  // CHECK: %[[M:.+]] = fmul float %a, %b
-  // CHECK-NEXT: fadd float %[[M]], %c
-  // STRICT: %[[M:.+]] = tail call float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // STRICT-NEXT: tail call float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
   {
 #pragma clang fp contract(fast)
   }
@@ -41,12 +60,21 @@ T template_muladd(T a, T b, T c) {
   return a * b + c;
 }
 
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_3fff(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL_I:%.*]] = fmul contract float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD_I:%.*]] = fadd contract float [[MUL_I]], [[C]]
+// CHECK-NEXT:    ret float [[ADD_I]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_3fff(
+// STRICT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL_I:%.*]] = call contract float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD_I:%.*]] = call contract noundef float @llvm.fadd.f32(float [[MUL_I]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD_I]]
+//
 float fp_contract_3(float a, float b, float c) {
-  // COMMON: _Z13fp_contract_3fff
-  // CHECK: %[[M:.+]] = fmul contract float %a, %b
-  // CHECK-NEXT: fadd contract float %[[M]], %c
-  // STRICT: %[[M:.+]] = tail call contract float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // STRICT-NEXT: tail call contract noundef float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
   return template_muladd<float>(a, b, c);
 }
 
@@ -58,48 +86,95 @@ class fp_contract_4 {
   }
 };
 
+// CHECK-LABEL: define weak_odr noundef float @_ZN13fp_contract_4IiE6methodEfff(
+// CHECK-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] comdat align 2 {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL:%.*]] = fmul contract float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd contract float [[MUL]], [[C]]
+// CHECK-NEXT:    ret float [[ADD]]
+//
+// STRICT-LABEL: define weak_odr noundef float @_ZN13fp_contract_4IiE6methodEfff(
+// STRICT-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR2:[0-9]+]] comdat align 2 {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL:%.*]] = call contract float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD:%.*]] = call contract float @llvm.fadd.f32(float [[MUL]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
 template class fp_contract_4<int>;
-// COMMON: _ZN13fp_contract_4IiE6methodEfff
-// CHECK: %[[M:.+]] = fmul contract float %a, %b
-// CHECK-NEXT: fadd contract float %[[M]], %c
-// STRICT: %[[M:.+]] = tail call contract float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT-NEXT: tail call contract float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 // Check file-scoped FP_CONTRACT
 #pragma clang fp contract(fast)
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_5fff(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL:%.*]] = fmul contract float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd contract float [[MUL]], [[C]]
+// CHECK-NEXT:    ret float [[ADD]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_5fff(
+// STRICT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL:%.*]] = call contract float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD:%.*]] = call contract float @llvm.fadd.f32(float [[MUL]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
 float fp_contract_5(float a, float b, float c) {
-  // COMMON: _Z13fp_contract_5fff
-  // CHECK: %[[M:.+]] = fmul contract float %a, %b
-  // CHECK-NEXT: fadd contract float %[[M]], %c
-  // STRICT: %[[M:.+]] = tail call contract float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // STRICT-NEXT: tail call contract float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
   return a * b + c;
 }
 
 // Verify that we can handle multiple flags on the same pragma
 #pragma clang fp contract(fast) contract(off)
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_6fff(
+// CHECK-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[MUL:%.*]] = fmul float [[A]], [[B]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[C]]
+// CHECK-NEXT:    ret float [[ADD]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_6fff(
+// STRICT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[A]], float [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[MUL]], float [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
 float fp_contract_6(float a, float b, float c) {
-  // COMMON: _Z13fp_contract_6fff
-  // CHECK: %[[M:.+]] = fmul float %a, %b
-  // CHECK-NEXT: fadd float %[[M]], %c
-  // STRICT: %[[M:.+]] = tail call float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // STRICT-NEXT: tail call float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata !"round.tonearest", metadata !"fpexcept.strict")
   return a * b + c;
 }
 
 
 #pragma clang fp contract(fast)
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_7f(
+// CHECK-SAME: float noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call contract float @llvm.sqrt.f32(float [[A]])
+// CHECK-NEXT:    ret float [[TMP0]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_7f(
+// STRICT-SAME: float noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[TMP0:%.*]] = call contract float @llvm.sqrt.f32(float [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[TMP0]]
+//
 float fp_contract_7(float a) {
-// COMMON: _Z13fp_contract_7f
-// CHECK: tail call contract float @llvm.sqrt.f32(float %a)
-// STRICT: tail call contract float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
   return __builtin_sqrtf(a);
 }
 
+// CHECK-LABEL: define dso_local noundef float @_Z13fp_contract_8f(
+// CHECK-SAME: float noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call float @llvm.sqrt.f32(float [[A]])
+// CHECK-NEXT:    ret float [[TMP0]]
+//
+// STRICT-LABEL: define dso_local noundef float @_Z13fp_contract_8f(
+// STRICT-SAME: float noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[TMP0:%.*]] = call float @llvm.sqrt.f32(float [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[TMP0]]
+//
 float fp_contract_8(float a) {
-// COMMON: _Z13fp_contract_8f
-// CHECK: tail call float @llvm.sqrt.f32(float %a)
-// STRICT: tail call float @llvm.experimental.constrained.sqrt.f32(float %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
 #pragma clang fp contract(off)
   return __builtin_sqrtf(a);
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// COMMON: {{.*}}
diff --git a/clang/test/CodeGen/fp-floatcontrol-class.cpp b/clang/test/CodeGen/fp-floatcontrol-class.cpp
index 83a27cb206eb8..1387ab6b3cf71 100644
--- a/clang/test/CodeGen/fp-floatcontrol-class.cpp
+++ b/clang/test/CodeGen/fp-floatcontrol-class.cpp
@@ -1,19 +1,34 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -ffp-contract=on -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
 // Verify that float_control does not pertain to initializer expressions
 
 float y();
 float z();
 #pragma float_control(except, on)
+// CHECK-LABEL: define linkonce_odr void @_ZN2ONC1Ev(
+// CHECK-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR1:[0-9]+]] comdat align 2 {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NEXT:    call void @_ZN2ONC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-NEXT:    ret void
+//
 class ON {
   float w = 2 + y() * z();
-  // CHECK-LABEL: define {{.*}} @_ZN2ONC2Ev{{.*}}
-  // CHECK: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 };
 ON on;
 #pragma float_control(except, off)
+// CHECK-LABEL: define linkonce_odr void @_ZN3OFFC1Ev(
+// CHECK-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR2:[0-9]+]] comdat align 2 {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NEXT:    call void @_ZN3OFFC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-NEXT:    ret void
+//
 class OFF {
   float w = 2 + y() * z();
-  // CHECK-LABEL: define {{.*}} @_ZN3OFFC2Ev{{.*}}
-  // CHECK-NOT: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 };
 OFF off;
diff --git a/clang/test/CodeGen/fp-floatcontrol-pragma.cpp b/clang/test/CodeGen/fp-floatcontrol-pragma.cpp
index 966eaf6053970..54e93a1e6d9c1 100644
--- a/clang/test/CodeGen/fp-floatcontrol-pragma.cpp
+++ b/clang/test/CodeGen/fp-floatcontrol-pragma.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -DEXCEPT=1 \
 // RUN: -fcxx-exceptions -triple x86_64-linux-gnu -emit-llvm -o - %s \
 // RUN: | FileCheck -check-prefix=CHECK-NS %s
@@ -42,6 +43,46 @@
 // RUN: %clang_cc1 -triple powerpc-unknown-aix -DNF128 -emit-llvm -o - %s \
 // RUN: | FileCheck %s -check-prefix=CHECK-AIX
 
+// CHECK-NS-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-NS-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    ret i1 false
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-DEFAULT-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    ret i1 false
+//
+// CHECK-FENV-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-FENV-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    ret i1 false
+//
+// CHECK-O3-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-O3-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    ret i1 false
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-SOURCE-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    ret i1 false
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-DOUBLE-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    ret i1 false
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef zeroext i1 @_Z1fv(
+// CHECK-EXTENDED-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    ret i1 false
+//
+// CHECK-AIX-LABEL: define noundef zeroext i1 @_Z1fv(
+// CHECK-AIX-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    ret i1 false
+//
 bool f() {
   // CHECK: define {{.*}}f{{.*}}
   return __FLT_EVAL_METHOD__ < 0 &&
@@ -50,19 +91,259 @@ bool f() {
 }
 
 // Verify float_control(precise, off) enables fast math flags on fp operations.
+// CHECK-NS-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-NS-NEXT:    ret float [[ADD]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-FENV-NEXT:    ret float [[ADD]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul fast float [[B]], [[A]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    ret float [[ADD]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DOUBLE-NEXT:    ret float [[ADD]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z12fp_precise_1fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-EXTENDED-NEXT:    ret float [[ADD]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z12fp_precise_1fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-AIX-NEXT:    ret float [[ADD]]
+//
 float fp_precise_1(float a, float b, float c) {
-// CHECK-O3: _Z12fp_precise_1fff
-// CHECK-O3: %[[M:.+]] = fmul fast float{{.*}}
-// CHECK-O3: fadd fast float %[[M]], %c
 #pragma float_control(precise, off)
   return a * b + c;
 }
 
 // Is float_control state cleared on exiting compound statements?
+// CHECK-NS-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-NS-NEXT:    ret float [[ADD]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-FENV-NEXT:    ret float [[ADD]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul float [[A]], [[B]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    ret float [[ADD]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul double [[CONV]], [[CONV1]]
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd double [[MUL]], [[CONV2]]
+// CHECK-DOUBLE-NEXT:    [[CONV3:%.*]] = fptrunc double [[ADD]] to float
+// CHECK-DOUBLE-NEXT:    ret float [[CONV3]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z12fp_precise_2fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd x86_fp80 [[MUL]], [[CONV2]]
+// CHECK-EXTENDED-NEXT:    [[CONV3:%.*]] = fptrunc x86_fp80 [[ADD]] to float
+// CHECK-EXTENDED-NEXT:    ret float [[CONV3]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z12fp_precise_2fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-AIX-NEXT:    ret float [[ADD]]
+//
 float fp_precise_2(float a, float b, float c) {
-  // CHECK-O3: _Z12fp_precise_2fff
-  // CHECK-O3: %[[M:.+]] = fmul float{{.*}}
-  // CHECK-O3: fadd float %[[M]], %c
   {
 #pragma float_control(precise, off)
   }
@@ -79,10 +360,119 @@ T template_muladd(T a, T b, T c) {
   return a * b + c;
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-NS-NEXT:    ret float [[CALL]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-DEFAULT-NEXT:    ret float [[CALL]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-FENV-NEXT:    ret float [[CALL]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL_I:%.*]] = fmul fast float [[B]], [[A]]
+// CHECK-O3-NEXT:    [[ADD_I:%.*]] = fadd fast float [[MUL_I]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD_I]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-SOURCE-NEXT:    ret float [[CALL]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-DOUBLE-NEXT:    ret float [[CALL]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z12fp_precise_3fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-EXTENDED-NEXT:    ret float [[CALL]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z12fp_precise_3fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CALL:%.*]] = call noundef float @_Z15template_muladdIfET_S0_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]], float noundef [[TMP2]])
+// CHECK-AIX-NEXT:    ret float [[CALL]]
+//
 float fp_precise_3(float a, float b, float c) {
-  // CHECK-O3: _Z12fp_precise_3fff
-  // CHECK-O3: %[[M:.+]] = fmul fast float{{.*}}
-  // CHECK-O3: fadd fast float %[[M]], %c
   return template_muladd<float>(a, b, c);
 }
 
@@ -94,22 +484,431 @@ class fp_precise_4 {
   }
 };
 
+// CHECK-NS-LABEL: define weak_odr noundef float @_ZN12fp_precise_4IiE6methodEfff(
+// CHECK-NS-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] comdat align 2 {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-NS-NEXT:    ret float [[ADD]]
+//
+// CHECK-FENV-LABEL: define weak_odr noundef float @_ZN12fp_precise_4IiE6methodEfff(
+// CHECK-FENV-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] comdat align 2 {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-FENV-NEXT:    ret float [[ADD]]
+//
+// CHECK-O3-LABEL: define weak_odr noundef float @_ZN12fp_precise_4IiE6methodEfff(
+// CHECK-O3-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] comdat align 2 {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul fast float [[B]], [[A]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-CONST-ARGS-LABEL: define weak_odr noundef float @_ZN12fp_precise_4IiE6methodEfff(
+// CHECK-CONST-ARGS-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] comdat align 2 {
+// CHECK-CONST-ARGS-NEXT:  [[ENTRY:.*:]]
+// CHECK-CONST-ARGS-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-CONST-ARGS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONST-ARGS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONST-ARGS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONST-ARGS-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-CONST-ARGS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-CONST-ARGS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-CONST-ARGS-NEXT:    ret float [[ADD]]
+//
+// CHECK-AIX-LABEL: define weak_odr noundef float @_ZN12fp_precise_4IiE6methodEfff(
+// CHECK-AIX-SAME: ptr noundef nonnull align 1 dereferenceable(1) [[THIS:%.*]], float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 4
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-AIX-NEXT:    ret float [[ADD]]
+//
 template class fp_precise_4<int>;
-// CHECK-O3: _ZN12fp_precise_4IiE6methodEfff
-// CHECK-O3: %[[M:.+]] = fmul fast float{{.*}}
-// CHECK-O3: fadd fast float %[[M]], %c
 
 // Check file-scoped float_control
 #pragma float_control(push)
 #pragma float_control(precise, off)
+// CHECK-NS-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-NS-NEXT:    ret float [[ADD]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-FENV-NEXT:    ret float [[ADD]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul fast float [[B]], [[A]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    ret float [[ADD]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DOUBLE-NEXT:    ret float [[ADD]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z12fp_precise_5fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-EXTENDED-NEXT:    ret float [[ADD]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z12fp_precise_5fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-AIX-NEXT:    ret float [[ADD]]
+//
 float fp_precise_5(float a, float b, float c) {
-  // CHECK-O3: _Z12fp_precise_5fff
-  // CHECK-O3: %[[M:.+]] = fmul fast float{{.*}}
-  // CHECK-O3: fadd fast float %[[M]], %c
   return a * b + c;
 }
 #pragma float_control(pop)
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-NS-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-NS-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP9]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEFAULT-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEFAULT-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-DEFAULT-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEFAULT-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP9]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-FENV-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-FENV-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-FENV-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-FENV-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP9]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR2:[0-9]+]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float undef, float undef) #[[ATTR8:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-O3-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[X]], float [[Y]]) #[[ATTR8]] [ "fp.control"(metadata !"rte") ]
+// CHECK-O3-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[X]], float [[Y]]) #[[ATTR8]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[MUL1]], float [[MUL2]]) #[[ATTR8]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-O3-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[ADD]], float [[ADD]]) #[[ATTR8]] [ "fp.control"(metadata !"rte") ]
+// CHECK-O3-NEXT:    ret float [[MUL3]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-SOURCE-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-SOURCE-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-SOURCE-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-SOURCE-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP9]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DOUBLE-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DOUBLE-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-DOUBLE-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DOUBLE-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP9]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z3fffff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-EXTENDED-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-EXTENDED-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-EXTENDED-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-EXTENDED-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP9]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z3fffff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-AIX-NEXT:    store float [[MUL]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL1:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-AIX-NEXT:    store float [[MUL1]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP4:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP5:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP6:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL2:%.*]] = call float @llvm.fmul.f32(float [[TMP5]], float [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float [[MUL2]]) #[[ATTR4]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-AIX-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP8:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[MUL3:%.*]] = call float @llvm.fmul.f32(float [[TMP7]], float [[TMP8]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-AIX-NEXT:    store float [[MUL3]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP9:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP9]]
+//
 float fff(float x, float y) {
 // CHECK-LABEL: define{{.*}} float @_Z3fffff{{.*}}
 // CHECK: entry
@@ -132,6 +931,165 @@ float fff(float x, float y) {
   //CHECK: llvm.experimental.constrained.fmul{{.*}}
   return z;
 }
+// CHECK-NS-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-NS-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-NS-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-NS-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP7]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-DEFAULT-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-DEFAULT-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-DEFAULT-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP7]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-FENV-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-FENV-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-FENV-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP7]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    ret float poison
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-SOURCE-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-SOURCE-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-SOURCE-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP7]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-DOUBLE-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-DOUBLE-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-DOUBLE-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP7]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z13check_preciseff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-EXTENDED-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-EXTENDED-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-EXTENDED-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP7]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z13check_preciseff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-AIX-NEXT:    store float [[TMP3]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP4:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP5:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP4]], [[TMP5]]
+// CHECK-AIX-NEXT:    [[TMP6:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP6]]
+// CHECK-AIX-NEXT:    store float [[ADD]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP7:%.*]] = load float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP7]]
+//
 float check_precise(float x, float y) {
   // CHECK-LABEL: define{{.*}} float @_Z13check_preciseff{{.*}}
   float z;
@@ -149,6 +1107,146 @@ float check_precise(float x, float y) {
   return z;
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-NS-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-NS-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP3]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP3]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-FENV-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-FENV-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP3]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul fast float [[B]], [[A]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[C]]
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP3]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-DOUBLE-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP3]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z9fma_test2fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-EXTENDED-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP3]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z9fma_test2fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-AIX-NEXT:    store float [[ADD]], ptr [[X]], align 4
+// CHECK-AIX-NEXT:    [[TMP3:%.*]] = load float, ptr [[X]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP3]]
+//
 float fma_test2(float a, float b, float c) {
 // CHECK-LABEL define{{.*}} float @_Z9fma_test2fff{{.*}}
 #pragma float_control(precise, off)
@@ -157,6 +1255,138 @@ float fma_test2(float a, float b, float c) {
   return x;
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-NS-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-NS-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-NS-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP4]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-DEFAULT-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-DEFAULT-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP4]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-FENV-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-FENV-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-FENV-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP4]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-O3-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[TMP0:%.*]] = tail call float @llvm.fmuladd.f32(float [[A]], float [[B]], float [[C]])
+// CHECK-O3-NEXT:    ret float [[TMP0]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-SOURCE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-SOURCE-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP4]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-DOUBLE-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-DOUBLE-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP4]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z9fma_test1fff(
+// CHECK-EXTENDED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-EXTENDED-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP4]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z9fma_test1fff(
+// CHECK-AIX-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], float noundef [[C:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[A_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[B_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[C_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[X:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[A]], ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[B]], ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[C]], ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[A_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[B_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[C_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-AIX-NEXT:    store float [[TMP3]], ptr [[X]], align 4
+// CHECK-AIX-NEXT:    [[TMP4:%.*]] = load float, ptr [[X]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP4]]
+//
 float fma_test1(float a, float b, float c) {
 // CHECK-LABEL define{{.*}} float @_Z9fma_test1fff{{.*}}
 #pragma float_control(precise, on)
@@ -177,6 +1407,54 @@ T add(T lhs, T rhs) {
 }
 #pragma float_control(pop)
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-NS-SAME: ) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-NS-NEXT:    ret float [[CALL]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-DEFAULT-SAME: ) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-DEFAULT-NEXT:    ret float [[CALL]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-FENV-SAME: ) #[[ATTR0]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-FENV-NEXT:    ret float [[CALL]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-O3-SAME: ) local_unnamed_addr #[[ATTR1]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CALL:%.*]] = tail call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-O3-NEXT:    ret float [[CALL]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-SOURCE-SAME: ) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-SOURCE-NEXT:    ret float [[CALL]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-DOUBLE-SAME: ) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-DOUBLE-NEXT:    ret float [[CALL]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z17test_OperatorCallv(
+// CHECK-EXTENDED-SAME: ) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-EXTENDED-NEXT:    ret float [[CALL]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z17test_OperatorCallv(
+// CHECK-AIX-SAME: ) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[CALL:%.*]] = call noundef float @_Z3addIfET_S0_S0_(float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-AIX-NEXT:    ret float [[CALL]]
+//
 float test_OperatorCall() {
   return add(1.0f, 2.0f);
   //CHECK: llvm.experimental.constrained.fadd{{.*}}fpexcept.strict
@@ -188,22 +1466,269 @@ float test_OperatorCall() {
 #endif
 // CHECK-LABEL: define {{.*}}callt{{.*}}
 
+// CHECK-NS-LABEL: define dso_local void @_Z5calltv(
+// CHECK-NS-SAME: ) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-NS-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-NS-NEXT:    ret void
+//
+// CHECK-DEFAULT-LABEL: define dso_local void @_Z5calltv(
+// CHECK-DEFAULT-SAME: ) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-DEFAULT-NEXT:    ret void
+//
+// CHECK-FENV-LABEL: define dso_local void @_Z5calltv(
+// CHECK-FENV-SAME: ) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-FENV-NEXT:    ret void
+//
+// CHECK-O3-LABEL: define dso_local void @_Z5calltv(
+// CHECK-O3-SAME: ) local_unnamed_addr #[[ATTR6:[0-9]+]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-O3-NEXT:    call void @llvm.lifetime.start.p0(ptr nonnull [[Z]])
+// CHECK-O3-NEXT:    [[Z_0_Z_0_Z_0_Z_0_:%.*]] = load volatile float, ptr [[Z]], align 4, !tbaa [[FLOAT_TBAA5:![0-9]+]]
+// CHECK-O3-NEXT:    [[Z_0_Z_0_Z_0_Z_0_1:%.*]] = load volatile float, ptr [[Z]], align 4, !tbaa [[FLOAT_TBAA5]]
+// CHECK-O3-NEXT:    [[MUL:%.*]] = fmul float [[Z_0_Z_0_Z_0_Z_0_]], [[Z_0_Z_0_Z_0_Z_0_1]]
+// CHECK-O3-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4, !tbaa [[FLOAT_TBAA5]]
+// CHECK-O3-NEXT:    call void @llvm.lifetime.end.p0(ptr nonnull [[Z]])
+// CHECK-O3-NEXT:    ret void
+//
+// CHECK-SOURCE-LABEL: define dso_local void @_Z5calltv(
+// CHECK-SOURCE-SAME: ) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-SOURCE-NEXT:    ret void
+//
+// CHECK-DOUBLE-LABEL: define dso_local void @_Z5calltv(
+// CHECK-DOUBLE-SAME: ) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-DOUBLE-NEXT:    ret void
+//
+// CHECK-EXTENDED-LABEL: define dso_local void @_Z5calltv(
+// CHECK-EXTENDED-SAME: ) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-EXTENDED-NEXT:    ret void
+//
+// CHECK-AIX-LABEL: define void @_Z5calltv(
+// CHECK-AIX-SAME: ) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[Z:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load volatile float, ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    store volatile float [[MUL]], ptr [[Z]], align 4
+// CHECK-AIX-NEXT:    ret void
+//
 void callt() {
   volatile float z;
   z = z * z;
-  //CHECK-FENV: llvm.experimental.constrained.fmul{{.*}}
 }
 
 // CHECK-LABEL: define {{.*}}myAdd{{.*}}
+// CHECK-NS-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-NS-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NS-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-NS-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-NS:       [[IF_THEN]]:
+// CHECK-NS-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-NS-NEXT:    br label %[[RETURN:.*]]
+// CHECK-NS:       [[IF_END]]:
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-NS-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-NS-NEXT:    br label %[[RETURN]]
+// CHECK-NS:       [[RETURN]]:
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP2]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-DEFAULT-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-DEFAULT-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-DEFAULT-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-DEFAULT:       [[IF_THEN]]:
+// CHECK-DEFAULT-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-DEFAULT-NEXT:    br label %[[RETURN:.*]]
+// CHECK-DEFAULT:       [[IF_END]]:
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-DEFAULT-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-DEFAULT-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-DEFAULT-NEXT:    br label %[[RETURN]]
+// CHECK-DEFAULT:       [[RETURN]]:
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP2]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-FENV-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-FENV-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-FENV-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-FENV:       [[IF_THEN]]:
+// CHECK-FENV-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-FENV-NEXT:    br label %[[RETURN:.*]]
+// CHECK-FENV:       [[IF_END]]:
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-FENV-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-FENV-NEXT:    br label %[[RETURN]]
+// CHECK-FENV:       [[RETURN]]:
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP2]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-O3-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CMP:%.*]] = icmp slt i32 [[I]], 0
+// CHECK-O3-NEXT:    [[DOT:%.*]] = select i1 [[CMP]], float 3.000000e+00, float 0x3FD5555560000000
+// CHECK-O3-NEXT:    ret float [[DOT]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-SOURCE-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-SOURCE-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-SOURCE-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-SOURCE:       [[IF_THEN]]:
+// CHECK-SOURCE-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-SOURCE-NEXT:    br label %[[RETURN:.*]]
+// CHECK-SOURCE:       [[IF_END]]:
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-SOURCE-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-SOURCE-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-SOURCE-NEXT:    br label %[[RETURN]]
+// CHECK-SOURCE:       [[RETURN]]:
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP2]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-DOUBLE-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-DOUBLE-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-DOUBLE-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-DOUBLE:       [[IF_THEN]]:
+// CHECK-DOUBLE-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-DOUBLE-NEXT:    br label %[[RETURN:.*]]
+// CHECK-DOUBLE:       [[IF_END]]:
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-DOUBLE-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-DOUBLE-NEXT:    br label %[[RETURN]]
+// CHECK-DOUBLE:       [[RETURN]]:
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP2]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z5myAddif(
+// CHECK-EXTENDED-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-EXTENDED-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-EXTENDED-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-EXTENDED:       [[IF_THEN]]:
+// CHECK-EXTENDED-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-EXTENDED-NEXT:    br label %[[RETURN:.*]]
+// CHECK-EXTENDED:       [[IF_END]]:
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-EXTENDED-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-EXTENDED-NEXT:    br label %[[RETURN]]
+// CHECK-EXTENDED:       [[RETURN]]:
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP2]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z5myAddif(
+// CHECK-AIX-SAME: i32 noundef [[I:%.*]], float noundef [[F:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[RETVAL:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[I_ADDR:%.*]] = alloca i32, align 4
+// CHECK-AIX-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load i32, ptr [[I_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP0]], 0
+// CHECK-AIX-NEXT:    br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// CHECK-AIX:       [[IF_THEN]]:
+// CHECK-AIX-NEXT:    store float 3.000000e+00, ptr [[RETVAL]], align 4
+// CHECK-AIX-NEXT:    br label %[[RETURN:.*]]
+// CHECK-AIX:       [[IF_END]]:
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load double, ptr @_ZZ5myAddifE1v, align 8
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fptrunc double [[TMP1]] to float
+// CHECK-AIX-NEXT:    store float [[CONV]], ptr [[RETVAL]], align 4
+// CHECK-AIX-NEXT:    br label %[[RETURN]]
+// CHECK-AIX:       [[RETURN]]:
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[RETVAL]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP2]]
+//
 float myAdd(int i, float f) {
   if (i<0)
   return 1.0 + 2.0;
   // Check that floating point constant folding doesn't occur if
   // #pragma STC FENV_ACCESS is enabled.
-  //CHECK-FENV: llvm.experimental.constrained.fadd{{.*}}double 1.0{{.*}}double 2.0{{.*}}
   //CHECK: store float 3.0{{.*}}retval{{.*}}
   static double v = 1.0 / 3.0;
-  //CHECK-FENV: llvm.experimental.constrained.fptrunc.f32.f64{{.*}}
   //CHECK-NOT: fdiv
   return v;
 }
@@ -212,32 +1737,68 @@ float myAdd(int i, float f) {
 namespace ns {
 // Check that pragma float_control can appear in namespace.
 #pragma float_control(except, on, push)
+// CHECK-NS-LABEL: define dso_local noundef float @_ZN2ns6exc_onEdf(
+// CHECK-NS-SAME: double noundef [[X:%.*]], float noundef [[ZERO:%.*]]) #[[ATTR1]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca double, align 8
+// CHECK-NS-NEXT:    [[ZERO_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store double [[X]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    store float [[ZERO]], ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP0]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    [[DIV:%.*]] = call double @llvm.fdiv.f64(double 1.000000e+00, double [[CONV]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    store double [[DIV]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP1]]
+//
 float exc_on(double x, float zero) {
-// CHECK-NS: define {{.*}}exc_on{{.*}}
   {} try {
     x = 1.0 / zero; /* division by zero, the result unused */
-//CHECK-NS: llvm.experimental.constrained.fdiv{{.*}}
   } catch (...) {}
   return zero;
 }
 }
 
 // Check pragma is still effective after namespace closes
+// CHECK-NS-LABEL: define dso_local noundef float @_Z12exc_still_ondf(
+// CHECK-NS-SAME: double noundef [[X:%.*]], float noundef [[ZERO:%.*]]) #[[ATTR1]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca double, align 8
+// CHECK-NS-NEXT:    [[ZERO_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store double [[X]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    store float [[ZERO]], ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP0]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    [[DIV:%.*]] = call double @llvm.fdiv.f64(double 1.000000e+00, double [[CONV]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NS-NEXT:    store double [[DIV]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP1]]
+//
 float exc_still_on(double x, float zero) {
-// CHECK-NS: define {{.*}}exc_still_on{{.*}}
   {} try {
     x = 1.0 / zero; /* division by zero, the result unused */
-//CHECK-NS: llvm.experimental.constrained.fdiv{{.*}}
   } catch (...) {}
   return zero;
 }
 
 #pragma float_control(pop)
+// CHECK-NS-LABEL: define dso_local noundef float @_Z7exc_offdf(
+// CHECK-NS-SAME: double noundef [[X:%.*]], float noundef [[ZERO:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca double, align 8
+// CHECK-NS-NEXT:    [[ZERO_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store double [[X]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    store float [[ZERO]], ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-NS-NEXT:    [[DIV:%.*]] = fdiv double 1.000000e+00, [[CONV]]
+// CHECK-NS-NEXT:    store double [[DIV]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[ZERO_ADDR]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP1]]
+//
 float exc_off(double x, float zero) {
-// CHECK-NS: define {{.*}}exc_off{{.*}}
   {} try {
     x = 1.0 / zero; /* division by zero, the result unused */
-//CHECK-NS: fdiv double
   } catch (...) {}
   return zero;
 }
@@ -246,21 +1807,170 @@ namespace fc_template_namespace {
 #pragma float_control(except, on, push)
 template <class T>
 T exc_on(double x, T zero) {
-// CHECK-NS: define {{.*}}fc_template_namespace{{.*}}
   {} try {
     x = 1.0 / zero; /* division by zero, the result unused */
-//CHECK-NS: llvm.experimental.constrained.fdiv{{.*}}
   } catch (...) {}
   return zero;
 }
 }
 
 #pragma float_control(pop)
+// CHECK-NS-LABEL: define dso_local noundef float @_Z2xxdf(
+// CHECK-NS-SAME: double noundef [[X:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca double, align 8
+// CHECK-NS-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store double [[X]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load double, ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CALL:%.*]] = call noundef float @_ZN21fc_template_namespace6exc_onIfEET_dS1_(double noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-NS-NEXT:    ret float [[CALL]]
+//
 float xx(double x, float z) {
   return fc_template_namespace::exc_on<float>(x, z);
 }
 #endif // EXCEPT
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-NS-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NS-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-NS-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-NS-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-NS-NEXT:    ret float [[TMP2]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-DEFAULT-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-DEFAULT-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-DEFAULT-NEXT:    ret float [[TMP2]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-FENV-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00) #[[ATTR5:[0-9]+]]
+// CHECK-FENV-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-FENV-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-FENV-NEXT:    ret float [[TMP2]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-O3-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[ADD:%.*]] = fadd float [[X]], 2.000000e+00
+// CHECK-O3-NEXT:    ret float [[ADD]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-SOURCE-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-SOURCE-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-SOURCE-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-SOURCE-NEXT:    ret float [[TMP2]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-DOUBLE-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-DOUBLE-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-DOUBLE-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-DOUBLE-NEXT:    ret float [[TMP2]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z7try_lamfj(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-EXTENDED-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-EXTENDED-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-EXTENDED-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-EXTENDED-NEXT:    ret float [[TMP2]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z7try_lamfj(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], i32 noundef [[N:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
+// CHECK-AIX-NEXT:    [[RESULT:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[T:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[REF_TMP:%.*]] = alloca [[CLASS_ANON:%.*]], align 1
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store i32 [[N]], ptr [[N_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CALL:%.*]] = call noundef float @"_ZZ7try_lamfjENK3$_0clEff"(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]], float noundef 1.000000e+00, float noundef 2.000000e+00)
+// CHECK-AIX-NEXT:    store float [[CALL]], ptr [[T]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[T]], align 4
+// CHECK-AIX-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    store float [[ADD]], ptr [[RESULT]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[RESULT]], align 4
+// CHECK-AIX-NEXT:    ret float [[TMP2]]
+//
 float try_lam(float x, unsigned n) {
 // CHECK: define {{.*}}try_lam{{.*}}class.anon{{.*}}
   float result;
@@ -276,21 +1986,191 @@ float try_lam(float x, unsigned n) {
   return result;
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    ret float [[SUB]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    ret float [[SUB]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    ret float [[SUB]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[SUB:%.*]] = fsub float [[X]], [[Y]]
+// CHECK-O3-NEXT:    ret float [[SUB]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    ret float [[SUB]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    ret float [[SUB]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z5mySubff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    ret float [[SUB]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z5mySubff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    ret float [[SUB]]
+//
 float mySub(float x, float y) {
   // CHECK: define {{.*}}float {{.*}}mySub{{.*}}
-  // CHECK-NS: fsub float
-  // CHECK-SOURCE: fsub float
-  // CHECK-DOUBLE: fpext float
-  // CHECK-DOUBLE: fpext float
-  // CHECK-DOUBLE: fsub double
-  // CHECK-DOUBLE: fptrunc double {{.*}} to float
-  // CHECK-EXTENDED: fpext float
-  // CHECK-EXTENDED: fpext float
-  // CHECK-EXTENDED: fsub double
-  // CHECK-EXTENDED: fptrunc double {{.*}} to float
   return x - y;
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    ret float [[SUB]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    ret float [[SUB]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    ret float [[SUB]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[SUB:%.*]] = fsub float [[X]], [[Y]]
+// CHECK-O3-NEXT:    ret float [[SUB]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    ret float [[SUB]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    ret float [[SUB]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z11mySubSourceff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    ret float [[SUB]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z11mySubSourceff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[SUB:%.*]] = fsub float [[TMP0]], [[TMP1]]
+// CHECK-AIX-NEXT:    ret float [[SUB]]
+//
 float mySubSource(float x, float y) {
 // CHECK: define {{.*}}float {{.*}}mySubSource{{.*}}
 #pragma clang fp eval_method(source)
@@ -298,6 +2178,117 @@ float mySubSource(float x, float y) {
   // CHECK: fsub float
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-NS-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-NS-NEXT:    ret float [[CONV2]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-DEFAULT-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-DEFAULT-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-DEFAULT-NEXT:    ret float [[CONV2]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-FENV-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-FENV-NEXT:    ret float [[CONV2]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CONV2:%.*]] = fsub float [[X]], [[Y]]
+// CHECK-O3-NEXT:    ret float [[CONV2]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-SOURCE-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-SOURCE-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-SOURCE-NEXT:    ret float [[CONV2]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-DOUBLE-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-DOUBLE-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-DOUBLE-NEXT:    ret float [[CONV2]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z13mySubExtendedff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[SUB:%.*]] = fsub x86_fp80 [[CONV]], [[CONV1]]
+// CHECK-EXTENDED-NEXT:    [[CONV2:%.*]] = fptrunc x86_fp80 [[SUB]] to float
+// CHECK-EXTENDED-NEXT:    ret float [[CONV2]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z13mySubExtendedff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-AIX-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-AIX-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-AIX-NEXT:    ret float [[CONV2]]
+//
 float mySubExtended(float x, float y) {
 // CHECK: define {{.*}}float {{.*}}mySubExtended{{.*}}
 #pragma clang fp eval_method(extended)
@@ -306,10 +2297,119 @@ float mySubExtended(float x, float y) {
   // CHECK: fpext float
   // CHECK: fsub x86_fp80
   // CHECK: fptrunc x86_fp80 {{.*}} to float
-  // CHECK-AIX: fsub double
-  // CHECK-AIX: fptrunc double
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-NS-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-NS-NEXT:    ret float [[CONV2]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-DEFAULT-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-DEFAULT-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-DEFAULT-NEXT:    ret float [[CONV2]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-FENV-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-FENV-NEXT:    ret float [[CONV2]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CONV2:%.*]] = fsub float [[X]], [[Y]]
+// CHECK-O3-NEXT:    ret float [[CONV2]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-SOURCE-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-SOURCE-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-SOURCE-NEXT:    ret float [[CONV2]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-DOUBLE-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-DOUBLE-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-DOUBLE-NEXT:    ret float [[CONV2]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z11mySubDoubleff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-EXTENDED-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-EXTENDED-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-EXTENDED-NEXT:    ret float [[CONV2]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z11mySubDoubleff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-AIX-NEXT:    [[SUB:%.*]] = fsub double [[CONV]], [[CONV1]]
+// CHECK-AIX-NEXT:    [[CONV2:%.*]] = fptrunc double [[SUB]] to float
+// CHECK-AIX-NEXT:    ret float [[CONV2]]
+//
 float mySubDouble(float x, float y) {
 // CHECK: define {{.*}}float {{.*}}mySubDouble{{.*}}
 #pragma clang fp eval_method(double)
@@ -321,6 +2421,78 @@ float mySubDouble(float x, float y) {
 }
 
 #ifndef NF128
+// CHECK-NS-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-NS-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-NS-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-NS-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-NS-NEXT:    ret fp128 [[SUB]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-DEFAULT-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-DEFAULT-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-DEFAULT-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-DEFAULT-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    ret fp128 [[SUB]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-FENV-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-FENV-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-FENV-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-FENV-NEXT:    ret fp128 [[SUB]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-SOURCE-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-SOURCE-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-SOURCE-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-SOURCE-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-SOURCE-NEXT:    ret fp128 [[SUB]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-DOUBLE-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-DOUBLE-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-DOUBLE-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-DOUBLE-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-DOUBLE-NEXT:    ret fp128 [[SUB]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef fp128 @_Z8mySub128gg(
+// CHECK-EXTENDED-SAME: fp128 noundef [[X:%.*]], fp128 noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca fp128, align 16
+// CHECK-EXTENDED-NEXT:    store fp128 [[X]], ptr [[X_ADDR]], align 16
+// CHECK-EXTENDED-NEXT:    store fp128 [[Y]], ptr [[Y_ADDR]], align 16
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load fp128, ptr [[X_ADDR]], align 16
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load fp128, ptr [[Y_ADDR]], align 16
+// CHECK-EXTENDED-NEXT:    [[SUB:%.*]] = fsub fp128 [[TMP0]], [[TMP1]]
+// CHECK-EXTENDED-NEXT:    ret fp128 [[SUB]]
+//
 __float128 mySub128(__float128 x, __float128 y) {
   // CHECK: define {{.*}}mySub128{{.*}}
   // Expect no fpext since fp128 is already widest
@@ -332,6 +2504,102 @@ __float128 mySub128(__float128 x, __float128 y) {
 }
 #endif
 
+// CHECK-NS-LABEL: define dso_local void @_Z9mySubfp16PDhS_S_(
+// CHECK-NS-SAME: ptr noundef [[RES:%.*]], ptr noundef [[X:%.*]], ptr noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[RES_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NS-NEXT:    store ptr [[RES]], ptr [[RES_ADDR]], align 8
+// CHECK-NS-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    store ptr [[Y]], ptr [[Y_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load half, ptr [[TMP0]], align 2
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext half [[TMP1]] to float
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[Y_ADDR]], align 8
+// CHECK-NS-NEXT:    [[TMP3:%.*]] = load half, ptr [[TMP2]], align 2
+// CHECK-NS-NEXT:    [[CONV1:%.*]] = fpext half [[TMP3]] to float
+// CHECK-NS-NEXT:    [[SUB:%.*]] = fsub float [[CONV]], [[CONV1]]
+// CHECK-NS-NEXT:    [[CONV2:%.*]] = fptrunc float [[SUB]] to half
+// CHECK-NS-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[RES_ADDR]], align 8
+// CHECK-NS-NEXT:    store half [[CONV2]], ptr [[TMP4]], align 2
+// CHECK-NS-NEXT:    ret void
+//
+// CHECK-FENV-LABEL: define dso_local void @_Z9mySubfp16PDhS_S_(
+// CHECK-FENV-SAME: ptr noundef [[RES:%.*]], ptr noundef [[X:%.*]], ptr noundef [[Y:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[RES_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FENV-NEXT:    store ptr [[RES]], ptr [[RES_ADDR]], align 8
+// CHECK-FENV-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 8
+// CHECK-FENV-NEXT:    store ptr [[Y]], ptr [[Y_ADDR]], align 8
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load half, ptr [[TMP0]], align 2
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fpext half [[TMP1]] to float
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[Y_ADDR]], align 8
+// CHECK-FENV-NEXT:    [[TMP3:%.*]] = load half, ptr [[TMP2]], align 2
+// CHECK-FENV-NEXT:    [[CONV1:%.*]] = fpext half [[TMP3]] to float
+// CHECK-FENV-NEXT:    [[SUB:%.*]] = fsub float [[CONV]], [[CONV1]]
+// CHECK-FENV-NEXT:    [[CONV2:%.*]] = fptrunc float [[SUB]] to half
+// CHECK-FENV-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[RES_ADDR]], align 8
+// CHECK-FENV-NEXT:    store half [[CONV2]], ptr [[TMP4]], align 2
+// CHECK-FENV-NEXT:    ret void
+//
+// CHECK-O3-LABEL: define dso_local void @_Z9mySubfp16PDhS_S_(
+// CHECK-O3-SAME: ptr noundef writeonly captures(none) initializes((0, 2)) [[RES:%.*]], ptr noundef readonly captures(none) [[X:%.*]], ptr noundef readonly captures(none) [[Y:%.*]]) local_unnamed_addr #[[ATTR7:[0-9]+]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[TMP0:%.*]] = load half, ptr [[X]], align 2, !tbaa [[__FP16_TBAA7:![0-9]+]]
+// CHECK-O3-NEXT:    [[TMP1:%.*]] = load half, ptr [[Y]], align 2, !tbaa [[__FP16_TBAA7]]
+// CHECK-O3-NEXT:    [[CONV2:%.*]] = fsub half [[TMP0]], [[TMP1]]
+// CHECK-O3-NEXT:    store half [[CONV2]], ptr [[RES]], align 2, !tbaa [[__FP16_TBAA7]]
+// CHECK-O3-NEXT:    ret void
+//
+// CHECK-CONST-ARGS-LABEL: define dso_local void @_Z9mySubfp16PDhS_S_(
+// CHECK-CONST-ARGS-SAME: ptr noundef [[RES:%.*]], ptr noundef [[X:%.*]], ptr noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-CONST-ARGS-NEXT:  [[ENTRY:.*:]]
+// CHECK-CONST-ARGS-NEXT:    [[RES_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-CONST-ARGS-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-CONST-ARGS-NEXT:    [[Y_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-CONST-ARGS-NEXT:    store ptr [[RES]], ptr [[RES_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    store ptr [[Y]], ptr [[Y_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    [[TMP1:%.*]] = load half, ptr [[TMP0]], align 2
+// CHECK-CONST-ARGS-NEXT:    [[CONV:%.*]] = fpext half [[TMP1]] to float
+// CHECK-CONST-ARGS-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[Y_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    [[TMP3:%.*]] = load half, ptr [[TMP2]], align 2
+// CHECK-CONST-ARGS-NEXT:    [[CONV1:%.*]] = fpext half [[TMP3]] to float
+// CHECK-CONST-ARGS-NEXT:    [[SUB:%.*]] = fsub float [[CONV]], [[CONV1]]
+// CHECK-CONST-ARGS-NEXT:    [[CONV2:%.*]] = fptrunc float [[SUB]] to half
+// CHECK-CONST-ARGS-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[RES_ADDR]], align 8
+// CHECK-CONST-ARGS-NEXT:    store half [[CONV2]], ptr [[TMP4]], align 2
+// CHECK-CONST-ARGS-NEXT:    ret void
+//
+// CHECK-AIX-LABEL: define void @_Z9mySubfp16PDhS_S_(
+// CHECK-AIX-SAME: ptr noundef [[RES:%.*]], ptr noundef [[X:%.*]], ptr noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[RES_ADDR:%.*]] = alloca ptr, align 4
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca ptr, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca ptr, align 4
+// CHECK-AIX-NEXT:    store ptr [[RES]], ptr [[RES_ADDR]], align 4
+// CHECK-AIX-NEXT:    store ptr [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store ptr [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load i16, ptr [[TMP0]], align 2
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = bitcast i16 [[TMP1]] to half
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fpext half [[TMP2]] to float
+// CHECK-AIX-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP4:%.*]] = load i16, ptr [[TMP3]], align 2
+// CHECK-AIX-NEXT:    [[TMP5:%.*]] = bitcast i16 [[TMP4]] to half
+// CHECK-AIX-NEXT:    [[CONV1:%.*]] = fpext half [[TMP5]] to float
+// CHECK-AIX-NEXT:    [[SUB:%.*]] = fsub float [[CONV]], [[CONV1]]
+// CHECK-AIX-NEXT:    [[CONV2:%.*]] = fptrunc float [[SUB]] to half
+// CHECK-AIX-NEXT:    [[TMP6:%.*]] = bitcast half [[CONV2]] to i16
+// CHECK-AIX-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[RES_ADDR]], align 4
+// CHECK-AIX-NEXT:    store i16 [[TMP6]], ptr [[TMP7]], align 2
+// CHECK-AIX-NEXT:    ret void
+//
 void mySubfp16(__fp16 *res, __fp16 *x, __fp16 *y) {
   // CHECK: define {{.*}}mySubfp16{{.*}}
   *res = *x - *y;
@@ -340,46 +2608,697 @@ void mySubfp16(__fp16 *res, __fp16 *x, __fp16 *y) {
   // CHECK-NEXT: fpext half{{.*}}
   // CHECK-NEXT: load half
   // CHECK-NEXT: load half
-  // CHECK-NS: fpext half{{.*}} to float
-  // CHECK-DEFAULT: fpext half{{.*}} to float
-  // CHECK-DOUBLE: fpext half{{.*}} to float
-  // CHECK-EXTENDED: fpext half{{.*}} to float
   // CHECK-NEXT: fsub
   // CHECK-NEXT: fptrunc {{.*}}to half
-  // CHECK-NS: fptrunc float {{.*}} to half
-  // CHECK-DOUBLE: fptrunc float {{.*}} to half
-  // CHECK-EXTENDED: fptrunc float {{.*}} to half
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-NS-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-NS-NEXT:    ret float [[DIV1]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-DEFAULT-NEXT:    ret float [[DIV1]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-FENV-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-FENV-NEXT:    ret float [[DIV1]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[DIV:%.*]] = fdiv float [[Y]], [[Z]]
+// CHECK-O3-NEXT:    [[DIV1:%.*]] = fdiv float [[X]], [[DIV]]
+// CHECK-O3-NEXT:    ret float [[DIV1]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-SOURCE-NEXT:    ret float [[DIV1]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-DOUBLE-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-DOUBLE-NEXT:    ret float [[DIV1]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z3Divfff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-EXTENDED-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-EXTENDED-NEXT:    ret float [[DIV1]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z3Divfff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-AIX-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-AIX-NEXT:    ret float [[DIV1]]
+//
 float Div(float x, float y, float z) {
   // CHECK: define{{.*}}float {{.*}}Div{{.*}}
-  // CHECK-CONST-ARGS: fdiv float
   return x / (y / z);
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-NS-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-NS-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-NS-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-NS-NEXT:    ret float [[CONV4]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-DEFAULT-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-DEFAULT-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-DEFAULT-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-DEFAULT-NEXT:    ret float [[CONV4]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-FENV-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-FENV-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-FENV-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-FENV-NEXT:    ret float [[CONV4]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CONV:%.*]] = fpext float [[X]] to x86_fp80
+// CHECK-O3-NEXT:    [[CONV1:%.*]] = fpext float [[Y]] to x86_fp80
+// CHECK-O3-NEXT:    [[CONV2:%.*]] = fpext float [[Z]] to x86_fp80
+// CHECK-O3-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-O3-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-O3-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-O3-NEXT:    ret float [[CONV4]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-SOURCE-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-SOURCE-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-SOURCE-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-SOURCE-NEXT:    ret float [[CONV4]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-DOUBLE-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-DOUBLE-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-DOUBLE-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-DOUBLE-NEXT:    ret float [[CONV4]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z11DivExtendedfff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to x86_fp80
+// CHECK-EXTENDED-NEXT:    [[DIV:%.*]] = fdiv x86_fp80 [[CONV1]], [[CONV2]]
+// CHECK-EXTENDED-NEXT:    [[DIV3:%.*]] = fdiv x86_fp80 [[CONV]], [[DIV]]
+// CHECK-EXTENDED-NEXT:    [[CONV4:%.*]] = fptrunc x86_fp80 [[DIV3]] to float
+// CHECK-EXTENDED-NEXT:    ret float [[CONV4]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z11DivExtendedfff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-AIX-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-AIX-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-AIX-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-AIX-NEXT:    ret float [[CONV4]]
+//
 float DivExtended(float x, float y, float z) {
 // CHECK: define{{.*}}float {{.*}}DivExtended{{.*}}
 #pragma clang fp eval_method(extended)
-  // CHECK-CONST-ARGS: fdiv x86_fp80
-  // CHECK-CONST-ARGS: fptrunc x86_fp80
   return x / (y / z);
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-NS-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-NS-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-NS-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-NS-NEXT:    ret float [[CONV4]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-DEFAULT-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-DEFAULT-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-DEFAULT-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-DEFAULT-NEXT:    ret float [[CONV4]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-FENV-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-FENV-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-FENV-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-FENV-NEXT:    ret float [[CONV4]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[CONV:%.*]] = fpext float [[X]] to double
+// CHECK-O3-NEXT:    [[CONV1:%.*]] = fpext float [[Y]] to double
+// CHECK-O3-NEXT:    [[CONV2:%.*]] = fpext float [[Z]] to double
+// CHECK-O3-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-O3-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-O3-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-O3-NEXT:    ret float [[CONV4]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-SOURCE-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-SOURCE-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-SOURCE-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-SOURCE-NEXT:    ret float [[CONV4]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-DOUBLE-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-DOUBLE-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-DOUBLE-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-DOUBLE-NEXT:    ret float [[CONV4]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z9DivDoublefff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-EXTENDED-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-EXTENDED-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-EXTENDED-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-EXTENDED-NEXT:    ret float [[CONV4]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z9DivDoublefff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[CONV2:%.*]] = fpext float [[TMP2]] to double
+// CHECK-AIX-NEXT:    [[DIV:%.*]] = fdiv double [[CONV1]], [[CONV2]]
+// CHECK-AIX-NEXT:    [[DIV3:%.*]] = fdiv double [[CONV]], [[DIV]]
+// CHECK-AIX-NEXT:    [[CONV4:%.*]] = fptrunc double [[DIV3]] to float
+// CHECK-AIX-NEXT:    ret float [[CONV4]]
+//
 float DivDouble(float x, float y, float z) {
 // CHECK: define{{.*}}float {{.*}}DivDouble{{.*}}
 #pragma clang fp eval_method(double)
-  // CHECK-CONST-ARGS: fdiv double
-  // CHECK-CONST-ARGS: fptrunc double
   return x / (y / z);
 }
 
+// CHECK-NS-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-NS-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NS-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NS-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-NS-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-NS-NEXT:    ret float [[DIV1]]
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-DEFAULT-NEXT:    ret float [[DIV1]]
+//
+// CHECK-FENV-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-FENV-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FENV-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-FENV-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-FENV-NEXT:    ret float [[DIV1]]
+//
+// CHECK-O3-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-O3-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    [[DIV:%.*]] = fdiv float [[Y]], [[Z]]
+// CHECK-O3-NEXT:    [[DIV1:%.*]] = fdiv float [[X]], [[DIV]]
+// CHECK-O3-NEXT:    ret float [[DIV1]]
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-SOURCE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-SOURCE-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-SOURCE-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-SOURCE-NEXT:    ret float [[DIV1]]
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-DOUBLE-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DOUBLE-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-DOUBLE-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-DOUBLE-NEXT:    ret float [[DIV1]]
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef float @_Z9DivSourcefff(
+// CHECK-EXTENDED-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-EXTENDED-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-EXTENDED-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-EXTENDED-NEXT:    ret float [[DIV1]]
+//
+// CHECK-AIX-LABEL: define noundef float @_Z9DivSourcefff(
+// CHECK-AIX-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-AIX-NEXT:    [[DIV:%.*]] = fdiv float [[TMP1]], [[TMP2]]
+// CHECK-AIX-NEXT:    [[DIV1:%.*]] = fdiv float [[TMP0]], [[DIV]]
+// CHECK-AIX-NEXT:    ret float [[DIV1]]
+//
 float DivSource(float x, float y, float z) {
 // CHECK: define{{.*}}float {{.*}}DivSource{{.*}}
 #pragma clang fp eval_method(source)
-  // CHECK-CONST-ARGS: fdiv float
   return x / (y / z);
 }
 
+// CHECK-NS-LABEL: define dso_local noundef i32 @main(
+// CHECK-NS-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-NS-NEXT:  [[ENTRY:.*:]]
+// CHECK-NS-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-NS-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-NS-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-NS-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-NS-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-NS-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-NS-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-NS-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-NS-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-NS-NEXT:    ret i32 0
+//
+// CHECK-DEFAULT-LABEL: define dso_local noundef i32 @main(
+// CHECK-DEFAULT-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DEFAULT-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-DEFAULT-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DEFAULT-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-DEFAULT-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DEFAULT-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-DEFAULT-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DEFAULT-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-DEFAULT-NEXT:    ret i32 0
+//
+// CHECK-FENV-LABEL: define dso_local noundef i32 @main(
+// CHECK-FENV-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-FENV-NEXT:  [[ENTRY:.*:]]
+// CHECK-FENV-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-FENV-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00) #[[ATTR5]]
+// CHECK-FENV-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-FENV-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00) #[[ATTR5]]
+// CHECK-FENV-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-FENV-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00) #[[ATTR5]]
+// CHECK-FENV-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-FENV-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00) #[[ATTR5]]
+// CHECK-FENV-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-FENV-NEXT:    ret i32 0
+//
+// CHECK-O3-LABEL: define dso_local noundef i32 @main(
+// CHECK-O3-SAME: ) local_unnamed_addr #[[ATTR0]] {
+// CHECK-O3-NEXT:  [[ENTRY:.*:]]
+// CHECK-O3-NEXT:    ret i32 0
+//
+// CHECK-SOURCE-LABEL: define dso_local noundef i32 @main(
+// CHECK-SOURCE-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-SOURCE-NEXT:  [[ENTRY:.*:]]
+// CHECK-SOURCE-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-SOURCE-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-SOURCE-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-SOURCE-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-SOURCE-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-SOURCE-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-SOURCE-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-SOURCE-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-SOURCE-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-SOURCE-NEXT:    ret i32 0
+//
+// CHECK-DOUBLE-LABEL: define dso_local noundef i32 @main(
+// CHECK-DOUBLE-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-DOUBLE-NEXT:  [[ENTRY:.*:]]
+// CHECK-DOUBLE-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-DOUBLE-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DOUBLE-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-DOUBLE-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DOUBLE-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-DOUBLE-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DOUBLE-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-DOUBLE-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DOUBLE-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-DOUBLE-NEXT:    ret i32 0
+//
+// CHECK-EXTENDED-LABEL: define dso_local noundef i32 @main(
+// CHECK-EXTENDED-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-EXTENDED-NEXT:  [[ENTRY:.*:]]
+// CHECK-EXTENDED-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-EXTENDED-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-EXTENDED-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-EXTENDED-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-EXTENDED-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-EXTENDED-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-EXTENDED-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-EXTENDED-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-EXTENDED-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-EXTENDED-NEXT:    ret i32 0
+//
+// CHECK-AIX-LABEL: define noundef i32 @main(
+// CHECK-AIX-SAME: ) #[[ATTR3:[0-9]+]] {
+// CHECK-AIX-NEXT:  [[ENTRY:.*:]]
+// CHECK-AIX-NEXT:    [[F:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[FEXTENDED:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[FDOUBLE:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[FSOURCE:%.*]] = alloca float, align 4
+// CHECK-AIX-NEXT:    [[CALL:%.*]] = call noundef float @_Z3Divfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-AIX-NEXT:    store float [[CALL]], ptr [[F]], align 4
+// CHECK-AIX-NEXT:    [[CALL1:%.*]] = call noundef float @_Z11DivExtendedfff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-AIX-NEXT:    store float [[CALL1]], ptr [[FEXTENDED]], align 4
+// CHECK-AIX-NEXT:    [[CALL2:%.*]] = call noundef float @_Z9DivDoublefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-AIX-NEXT:    store float [[CALL2]], ptr [[FDOUBLE]], align 4
+// CHECK-AIX-NEXT:    [[CALL3:%.*]] = call noundef float @_Z9DivSourcefff(float noundef 0x4010CCCCC0000000, float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-AIX-NEXT:    store float [[CALL3]], ptr [[FSOURCE]], align 4
+// CHECK-AIX-NEXT:    ret i32 0
+//
 int main() {
   float f = Div(4.2f, 1.0f, 3.0f);
   float fextended = DivExtended(4.2f, 1.0f, 3.0f);
@@ -387,3 +3306,11 @@ int main() {
   float fsource = DivSource(4.2f, 1.0f, 3.0f);
   // CHECK: store float
 }
+//.
+// CHECK-O3: [[META3:![0-9]+]] = !{!"omnipotent char", [[META4:![0-9]+]], i64 0}
+// CHECK-O3: [[META4]] = !{!"Simple C++ TBAA"}
+// CHECK-O3: [[FLOAT_TBAA5]] = !{[[META6:![0-9]+]], [[META6]], i64 0}
+// CHECK-O3: [[META6]] = !{!"float", [[META3]], i64 0}
+// CHECK-O3: [[__FP16_TBAA7]] = !{[[META8:![0-9]+]], [[META8]], i64 0}
+// CHECK-O3: [[META8]] = !{!"__fp16", [[META3]], i64 0}
+//.
diff --git a/clang/test/CodeGen/fp-floatcontrol-stack.cpp b/clang/test/CodeGen/fp-floatcontrol-stack.cpp
index 237c9d4f9a37e..cb4fdab9ac2b2 100644
--- a/clang/test/CodeGen/fp-floatcontrol-stack.cpp
+++ b/clang/test/CodeGen/fp-floatcontrol-stack.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -ffp-contract=on -DDEFAULT=1 -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-DDEFAULT %s
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -ffp-contract=on -DEBSTRICT=1 -ffp-exception-behavior=strict -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-DEBSTRICT %s
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -DFAST=1 -ffast-math -ffp-contract=fast -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-FAST %s
@@ -6,25 +7,56 @@
 #define FUN(n) \
   (float z) { return n * z + n; }
 
-// CHECK-DDEFAULT: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-DEBSTRICT: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
-// CHECK-FAST: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-NOHONOR: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z11fun_defaultf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 1.000000e+00, float [[TMP0]], float 1.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z11fun_defaultf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV:%.*]] = call float @llvm.sitofp.f32.i32(i32 1) #[[ATTR7:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV1:%.*]] = call float @llvm.sitofp.f32.i32(i32 1) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float [[CONV]], float [[TMP0]], float [[CONV1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11fun_defaultf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 1.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 1.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11fun_defaultf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call nnan ninf float @llvm.fmuladd.f32(float 1.000000e+00, float [[TMP0]], float 1.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
 float fun_default FUN(1)
 //CHECK-LABEL: define {{.*}} @_Z11fun_defaultf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: call float @llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
 // Note that backend wants constrained intrinsics used
 // throughout the function if they are needed anywhere in the function.
 // In that case, operations are built with constrained intrinsics operator
 // but using default settings for exception behavior and rounding mode.
-//CHECK-DEBSTRICT: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(push)
@@ -32,190 +64,510 @@ float fun_default FUN(1)
 // Rule: precise must be enabled
 #pragma float_control(except, on)
 #endif
-    // CHECK-FAST: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-    // CHECK-DDEFAULT: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
-    // CHECK-DEBSTRICT: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
-    // CHECK-NOHONOR: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z6exc_onf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[CONV:%.*]] = call float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[CONV1:%.*]] = call float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float [[CONV]], float [[TMP0]], float [[CONV1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z6exc_onf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV:%.*]] = call float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV1:%.*]] = call float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float [[CONV]], float [[TMP0]], float [[CONV1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z6exc_onf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 2.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 2.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z6exc_onf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[CONV:%.*]] = call nnan ninf float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[CONV1:%.*]] = call nnan ninf float @llvm.sitofp.f32.i32(i32 2) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call nnan ninf float @llvm.fmuladd.f32(float [[CONV]], float [[TMP0]], float [[CONV1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
     float exc_on FUN(2)
 //CHECK-LABEL: define {{.*}} @_Z6exc_onf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: llvm.experimental.constrained.fmul{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: llvm.experimental.constrained.fmuladd{{.*}}tonearest{{.*}}strict
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: nnan ninf float {{.*}}llvm.experimental.constrained.fmuladd{{.*}}tonearest{{.*}}strict
 #endif
 #if FAST
 //Not possible to enable float_control(except) in FAST mode.
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(pop)
-    // CHECK-DDEFAULT: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-    // CHECK-DEBSTRICT: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
-    // CHECK-FAST: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-    // CHECK-NOHONOR: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z7exc_popf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 5.000000e+00, float [[TMP0]], float 5.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z7exc_popf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV:%.*]] = call float @llvm.sitofp.f32.i32(i32 5) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CONV1:%.*]] = call float @llvm.sitofp.f32.i32(i32 5) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float [[CONV]], float [[TMP0]], float [[CONV1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z7exc_popf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 5.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 5.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z7exc_popf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call nnan ninf float @llvm.fmuladd.f32(float 5.000000e+00, float [[TMP0]], float 5.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
     float exc_pop FUN(5)
 //CHECK-LABEL: define {{.*}} @_Z7exc_popf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: call float @llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: llvm.experimental.constrained.fmuladd{{.*}}tonearest{{.*}}strict
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: call nnan ninf float @llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(except, off)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z7exc_offf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 5.000000e+00, float [[TMP0]], float 5.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z7exc_offf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 5.000000e+00, float [[TMP0]], float 5.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z7exc_offf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 5.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 5.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z7exc_offf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call nnan ninf float @llvm.fmuladd.f32(float 5.000000e+00, float [[TMP0]], float 5.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
         float exc_off FUN(5)
 //CHECK-LABEL: define {{.*}} @_Z7exc_offf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: call float @llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: call float @llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: call nnan ninf float @llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(precise, on, push)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z10precise_onf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z10precise_onf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z10precise_onf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-FAST-NEXT:    ret float [[TMP1]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z10precise_onf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
             float precise_on FUN(3)
 //CHECK-LABEL: define {{.*}} @_Z10precise_onf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
 // If precise is pushed then all fast-math should be off!
-//CHECK-NOHONOR: call float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 
 #pragma float_control(pop)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z11precise_popf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z11precise_popf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_popf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 3.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 3.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_popf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call nnan ninf float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
                 float precise_pop FUN(3)
 //CHECK-LABEL: define {{.*}} @_Z11precise_popf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: call nnan ninf float @llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 #pragma float_control(precise, off)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z11precise_offf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-DDEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-DDEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z11precise_offf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-DEBSTRICT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-DEBSTRICT-NEXT:    ret float [[ADD]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_offf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_offf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-NOHONOR-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-NOHONOR-NEXT:    ret float [[ADD]]
+//
                     float precise_off FUN(4)
 //CHECK-LABEL: define {{.*}} @_Z11precise_offf{{.*}}
 #if DEFAULT
 // Note: precise_off enables fp_contract=fast and the instructions
 // generated do not include the contract flag, although it was enabled
 // in IRBuilder.
-//CHECK-DDEFAULT: fmul fast float
-//CHECK-DDEFAULT: fadd fast float
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: fmul fast float
-//CHECK-DEBSTRICT: fadd fast float
 #endif
 #if NOHONOR
 // fast math should be enabled, and contract should be fast
-//CHECK-NOHONOR: fmul fast float
-//CHECK-NOHONOR: fadd fast float
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(precise, on)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z11precise_on2f(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z11precise_on2f(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_on2f(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-FAST-NEXT:    ret float [[TMP1]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z11precise_on2f(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
                         float precise_on2 FUN(3)
 //CHECK-LABEL: define {{.*}} @_Z11precise_on2f{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
 // fast math should be off, and contract should be on
-//CHECK-NOHONOR: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 
 #pragma float_control(push)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z12precise_pushf(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z12precise_pushf(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_pushf(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-FAST-NEXT:    ret float [[TMP1]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_pushf(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
                             float precise_push FUN(3)
 //CHECK-LABEL: define {{.*}} @_Z12precise_pushf{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 
 #pragma float_control(precise, off)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z12precise_off2f(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-DDEFAULT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-DDEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z12precise_off2f(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-DEBSTRICT-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-DEBSTRICT-NEXT:    ret float [[ADD]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_off2f(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-FAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_off2f(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[MUL:%.*]] = fmul fast float 4.000000e+00, [[TMP0]]
+// CHECK-NOHONOR-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], 4.000000e+00
+// CHECK-NOHONOR-NEXT:    ret float [[ADD]]
+//
                                 float precise_off2 FUN(4)
 //CHECK-LABEL: define {{.*}} @_Z12precise_off2f{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: fmul fast float
-//CHECK-DDEFAULT: fadd fast float
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: fmul fast float
-//CHECK-DEBSTRICT: fadd fast float
 #endif
 #if NOHONOR
 // fast math settings since precise is off
-//CHECK-NOHONOR: fmul fast float
-//CHECK-NOHONOR: fadd fast float
 #endif
 #if FAST
-//CHECK-FAST: fmul fast float
-//CHECK-FAST: fadd fast float
 #endif
 
 #pragma float_control(pop)
+// CHECK-DDEFAULT-LABEL: define dso_local noundef float @_Z12precise_pop2f(
+// CHECK-DDEFAULT-SAME: float noundef [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local noundef float @_Z12precise_pop2f(
+// CHECK-DEBSTRICT-SAME: float noundef [[Z:%.*]]) #[[ATTR2]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    ret float [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_pop2f(
+// CHECK-FAST-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR1]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-FAST-NEXT:    ret float [[TMP1]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z12precise_pop2f(
+// CHECK-NOHONOR-SAME: float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = call float @llvm.fmuladd.f32(float 3.000000e+00, float [[TMP0]], float 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    ret float [[TMP1]]
+//
                                     float precise_pop2 FUN(3)
 //CHECK-LABEL: define {{.*}} @_Z12precise_pop2f{{.*}}
 #if DEFAULT
-//CHECK-DDEFAULT: llvm.fmuladd{{.*}}
 #endif
 #if EBSTRICT
-//CHECK-DEBSTRICT: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if NOHONOR
-//CHECK-NOHONOR: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 #if FAST
-//CHECK-FAST: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 
 #ifndef FAST
@@ -223,53 +575,324 @@ float fun_default FUN(1)
 #pragma float_control(except, on)
 #endif
                                         float y();
-// CHECK-DDEFAULT: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-DEBSTRICT: Function Attrs: mustprogress noinline nounwind optnone strictfp{{$$}}
-// CHECK-FAST: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-NOHONOR: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
+// CHECK-DDEFAULT-LABEL: define linkonce_odr void @_ZN2ONC1Ev(
+// CHECK-DDEFAULT-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR2]] comdat align 2 {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DDEFAULT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    call void @_ZN2ONC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-DDEFAULT-NEXT:    ret void
+//
+// CHECK-DEBSTRICT-LABEL: define linkonce_odr void @_ZN2ONC1Ev(
+// CHECK-DEBSTRICT-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DEBSTRICT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN2ONC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-DEBSTRICT-NEXT:    ret void
+//
+// CHECK-FAST-LABEL: define linkonce_odr void @_ZN2ONC1Ev(
+// CHECK-FAST-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    call void @_ZN2ONC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-NOHONOR-LABEL: define linkonce_odr void @_ZN2ONC1Ev(
+// CHECK-NOHONOR-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR2]] comdat align 2 {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NOHONOR-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    call void @_ZN2ONC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-NOHONOR-NEXT:    ret void
+//
 class ON {
   // Settings for top level class initializer use program source setting.
   float z = 2 + y() * 7;
 //CHECK-LABEL: define {{.*}} void @_ZN2ONC2Ev{{.*}}
 #if DEFAULT
-// CHECK-DDEFAULT: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 #endif
 #if EBSTRICT
-// CHECK-DEBSTRICT: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 #endif
 #if NOHONOR
-// CHECK-NOHONOR: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
 #endif
 #if FAST
-// CHECK-FAST: float {{.*}}llvm.fmuladd{{.*}}
 #endif
 };
 ON on;
 #pragma float_control(except, off)
-// CHECK-DDEFAULT: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-DEBSTRICT: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-FAST: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
-// CHECK-NOHONOR: Function Attrs: mustprogress noinline nounwind optnone{{$$}}
+// CHECK-DDEFAULT-LABEL: define linkonce_odr void @_ZN3OFFC1Ev(
+// CHECK-DDEFAULT-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DDEFAULT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    call void @_ZN3OFFC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-DDEFAULT-NEXT:    ret void
+//
+// CHECK-DEBSTRICT-LABEL: define linkonce_odr void @_ZN3OFFC1Ev(
+// CHECK-DEBSTRICT-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR2]] comdat align 2 {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DEBSTRICT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN3OFFC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-DEBSTRICT-NEXT:    ret void
+//
+// CHECK-FAST-LABEL: define linkonce_odr void @_ZN3OFFC1Ev(
+// CHECK-FAST-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    call void @_ZN3OFFC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-NOHONOR-LABEL: define linkonce_odr void @_ZN3OFFC1Ev(
+// CHECK-NOHONOR-SAME: ptr noundef nonnull align 4 dereferenceable(4) [[THIS:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NOHONOR-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    call void @_ZN3OFFC2Ev(ptr noundef nonnull align 4 dereferenceable(4) [[THIS1]])
+// CHECK-NOHONOR-NEXT:    ret void
+//
 class OFF {
   float w = 2 + y() * 7;
-// CHECK-LABEL: define {{.*}} void @_ZN3OFFC2Ev{{.*}}
-// CHECK: call float {{.*}}llvm.fmuladd
 };
 OFF off;
 
+// CHECK-DDEFAULT-LABEL: define dso_local <2 x float> @_Z6useAddv(
+// CHECK-DDEFAULT-SAME: ) #[[ATTR4:[0-9]+]] {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-DDEFAULT-NEXT:    [[A:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DDEFAULT-NEXT:    [[B:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DDEFAULT-NEXT:    [[AGG_TMP:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DDEFAULT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[A]], float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DDEFAULT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[B]], float noundef 2.000000e+00, float noundef 4.000000e+00)
+// CHECK-DDEFAULT-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP]], ptr align 4 [[B]], i64 8, i1 false)
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[AGG_TMP]], align 4
+// CHECK-DDEFAULT-NEXT:    [[CALL:%.*]] = call <2 x float> @_ZNK9MyComplexplES_(ptr noundef nonnull align 4 dereferenceable(8) [[A]], <2 x float> [[TMP0]])
+// CHECK-DDEFAULT-NEXT:    store <2 x float> [[CALL]], ptr [[RETVAL]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-DDEFAULT-NEXT:    ret <2 x float> [[TMP1]]
+//
+// CHECK-DEBSTRICT-LABEL: define dso_local <2 x float> @_Z6useAddv(
+// CHECK-DEBSTRICT-SAME: ) #[[ATTR4:[0-9]+]] {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[A:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[B:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[AGG_TMP:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[A]], float noundef 1.000000e+00, float noundef 3.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[B]], float noundef 2.000000e+00, float noundef 4.000000e+00)
+// CHECK-DEBSTRICT-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP]], ptr align 4 [[B]], i64 8, i1 false)
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[AGG_TMP]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[CALL:%.*]] = call <2 x float> @_ZNK9MyComplexplES_(ptr noundef nonnull align 4 dereferenceable(8) [[A]], <2 x float> [[TMP0]])
+// CHECK-DEBSTRICT-NEXT:    store <2 x float> [[CALL]], ptr [[RETVAL]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-DEBSTRICT-NEXT:    ret <2 x float> [[TMP1]]
+//
+// CHECK-FAST-LABEL: define dso_local <2 x float> @_Z6useAddv(
+// CHECK-FAST-SAME: ) #[[ATTR4:[0-9]+]] {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-FAST-NEXT:    [[A:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-FAST-NEXT:    [[B:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-FAST-NEXT:    [[AGG_TMP:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-FAST-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[A]], float noundef nofpclass(nan inf) 1.000000e+00, float noundef nofpclass(nan inf) 3.000000e+00)
+// CHECK-FAST-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[B]], float noundef nofpclass(nan inf) 2.000000e+00, float noundef nofpclass(nan inf) 4.000000e+00)
+// CHECK-FAST-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP]], ptr align 4 [[B]], i64 8, i1 false)
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[AGG_TMP]], align 4
+// CHECK-FAST-NEXT:    [[CALL:%.*]] = call fast <2 x float> @_ZNK9MyComplexplES_(ptr noundef nonnull align 4 dereferenceable(8) [[A]], <2 x float> [[TMP0]])
+// CHECK-FAST-NEXT:    store <2 x float> [[CALL]], ptr [[RETVAL]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-FAST-NEXT:    ret <2 x float> [[TMP1]]
+//
+// CHECK-NOHONOR-LABEL: define dso_local <2 x float> @_Z6useAddv(
+// CHECK-NOHONOR-SAME: ) #[[ATTR4:[0-9]+]] {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-NOHONOR-NEXT:    [[A:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-NOHONOR-NEXT:    [[B:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-NOHONOR-NEXT:    [[AGG_TMP:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-NOHONOR-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[A]], float noundef nofpclass(nan inf) 1.000000e+00, float noundef nofpclass(nan inf) 3.000000e+00)
+// CHECK-NOHONOR-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[B]], float noundef nofpclass(nan inf) 2.000000e+00, float noundef nofpclass(nan inf) 4.000000e+00)
+// CHECK-NOHONOR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP]], ptr align 4 [[B]], i64 8, i1 false)
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[AGG_TMP]], align 4
+// CHECK-NOHONOR-NEXT:    [[CALL:%.*]] = call nnan ninf <2 x float> @_ZNK9MyComplexplES_(ptr noundef nonnull align 4 dereferenceable(8) [[A]], <2 x float> [[TMP0]])
+// CHECK-NOHONOR-NEXT:    store <2 x float> [[CALL]], ptr [[RETVAL]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-NOHONOR-NEXT:    ret <2 x float> [[TMP1]]
+//
 #pragma clang fp reassociate(on)
 struct MyComplex {
   float xx;
   float yy;
+// CHECK-DDEFAULT-LABEL: define linkonce_odr void @_ZN9MyComplexC1Eff(
+// CHECK-DDEFAULT-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], float noundef [[X:%.*]], float noundef [[Y:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DDEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DDEFAULT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DDEFAULT-NEXT:    call void @_ZN9MyComplexC2Eff(ptr noundef nonnull align 4 dereferenceable(8) [[THIS1]], float noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-DDEFAULT-NEXT:    ret void
+//
+// CHECK-DEBSTRICT-LABEL: define linkonce_odr void @_ZN9MyComplexC1Eff(
+// CHECK-DEBSTRICT-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], float noundef [[X:%.*]], float noundef [[Y:%.*]]) unnamed_addr #[[ATTR2]] comdat align 2 {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DEBSTRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEBSTRICT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN9MyComplexC2Eff(ptr noundef nonnull align 4 dereferenceable(8) [[THIS1]], float noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-DEBSTRICT-NEXT:    ret void
+//
+// CHECK-FAST-LABEL: define linkonce_odr void @_ZN9MyComplexC1Eff(
+// CHECK-FAST-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], float noundef nofpclass(nan inf) [[X:%.*]], float noundef nofpclass(nan inf) [[Y:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FAST-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FAST-NEXT:    call void @_ZN9MyComplexC2Eff(ptr noundef nonnull align 4 dereferenceable(8) [[THIS1]], float noundef nofpclass(nan inf) [[TMP0]], float noundef nofpclass(nan inf) [[TMP1]])
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-NOHONOR-LABEL: define linkonce_odr void @_ZN9MyComplexC1Eff(
+// CHECK-NOHONOR-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], float noundef nofpclass(nan inf) [[X:%.*]], float noundef nofpclass(nan inf) [[Y:%.*]]) unnamed_addr #[[ATTR0]] comdat align 2 {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NOHONOR-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOHONOR-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NOHONOR-NEXT:    call void @_ZN9MyComplexC2Eff(ptr noundef nonnull align 4 dereferenceable(8) [[THIS1]], float noundef nofpclass(nan inf) [[TMP0]], float noundef nofpclass(nan inf) [[TMP1]])
+// CHECK-NOHONOR-NEXT:    ret void
+//
   MyComplex(float x, float y) {
     xx = x;
     yy = y;
   }
   MyComplex() {}
+// CHECK-DDEFAULT-LABEL: define linkonce_odr <2 x float> @_ZNK9MyComplexplES_(
+// CHECK-DDEFAULT-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], <2 x float> [[OTHER_COERCE:%.*]]) #[[ATTR4:[0-9]+]] comdat align 2 {
+// CHECK-DDEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DDEFAULT-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-DDEFAULT-NEXT:    [[OTHER:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DDEFAULT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DDEFAULT-NEXT:    store <2 x float> [[OTHER_COERCE]], ptr [[OTHER]], align 4
+// CHECK-DDEFAULT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DDEFAULT-NEXT:    [[XX:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 0
+// CHECK-DDEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[XX]], align 4
+// CHECK-DDEFAULT-NEXT:    [[XX2:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 0
+// CHECK-DDEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[XX2]], align 4
+// CHECK-DDEFAULT-NEXT:    [[ADD:%.*]] = fadd reassoc float [[TMP0]], [[TMP1]]
+// CHECK-DDEFAULT-NEXT:    [[YY:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 1
+// CHECK-DDEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[YY]], align 4
+// CHECK-DDEFAULT-NEXT:    [[YY3:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 1
+// CHECK-DDEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[YY3]], align 4
+// CHECK-DDEFAULT-NEXT:    [[ADD4:%.*]] = fadd reassoc float [[TMP2]], [[TMP3]]
+// CHECK-DDEFAULT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[RETVAL]], float noundef [[ADD]], float noundef [[ADD4]])
+// CHECK-DDEFAULT-NEXT:    [[TMP4:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-DDEFAULT-NEXT:    ret <2 x float> [[TMP4]]
+//
+// CHECK-DEBSTRICT-LABEL: define linkonce_odr <2 x float> @_ZNK9MyComplexplES_(
+// CHECK-DEBSTRICT-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], <2 x float> [[OTHER_COERCE:%.*]]) #[[ATTR4:[0-9]+]] comdat align 2 {
+// CHECK-DEBSTRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEBSTRICT-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[OTHER:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-DEBSTRICT-NEXT:    store <2 x float> [[OTHER_COERCE]], ptr [[OTHER]], align 4
+// CHECK-DEBSTRICT-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-DEBSTRICT-NEXT:    [[XX:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 0
+// CHECK-DEBSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[XX]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[XX2:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 0
+// CHECK-DEBSTRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[XX2]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[ADD:%.*]] = fadd reassoc float [[TMP0]], [[TMP1]]
+// CHECK-DEBSTRICT-NEXT:    [[YY:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 1
+// CHECK-DEBSTRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[YY]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[YY3:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 1
+// CHECK-DEBSTRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[YY3]], align 4
+// CHECK-DEBSTRICT-NEXT:    [[ADD4:%.*]] = fadd reassoc float [[TMP2]], [[TMP3]]
+// CHECK-DEBSTRICT-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[RETVAL]], float noundef [[ADD]], float noundef [[ADD4]])
+// CHECK-DEBSTRICT-NEXT:    [[TMP4:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-DEBSTRICT-NEXT:    ret <2 x float> [[TMP4]]
+//
+// CHECK-FAST-LABEL: define linkonce_odr <2 x float> @_ZNK9MyComplexplES_(
+// CHECK-FAST-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], <2 x float> [[OTHER_COERCE:%.*]]) #[[ATTR4:[0-9]+]] comdat align 2 {
+// CHECK-FAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FAST-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-FAST-NEXT:    [[OTHER:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-FAST-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store <2 x float> [[OTHER_COERCE]], ptr [[OTHER]], align 4
+// CHECK-FAST-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-FAST-NEXT:    [[XX:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 0
+// CHECK-FAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[XX]], align 4
+// CHECK-FAST-NEXT:    [[XX2:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 0
+// CHECK-FAST-NEXT:    [[TMP1:%.*]] = load float, ptr [[XX2]], align 4
+// CHECK-FAST-NEXT:    [[ADD:%.*]] = fadd reassoc float [[TMP0]], [[TMP1]]
+// CHECK-FAST-NEXT:    [[YY:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 1
+// CHECK-FAST-NEXT:    [[TMP2:%.*]] = load float, ptr [[YY]], align 4
+// CHECK-FAST-NEXT:    [[YY3:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 1
+// CHECK-FAST-NEXT:    [[TMP3:%.*]] = load float, ptr [[YY3]], align 4
+// CHECK-FAST-NEXT:    [[ADD4:%.*]] = fadd reassoc float [[TMP2]], [[TMP3]]
+// CHECK-FAST-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[RETVAL]], float noundef nofpclass(nan inf) [[ADD]], float noundef nofpclass(nan inf) [[ADD4]])
+// CHECK-FAST-NEXT:    [[TMP4:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-FAST-NEXT:    ret <2 x float> [[TMP4]]
+//
+// CHECK-NOHONOR-LABEL: define linkonce_odr <2 x float> @_ZNK9MyComplexplES_(
+// CHECK-NOHONOR-SAME: ptr noundef nonnull align 4 dereferenceable(8) [[THIS:%.*]], <2 x float> [[OTHER_COERCE:%.*]]) #[[ATTR4:[0-9]+]] comdat align 2 {
+// CHECK-NOHONOR-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOHONOR-NEXT:    [[RETVAL:%.*]] = alloca [[STRUCT_MYCOMPLEX:%.*]], align 4
+// CHECK-NOHONOR-NEXT:    [[OTHER:%.*]] = alloca [[STRUCT_MYCOMPLEX]], align 4
+// CHECK-NOHONOR-NEXT:    [[THIS_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NOHONOR-NEXT:    store <2 x float> [[OTHER_COERCE]], ptr [[OTHER]], align 4
+// CHECK-NOHONOR-NEXT:    store ptr [[THIS]], ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    [[THIS1:%.*]] = load ptr, ptr [[THIS_ADDR]], align 8
+// CHECK-NOHONOR-NEXT:    [[XX:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 0
+// CHECK-NOHONOR-NEXT:    [[TMP0:%.*]] = load float, ptr [[XX]], align 4
+// CHECK-NOHONOR-NEXT:    [[XX2:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 0
+// CHECK-NOHONOR-NEXT:    [[TMP1:%.*]] = load float, ptr [[XX2]], align 4
+// CHECK-NOHONOR-NEXT:    [[ADD:%.*]] = fadd reassoc float [[TMP0]], [[TMP1]]
+// CHECK-NOHONOR-NEXT:    [[YY:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[THIS1]], i32 0, i32 1
+// CHECK-NOHONOR-NEXT:    [[TMP2:%.*]] = load float, ptr [[YY]], align 4
+// CHECK-NOHONOR-NEXT:    [[YY3:%.*]] = getelementptr inbounds nuw [[STRUCT_MYCOMPLEX]], ptr [[OTHER]], i32 0, i32 1
+// CHECK-NOHONOR-NEXT:    [[TMP3:%.*]] = load float, ptr [[YY3]], align 4
+// CHECK-NOHONOR-NEXT:    [[ADD4:%.*]] = fadd reassoc float [[TMP2]], [[TMP3]]
+// CHECK-NOHONOR-NEXT:    call void @_ZN9MyComplexC1Eff(ptr noundef nonnull align 4 dereferenceable(8) [[RETVAL]], float noundef nofpclass(nan inf) [[ADD]], float noundef nofpclass(nan inf) [[ADD4]])
+// CHECK-NOHONOR-NEXT:    [[TMP4:%.*]] = load <2 x float>, ptr [[RETVAL]], align 4
+// CHECK-NOHONOR-NEXT:    ret <2 x float> [[TMP4]]
+//
   const MyComplex operator+(const MyComplex other) const {
-// CHECK-LABEL: define {{.*}} @_ZNK9MyComplexplES_
-// CHECK: fadd reassoc float
-// CHECK: fadd reassoc float
     return MyComplex(xx + other.xx, yy + other.yy);
   }
 };
@@ -278,12 +901,3 @@ MyComplex useAdd() {
   MyComplex b (2, 4);
    return a + b;
 }
-
-// CHECK-DDEFAULT: Function Attrs: noinline nounwind{{$$}}
-// CHECK-DEBSTRICT: Function Attrs: noinline nounwind strictfp{{$$}}
-// CHECK-FAST: Function Attrs: noinline nounwind{{$$}}
-// CHECK-NOHONOR: Function Attrs: noinline nounwind{{$$}}
-// CHECK-LABEL: define{{.*}} @_GLOBAL__sub_I_fp_floatcontrol_stack
-
-// CHECK-DEBSTRICT: {{[ ]}}strictfp{{[ ]}}
-// CHECK-DEBSTRICT-NOT: "strictfp"
diff --git a/clang/test/CodeGen/fp-strictfp-exp.cpp b/clang/test/CodeGen/fp-strictfp-exp.cpp
index bca56f166659f..cf9485ccc2a39 100644
--- a/clang/test/CodeGen/fp-strictfp-exp.cpp
+++ b/clang/test/CodeGen/fp-strictfp-exp.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple mips64-linux-gnu -fexperimental-strict-floating-point -frounding-math -ffp-exception-behavior=strict -O2 -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple mips64-linux-gnu -fexperimental-strict-floating-point -ffp-exception-behavior=strict -O2 -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple mips64-linux-gnu -fexperimental-strict-floating-point -frounding-math -O2 -emit-llvm -o - %s | FileCheck %s
@@ -7,8 +8,7 @@
 // in this test will need to change.
 
 float fp_precise_1(float a, float b, float c) {
-// CHECK-LABEL: define{{.*}} float @_Z12fp_precise_1fff
-// CHECK: %[[M:.+]] = tail call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, metadata {{.*}})
-// CHECK: tail call float @llvm.experimental.constrained.fadd.f32(float %[[M]], float %c, metadata {{.*}})
   return a * b + c;
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/fp-template.cpp b/clang/test/CodeGen/fp-template.cpp
index a945b23fff109..0ee9ae86a90df 100644
--- a/clang/test/CodeGen/fp-template.cpp
+++ b/clang/test/CodeGen/fp-template.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -fdelayed-template-parsing -o - %s | FileCheck %s
 
@@ -7,13 +8,22 @@ T templ_01(T x, T y) {
   return x + y;
 }
 
+// CHECK-LABEL: define dso_local noundef float @_Z7func_01ff(
+// CHECK-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[CALL:%.*]] = call noundef float @_Z8templ_01IfET_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-NEXT:    ret float [[CALL]]
+//
 float func_01(float x, float y) {
   return templ_01(x, y);
 }
 
-// CHECK-LABEL: define {{.*}} @_Z8templ_01IfET_S0_S0_
-// CHECK-SAME:  (float noundef %{{.*}}, float noundef %{{.*}}) #[[ATTR01:[0-9]+]]{{.*}} {
-// CHECK:       call float @llvm.experimental.constrained.fadd.f32
 
 
 template <typename Ty>
@@ -30,19 +40,39 @@ Ty templ_03(Ty x, Ty y) {
 
 #pragma STDC FENV_ROUND FE_TONEAREST
 
+// CHECK-LABEL: define dso_local noundef float @_Z7func_02ff(
+// CHECK-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[CALL:%.*]] = call noundef float @_Z8templ_02IfET_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-NEXT:    ret float [[CALL]]
+//
 float func_02(float x, float y) {
   return templ_02(x, y);
 }
 
-// CHECK-LABEL: define {{.*}} float @_Z8templ_02IfET_S0_S0_
-// CHECK:       %add = fadd float %0, %1
 
+// CHECK-LABEL: define dso_local noundef float @_Z7func_03ff(
+// CHECK-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NEXT:    [[CALL:%.*]] = call noundef float @_Z8templ_03IfET_S0_S0_(float noundef [[TMP0]], float noundef [[TMP1]])
+// CHECK-NEXT:    ret float [[CALL]]
+//
 float func_03(float x, float y) {
   return templ_03(x, y);
 }
 
-// CHECK-LABEL: define {{.*}} float @_Z8templ_03IfET_S0_S0_
-// CHECK:       call float @llvm.experimental.constrained.fsub.f32({{.*}}, metadata !"round.upward", metadata !"fpexcept.ignore")
 
 
 #pragma STDC FENV_ROUND FE_TONEAREST
@@ -53,11 +83,15 @@ namespace PR63542 {
     stable_sort(x, int());
     return result;
   }
+// CHECK-LABEL: define dso_local noundef float @_ZN7PR6354212linkage_wrapEv(
+// CHECK-SAME: ) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[CALL:%.*]] = call noundef float @_ZN7PR6354211stable_sortIiEEffT_(float noundef 0.000000e+00, i32 noundef 1)
+// CHECK-NEXT:    ret float [[CALL]]
+//
   float linkage_wrap() { return stable_sort(0.0, 1); }
 }
 
-// CHECK-LABEL: define {{.*}} float @_ZN7PR6354211stable_sortIiEEffT_(
-// CHECK:         fadd float
 
 // These pragmas set non-default FP environment before delayed parsing occurs.
 // It is used to check that the parsing uses FP options defined by command line
@@ -65,4 +99,3 @@ namespace PR63542 {
 #pragma STDC FENV_ROUND FE_TOWARDZERO
 #pragma STDC FENV_ACCESS ON
 
-// CHECK: attributes #[[ATTR01]] = { {{.*}}strictfp
diff --git a/clang/test/CodeGen/fpconstrained-cmp-double.c b/clang/test/CodeGen/fpconstrained-cmp-double.c
index 83446fc10595e..dbac07b648335 100644
--- a/clang/test/CodeGen/fpconstrained-cmp-double.c
+++ b/clang/test/CodeGen/fpconstrained-cmp-double.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -ffp-exception-behavior=ignore -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=FCMP
 // RUN: %clang_cc1 -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=EXCEPT
 // RUN: %clang_cc1 -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=MAYTRAP
@@ -5,147 +6,367 @@
 // RUN: %clang_cc1 -frounding-math -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=EXCEPT
 // RUN: %clang_cc1 -frounding-math -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=MAYTRAP
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0:[0-9]+]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oeq double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0:[0-9]+]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"oeq") #[[ATTR2:[0-9]+]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp oeq double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oeq", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oeq", metadata !"fpexcept.maytrap")
   return f1 == f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietNotEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp une double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietNotEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"une") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietNotEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietNotEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp une double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"une", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"une", metadata !"fpexcept.maytrap")
   return f1 != f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingLess(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp olt double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingLess(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"olt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingLess(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingLess(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp olt double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.maytrap")
   return f1 < f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingLessEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ole double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingLessEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ole") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingLessEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingLessEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp ole double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.maytrap")
   return f1 <= f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingGreater(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ogt double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingGreater(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ogt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingGreater(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingGreater(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp ogt double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.maytrap")
   return f1 > f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingGreaterEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oge double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingGreaterEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"oge") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingGreaterEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingGreaterEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp oge double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.maytrap")
   return f1 >= f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLess(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp olt double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLess(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"olt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLess(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLess(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp olt double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"olt", metadata !"fpexcept.maytrap")
   return __builtin_isless(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLessEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ole double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLessEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ole") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLessEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLessEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp ole double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ole", metadata !"fpexcept.maytrap")
   return __builtin_islessequal(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietGreater(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ogt double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietGreater(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"ogt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietGreater(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietGreater(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp ogt double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"ogt", metadata !"fpexcept.maytrap")
   return __builtin_isgreater(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietGreaterEqual(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oge double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietGreaterEqual(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"oge") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietGreaterEqual(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietGreaterEqual(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp oge double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"oge", metadata !"fpexcept.maytrap")
   return __builtin_isgreaterequal(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLessGreater(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp one double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLessGreater(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"one") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLessGreater(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLessGreater(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp one double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"one", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"one", metadata !"fpexcept.maytrap")
   return __builtin_islessgreater(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietUnordered(
+// FCMP-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// FCMP-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// FCMP-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp uno double [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietUnordered(
+// IGNORE-SAME: double noundef [[F1:%.*]], double noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca double, align 8
+// IGNORE-NEXT:    store double [[F1]], ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    store double [[F2]], ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP0:%.*]] = load double, ptr [[F1_ADDR]], align 8
+// IGNORE-NEXT:    [[TMP1:%.*]] = load double, ptr [[F2_ADDR]], align 8
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP1]], metadata !"uno") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietUnordered(double f1, double f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietUnordered(double noundef %f1, double noundef %f2)
 
-  // FCMP: fcmp uno double %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"uno", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f64(double %{{.*}}, double %{{.*}}, metadata !"uno", metadata !"fpexcept.maytrap")
   return __builtin_isunordered(f1, f2);
 
-  // CHECK: ret
 }
 
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
+// EXCEPT: {{.*}}
+// MAYTRAP: {{.*}}
diff --git a/clang/test/CodeGen/fpconstrained-cmp-float.c b/clang/test/CodeGen/fpconstrained-cmp-float.c
index 0854774d840e8..f26b069f28357 100644
--- a/clang/test/CodeGen/fpconstrained-cmp-float.c
+++ b/clang/test/CodeGen/fpconstrained-cmp-float.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -ffp-exception-behavior=ignore -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=FCMP
 // RUN: %clang_cc1 -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=EXCEPT
 // RUN: %clang_cc1 -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=MAYTRAP
@@ -5,147 +6,367 @@
 // RUN: %clang_cc1 -frounding-math -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=EXCEPT
 // RUN: %clang_cc1 -frounding-math -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK -check-prefix=MAYTRAP
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0:[0-9]+]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oeq float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0:[0-9]+]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"oeq") #[[ATTR2:[0-9]+]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp oeq float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.maytrap")
   return f1 == f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietNotEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp une float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietNotEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"une") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietNotEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietNotEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp une float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.maytrap")
   return f1 != f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingLess(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp olt float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingLess(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"olt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingLess(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingLess(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp olt float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.maytrap")
   return f1 < f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingLessEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ole float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingLessEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"ole") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingLessEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingLessEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp ole float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.maytrap")
   return f1 <= f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingGreater(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ogt float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingGreater(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"ogt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingGreater(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingGreater(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp ogt float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.maytrap")
   return f1 > f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @SignalingGreaterEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oge float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @SignalingGreaterEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"oge") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool SignalingGreaterEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @SignalingGreaterEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp oge float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.maytrap")
   return f1 >= f2;
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLess(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp olt float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLess(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"olt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLess(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLess(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp olt float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.maytrap")
   return __builtin_isless(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLessEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ole float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLessEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"ole") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLessEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLessEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp ole float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.maytrap")
   return __builtin_islessequal(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietGreater(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp ogt float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietGreater(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"ogt") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietGreater(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietGreater(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp ogt float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.maytrap")
   return __builtin_isgreater(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietGreaterEqual(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp oge float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietGreaterEqual(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"oge") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietGreaterEqual(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietGreaterEqual(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp oge float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.maytrap")
   return __builtin_isgreaterequal(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietLessGreater(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp one float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietLessGreater(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"one") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietLessGreater(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietLessGreater(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp one float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"one", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"one", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"one", metadata !"fpexcept.maytrap")
   return __builtin_islessgreater(f1, f2);
 
-  // CHECK: ret
 }
 
+// FCMP-LABEL: define dso_local zeroext i1 @QuietUnordered(
+// FCMP-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// FCMP-NEXT:  [[ENTRY:.*:]]
+// FCMP-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// FCMP-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// FCMP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// FCMP-NEXT:    [[CMP:%.*]] = fcmp uno float [[TMP0]], [[TMP1]]
+// FCMP-NEXT:    ret i1 [[CMP]]
+//
+// IGNORE-LABEL: define dso_local zeroext i1 @QuietUnordered(
+// IGNORE-SAME: float noundef [[F1:%.*]], float noundef [[F2:%.*]]) #[[ATTR0]] {
+// IGNORE-NEXT:  [[ENTRY:.*:]]
+// IGNORE-NEXT:    [[F1_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    [[F2_ADDR:%.*]] = alloca float, align 4
+// IGNORE-NEXT:    store float [[F1]], ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    store float [[F2]], ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP0:%.*]] = load float, ptr [[F1_ADDR]], align 4
+// IGNORE-NEXT:    [[TMP1:%.*]] = load float, ptr [[F2_ADDR]], align 4
+// IGNORE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP0]], float [[TMP1]], metadata !"uno") #[[ATTR2]] [ "fp.except"(metadata !"ignore") ]
+// IGNORE-NEXT:    ret i1 [[CMP]]
+//
 _Bool QuietUnordered(float f1, float f2) {
-  // CHECK-LABEL: define {{.*}}i1 @QuietUnordered(float noundef %f1, float noundef %f2)
 
-  // FCMP: fcmp uno float %{{.*}}, %{{.*}}
-  // IGNORE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"uno", metadata !"fpexcept.ignore")
-  // EXCEPT: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"uno", metadata !"fpexcept.strict")
-  // MAYTRAP: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"uno", metadata !"fpexcept.maytrap")
   return __builtin_isunordered(f1, f2);
 
-  // CHECK: ret
 }
 
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
+// EXCEPT: {{.*}}
+// MAYTRAP: {{.*}}
diff --git a/clang/test/CodeGen/fpconstrained.c b/clang/test/CodeGen/fpconstrained.c
index 97a5d23449a15..89fe538e1ac3d 100644
--- a/clang/test/CodeGen/fpconstrained.c
+++ b/clang/test/CodeGen/fpconstrained.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -frounding-math -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=FPMODELSTRICT
 // RUN: %clang_cc1 -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s -check-prefix=PRECISE
 // RUN: %clang_cc1 -ffast-math -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s -check-prefix=FAST
@@ -8,18 +9,60 @@
 
 float f0, f1, f2;
 
+// FPMODELSTRICT-LABEL: define dso_local void @foo(
+// FPMODELSTRICT-SAME: ) #[[ATTR0:[0-9]+]] {
+// FPMODELSTRICT-NEXT:  [[ENTRY:.*:]]
+// FPMODELSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// FPMODELSTRICT-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// FPMODELSTRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// FPMODELSTRICT-NEXT:    store float [[ADD]], ptr @f0, align 4
+// FPMODELSTRICT-NEXT:    ret void
+//
+// PRECISE-LABEL: define dso_local void @foo(
+// PRECISE-SAME: ) #[[ATTR0:[0-9]+]] {
+// PRECISE-NEXT:  [[ENTRY:.*:]]
+// PRECISE-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// PRECISE-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// PRECISE-NEXT:    [[ADD:%.*]] = fadd contract float [[TMP0]], [[TMP1]]
+// PRECISE-NEXT:    store float [[ADD]], ptr @f0, align 4
+// PRECISE-NEXT:    ret void
+//
+// FAST-LABEL: define dso_local void @foo(
+// FAST-SAME: ) #[[ATTR0:[0-9]+]] {
+// FAST-NEXT:  [[ENTRY:.*:]]
+// FAST-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// FAST-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// FAST-NEXT:    [[ADD:%.*]] = fadd fast float [[TMP0]], [[TMP1]]
+// FAST-NEXT:    store float [[ADD]], ptr @f0, align 4
+// FAST-NEXT:    ret void
+//
+// FASTNOCONTRACT-LABEL: define dso_local void @foo(
+// FASTNOCONTRACT-SAME: ) #[[ATTR0:[0-9]+]] {
+// FASTNOCONTRACT-NEXT:  [[ENTRY:.*:]]
+// FASTNOCONTRACT-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// FASTNOCONTRACT-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// FASTNOCONTRACT-NEXT:    [[ADD:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[TMP0]], [[TMP1]]
+// FASTNOCONTRACT-NEXT:    store float [[ADD]], ptr @f0, align 4
+// FASTNOCONTRACT-NEXT:    ret void
+//
+// EXCEPT-LABEL: define dso_local void @foo(
+// EXCEPT-SAME: ) #[[ATTR0:[0-9]+]] {
+// EXCEPT-NEXT:  [[ENTRY:.*:]]
+// EXCEPT-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// EXCEPT-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// EXCEPT-NEXT:    [[ADD:%.*]] = call fast float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// EXCEPT-NEXT:    store float [[ADD]], ptr @f0, align 4
+// EXCEPT-NEXT:    ret void
+//
+// MAYTRAP-LABEL: define dso_local void @foo(
+// MAYTRAP-SAME: ) #[[ATTR0:[0-9]+]] {
+// MAYTRAP-NEXT:  [[ENTRY:.*:]]
+// MAYTRAP-NEXT:    [[TMP0:%.*]] = load float, ptr @f1, align 4
+// MAYTRAP-NEXT:    [[TMP1:%.*]] = load float, ptr @f2, align 4
+// MAYTRAP-NEXT:    [[ADD:%.*]] = call fast float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// MAYTRAP-NEXT:    store float [[ADD]], ptr @f0, align 4
+// MAYTRAP-NEXT:    ret void
+//
 void foo(void) {
-  // CHECK-LABEL: define {{.*}}void @foo()
-
-  // MAYTRAP: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-  // EXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // FPMODELSTRICT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-  // STRICTEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-  // STRICTNOEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-  // PRECISE: fadd contract float %{{.*}}, %{{.*}}
-  // FAST: fadd fast
-  // FASTNOCONTRACT: fadd reassoc nnan ninf nsz arcp afn float
   f0 = f1 + f2;
-
-  // CHECK: ret
 }
diff --git a/clang/test/CodeGen/fpconstrained.cpp b/clang/test/CodeGen/fpconstrained.cpp
index 222a0989cf6ec..29d910cb9617d 100644
--- a/clang/test/CodeGen/fpconstrained.cpp
+++ b/clang/test/CodeGen/fpconstrained.cpp
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -x c++ -fexceptions -fcxx-exceptions -frounding-math -ffp-exception-behavior=strict -fexperimental-strict-floating-point -emit-llvm -o - %s | FileCheck %s -check-prefix=FPMODELSTRICT
 // RUN: %clang_cc1 -x c++ -ffp-contract=fast -fexceptions -fcxx-exceptions -emit-llvm -o - %s | FileCheck %s -check-prefix=PRECISE
 // RUN: %clang_cc1 -x c++ -ffast-math -fexceptions -fcxx-exceptions -ffp-contract=fast -emit-llvm -o - %s | FileCheck %s -check-prefix=FAST
@@ -21,14 +22,8 @@ float f0, f1, f2;
   // CHECK-LABEL: define {{.*}}void @_ZN4aaaaIiED2Ev{{.*}}
 
   } catch (...) {
-    // MAYTRAP: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
-    // EXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-    // FPMODELSTRICT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
     // STRICTEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
     // STRICTNOEXCEPT: llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-    // PRECISE: fadd contract float %{{.*}}, %{{.*}}
-    // FAST: fadd fast
-    // FASTNOCONTRACT: fadd reassoc nnan ninf nsz arcp afn float
     f0 = f1 + f2;
 
     // CHECK: ret void
@@ -41,6 +36,72 @@ float f0, f1, f2;
     aaaa<int> e;
   };
 
+// FPMODELSTRICT-LABEL: define dso_local noundef float @_Z3foov(
+// FPMODELSTRICT-SAME: ) #[[ATTR0:[0-9]+]] {
+// FPMODELSTRICT-NEXT:  [[ENTRY:.*:]]
+// FPMODELSTRICT-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// FPMODELSTRICT-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// FPMODELSTRICT-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR4:[0-9]+]]
+// FPMODELSTRICT-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// FPMODELSTRICT-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR5:[0-9]+]]
+// FPMODELSTRICT-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR5]]
+// FPMODELSTRICT-NEXT:    ret float [[TMP0]]
+//
+// PRECISE-LABEL: define dso_local noundef float @_Z3foov(
+// PRECISE-SAME: ) #[[ATTR0:[0-9]+]] {
+// PRECISE-NEXT:  [[ENTRY:.*:]]
+// PRECISE-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// PRECISE-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// PRECISE-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR4:[0-9]+]]
+// PRECISE-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// PRECISE-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR5:[0-9]+]]
+// PRECISE-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR5]]
+// PRECISE-NEXT:    ret float [[TMP0]]
+//
+// FAST-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z3foov(
+// FAST-SAME: ) #[[ATTR0:[0-9]+]] {
+// FAST-NEXT:  [[ENTRY:.*:]]
+// FAST-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// FAST-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// FAST-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR4:[0-9]+]]
+// FAST-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// FAST-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR5:[0-9]+]]
+// FAST-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR5]]
+// FAST-NEXT:    ret float [[TMP0]]
+//
+// FASTNOCONTRACT-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z3foov(
+// FASTNOCONTRACT-SAME: ) #[[ATTR0:[0-9]+]] {
+// FASTNOCONTRACT-NEXT:  [[ENTRY:.*:]]
+// FASTNOCONTRACT-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// FASTNOCONTRACT-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// FASTNOCONTRACT-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR4:[0-9]+]]
+// FASTNOCONTRACT-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// FASTNOCONTRACT-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR5:[0-9]+]]
+// FASTNOCONTRACT-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR5]]
+// FASTNOCONTRACT-NEXT:    ret float [[TMP0]]
+//
+// EXCEPT-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z3foov(
+// EXCEPT-SAME: ) #[[ATTR0:[0-9]+]] {
+// EXCEPT-NEXT:  [[ENTRY:.*:]]
+// EXCEPT-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// EXCEPT-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// EXCEPT-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR5:[0-9]+]]
+// EXCEPT-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// EXCEPT-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR6:[0-9]+]]
+// EXCEPT-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR6]]
+// EXCEPT-NEXT:    ret float [[TMP0]]
+//
+// MAYTRAP-LABEL: define dso_local noundef nofpclass(nan inf) float @_Z3foov(
+// MAYTRAP-SAME: ) #[[ATTR0:[0-9]+]] {
+// MAYTRAP-NEXT:  [[ENTRY:.*:]]
+// MAYTRAP-NEXT:    [[X:%.*]] = alloca [[CLASS_D:%.*]], align 1
+// MAYTRAP-NEXT:    [[A:%.*]] = alloca [[CLASS_AAAA:%.*]], align 1
+// MAYTRAP-NEXT:    call void @_ZN1dC1EPKci(ptr noundef nonnull align 1 dereferenceable(1) [[X]], ptr noundef @.str, i32 noundef 1) #[[ATTR5:[0-9]+]]
+// MAYTRAP-NEXT:    [[TMP0:%.*]] = load float, ptr @f0, align 4
+// MAYTRAP-NEXT:    call void @_ZN4aaaaIiED1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR6:[0-9]+]]
+// MAYTRAP-NEXT:    call void @_ZN1dD1Ev(ptr noundef nonnull align 1 dead_on_return(1) dereferenceable(1) [[X]]) #[[ATTR6]]
+// MAYTRAP-NEXT:    ret float [[TMP0]]
+//
 float foo() {
   d x("", 1);
   aaaa<int> a;
diff --git a/clang/test/CodeGen/math-libcalls.c b/clang/test/CodeGen/math-libcalls.c
index d4cd6f86b3c51..ce9c35d46c961 100644
--- a/clang/test/CodeGen/math-libcalls.c
+++ b/clang/test/CodeGen/math-libcalls.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -Wno-implicit-function-declaration -w -o - -emit-llvm %s | FileCheck %s --check-prefixes=COMMON,NO__ERRNO
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -Wno-implicit-function-declaration -w -o - -emit-llvm -fmath-errno %s | FileCheck %s --check-prefixes=COMMON,HAS_ERRNO
 // RUN: %clang_cc1 -triple x86_64-unknown-unknown -Wno-implicit-function-declaration -w -o - -emit-llvm -disable-llvm-passes -O2 %s | FileCheck %s --check-prefixes=COMMON,NO__ERRNO
@@ -8,742 +9,2061 @@
 
 // Test attributes and builtin codegen of math library calls.
 
+// HAS_MAYTRAP-LABEL: define dso_local void @foo(
+// HAS_MAYTRAP-SAME: ptr noundef [[D:%.*]], float noundef [[F:%.*]], ptr noundef [[FP:%.*]], ptr noundef [[L:%.*]], ptr noundef [[I:%.*]], ptr noundef [[C:%.*]]) #[[ATTR0:[0-9]+]] {
+// HAS_MAYTRAP-NEXT:  [[ENTRY:.*:]]
+// HAS_MAYTRAP-NEXT:    [[D_ADDR:%.*]] = alloca ptr, align 8
+// HAS_MAYTRAP-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// HAS_MAYTRAP-NEXT:    [[FP_ADDR:%.*]] = alloca ptr, align 8
+// HAS_MAYTRAP-NEXT:    [[L_ADDR:%.*]] = alloca ptr, align 8
+// HAS_MAYTRAP-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// HAS_MAYTRAP-NEXT:    [[C_ADDR:%.*]] = alloca ptr, align 8
+// HAS_MAYTRAP-NEXT:    store ptr [[D]], ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    store ptr [[FP]], ptr [[FP_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    store ptr [[L]], ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    store ptr [[C]], ptr [[C_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP0]]) #[[ATTR6:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP1:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV1:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[FMOD:%.*]] = call double @llvm.frem.f64(double [[CONV]], double [[CONV1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CONV2:%.*]] = call float @llvm.fptrunc.f32.f64(double [[FMOD]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    store float [[CONV2]], ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP2:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP3:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[FMOD3:%.*]] = call float @llvm.frem.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    store float [[FMOD3]], ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP4:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV4:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP4]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP5:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV5:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP5]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[FMOD6:%.*]] = call x86_fp80 @llvm.frem.f80(x86_fp80 [[CONV4]], x86_fp80 [[CONV5]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CONV7:%.*]] = call float @llvm.fptrunc.f32.f80(x86_fp80 [[FMOD6]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    store float [[CONV7]], ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP6:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV8:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP6]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP7:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV9:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP7]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP8:%.*]] = call double @llvm.atan2.f64(double [[CONV8]], double [[CONV9]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP9:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP10:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP11:%.*]] = call float @llvm.atan2.f32(float [[TMP9]], float [[TMP10]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP12:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV10:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP12]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP13:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV11:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP13]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP14:%.*]] = call x86_fp80 @llvm.atan2.f80(x86_fp80 [[CONV10]], x86_fp80 [[CONV11]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP15:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV12:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP15]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP16:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV13:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP16]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP17:%.*]] = call double @llvm.copysign.f64(double [[CONV12]], double [[CONV13]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP18:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP19:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP20:%.*]] = call float @llvm.copysign.f32(float [[TMP18]], float [[TMP19]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP21:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV14:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP21]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP22:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV15:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP22]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP23:%.*]] = call x86_fp80 @llvm.copysign.f80(x86_fp80 [[CONV14]], x86_fp80 [[CONV15]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP24:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV16:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP24]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP25:%.*]] = call double @llvm.fabs.f64(double [[CONV16]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP26:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP27:%.*]] = call float @llvm.fabs.f32(float [[TMP26]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP28:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV17:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP28]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP29:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV17]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP30:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV18:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP30]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP31:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL:%.*]] = call double @frexp(double noundef [[CONV18]], ptr noundef [[TMP31]]) #[[ATTR7:[0-9]+]]
+// HAS_MAYTRAP-NEXT:    [[TMP32:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP33:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL19:%.*]] = call float @frexpf(float noundef [[TMP32]], ptr noundef [[TMP33]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP34:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV20:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP34]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP35:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL21:%.*]] = call x86_fp80 @frexpl(x86_fp80 noundef [[CONV20]], ptr noundef [[TMP35]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP36:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV22:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP36]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP37:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV23:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP37]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL24:%.*]] = call double @ldexp(double noundef [[CONV22]], i32 noundef [[CONV23]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP38:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP39:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV25:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP39]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL26:%.*]] = call float @ldexpf(float noundef [[TMP38]], i32 noundef [[CONV25]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP40:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV27:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP40]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP41:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV28:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP41]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL29:%.*]] = call x86_fp80 @ldexpl(x86_fp80 noundef [[CONV27]], i32 noundef [[CONV28]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP42:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV30:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP42]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP43:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL31:%.*]] = call double @modf(double noundef [[CONV30]], ptr noundef [[TMP43]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP44:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP45:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL32:%.*]] = call float @modff(float noundef [[TMP44]], ptr noundef [[TMP45]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP46:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV33:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP46]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP47:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL34:%.*]] = call x86_fp80 @modfl(x86_fp80 noundef [[CONV33]], ptr noundef [[TMP47]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP48:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL35:%.*]] = call double @nan(ptr noundef [[TMP48]]) #[[ATTR8:[0-9]+]]
+// HAS_MAYTRAP-NEXT:    [[TMP49:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL36:%.*]] = call float @nanf(ptr noundef [[TMP49]]) #[[ATTR8]]
+// HAS_MAYTRAP-NEXT:    [[TMP50:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL37:%.*]] = call x86_fp80 @nanl(ptr noundef [[TMP50]]) #[[ATTR8]]
+// HAS_MAYTRAP-NEXT:    [[TMP51:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV38:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP51]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP52:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV39:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP52]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP53:%.*]] = call double @llvm.pow.f64(double [[CONV38]], double [[CONV39]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP54:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP55:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP56:%.*]] = call float @llvm.pow.f32(float [[TMP54]], float [[TMP55]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP57:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV40:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP57]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP58:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV41:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP58]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP59:%.*]] = call x86_fp80 @llvm.pow.f80(x86_fp80 [[CONV40]], x86_fp80 [[CONV41]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP60:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV42:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP60]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP61:%.*]] = call double @llvm.acos.f64(double [[CONV42]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP62:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP63:%.*]] = call float @llvm.acos.f32(float [[TMP62]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP64:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV43:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP64]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP65:%.*]] = call x86_fp80 @llvm.acos.f80(x86_fp80 [[CONV43]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP66:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV44:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP66]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL45:%.*]] = call double @acosh(double noundef [[CONV44]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP67:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL46:%.*]] = call float @acoshf(float noundef [[TMP67]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP68:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV47:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP68]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL48:%.*]] = call x86_fp80 @acoshl(x86_fp80 noundef [[CONV47]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP69:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV49:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP69]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP70:%.*]] = call double @llvm.asin.f64(double [[CONV49]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP71:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP72:%.*]] = call float @llvm.asin.f32(float [[TMP71]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP73:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV50:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP73]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP74:%.*]] = call x86_fp80 @llvm.asin.f80(x86_fp80 [[CONV50]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP75:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV51:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP75]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL52:%.*]] = call double @asinh(double noundef [[CONV51]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP76:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL53:%.*]] = call float @asinhf(float noundef [[TMP76]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP77:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV54:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP77]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL55:%.*]] = call x86_fp80 @asinhl(x86_fp80 noundef [[CONV54]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP78:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV56:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP78]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP79:%.*]] = call double @llvm.atan.f64(double [[CONV56]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP80:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP81:%.*]] = call float @llvm.atan.f32(float [[TMP80]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP82:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV57:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP82]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP83:%.*]] = call x86_fp80 @llvm.atan.f80(x86_fp80 [[CONV57]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP84:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV58:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP84]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL59:%.*]] = call double @atanh(double noundef [[CONV58]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP85:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL60:%.*]] = call float @atanhf(float noundef [[TMP85]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP86:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV61:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP86]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL62:%.*]] = call x86_fp80 @atanhl(x86_fp80 noundef [[CONV61]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP87:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV63:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP87]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL64:%.*]] = call double @cbrt(double noundef [[CONV63]]) #[[ATTR9:[0-9]+]]
+// HAS_MAYTRAP-NEXT:    [[TMP88:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL65:%.*]] = call float @cbrtf(float noundef [[TMP88]]) #[[ATTR9]]
+// HAS_MAYTRAP-NEXT:    [[TMP89:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV66:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP89]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL67:%.*]] = call x86_fp80 @cbrtl(x86_fp80 noundef [[CONV66]]) #[[ATTR9]]
+// HAS_MAYTRAP-NEXT:    [[TMP90:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV68:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP90]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP91:%.*]] = call double @llvm.ceil.f64(double [[CONV68]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP92:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP93:%.*]] = call float @llvm.ceil.f32(float [[TMP92]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP94:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV69:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP94]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP95:%.*]] = call x86_fp80 @llvm.ceil.f80(x86_fp80 [[CONV69]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP96:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV70:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP96]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP97:%.*]] = call double @llvm.cos.f64(double [[CONV70]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP98:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP99:%.*]] = call float @llvm.cos.f32(float [[TMP98]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP100:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV71:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP100]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP101:%.*]] = call x86_fp80 @llvm.cos.f80(x86_fp80 [[CONV71]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP102:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV72:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP102]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP103:%.*]] = call double @llvm.cosh.f64(double [[CONV72]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP104:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP105:%.*]] = call float @llvm.cosh.f32(float [[TMP104]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP106:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV73:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP106]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP107:%.*]] = call x86_fp80 @llvm.cosh.f80(x86_fp80 [[CONV73]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP108:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV74:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP108]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL75:%.*]] = call double @erf(double noundef [[CONV74]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP109:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL76:%.*]] = call float @erff(float noundef [[TMP109]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP110:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV77:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP110]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL78:%.*]] = call x86_fp80 @erfl(x86_fp80 noundef [[CONV77]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP111:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV79:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP111]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL80:%.*]] = call double @erfc(double noundef [[CONV79]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP112:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL81:%.*]] = call float @erfcf(float noundef [[TMP112]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP113:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV82:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP113]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL83:%.*]] = call x86_fp80 @erfcl(x86_fp80 noundef [[CONV82]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP114:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV84:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP114]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP115:%.*]] = call double @llvm.exp.f64(double [[CONV84]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP116:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP117:%.*]] = call float @llvm.exp.f32(float [[TMP116]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP118:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV85:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP118]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP119:%.*]] = call x86_fp80 @llvm.exp.f80(x86_fp80 [[CONV85]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP120:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV86:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP120]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP121:%.*]] = call double @llvm.exp2.f64(double [[CONV86]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP122:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP123:%.*]] = call float @llvm.exp2.f32(float [[TMP122]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP124:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV87:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP124]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP125:%.*]] = call x86_fp80 @llvm.exp2.f80(x86_fp80 [[CONV87]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP126:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV88:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP126]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL89:%.*]] = call double @expm1(double noundef [[CONV88]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP127:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL90:%.*]] = call float @expm1f(float noundef [[TMP127]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP128:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV91:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP128]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL92:%.*]] = call x86_fp80 @expm1l(x86_fp80 noundef [[CONV91]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP129:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV93:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP129]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP130:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV94:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP130]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL95:%.*]] = call double @fdim(double noundef [[CONV93]], double noundef [[CONV94]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP131:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP132:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL96:%.*]] = call float @fdimf(float noundef [[TMP131]], float noundef [[TMP132]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP133:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV97:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP133]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP134:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV98:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP134]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL99:%.*]] = call x86_fp80 @fdiml(x86_fp80 noundef [[CONV97]], x86_fp80 noundef [[CONV98]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP135:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV100:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP135]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP136:%.*]] = call double @llvm.floor.f64(double [[CONV100]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP137:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP138:%.*]] = call float @llvm.floor.f32(float [[TMP137]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP139:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV101:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP139]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP140:%.*]] = call x86_fp80 @llvm.floor.f80(x86_fp80 [[CONV101]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP141:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV102:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP141]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP142:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV103:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP142]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP143:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV104:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP143]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP144:%.*]] = call double @llvm.fma.f64(double [[CONV102]], double [[CONV103]], double [[CONV104]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP145:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP146:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP147:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP148:%.*]] = call float @llvm.fma.f32(float [[TMP145]], float [[TMP146]], float [[TMP147]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP149:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV105:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP149]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP150:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV106:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP150]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP151:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV107:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP151]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP152:%.*]] = call x86_fp80 @llvm.fma.f80(x86_fp80 [[CONV105]], x86_fp80 [[CONV106]], x86_fp80 [[CONV107]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP153:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV108:%.*]] = call nsz double @llvm.fpext.f64.f32(float [[TMP153]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP154:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV109:%.*]] = call nsz double @llvm.fpext.f64.f32(float [[TMP154]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP155:%.*]] = call nsz double @llvm.maxnum.f64(double [[CONV108]], double [[CONV109]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP156:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP157:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP158:%.*]] = call nsz float @llvm.maxnum.f32(float [[TMP156]], float [[TMP157]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP159:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV110:%.*]] = call nsz x86_fp80 @llvm.fpext.f80.f32(float [[TMP159]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP160:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV111:%.*]] = call nsz x86_fp80 @llvm.fpext.f80.f32(float [[TMP160]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP161:%.*]] = call nsz x86_fp80 @llvm.maxnum.f80(x86_fp80 [[CONV110]], x86_fp80 [[CONV111]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP162:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV112:%.*]] = call nsz double @llvm.fpext.f64.f32(float [[TMP162]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP163:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV113:%.*]] = call nsz double @llvm.fpext.f64.f32(float [[TMP163]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP164:%.*]] = call nsz double @llvm.minnum.f64(double [[CONV112]], double [[CONV113]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP165:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP166:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP167:%.*]] = call nsz float @llvm.minnum.f32(float [[TMP165]], float [[TMP166]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP168:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV114:%.*]] = call nsz x86_fp80 @llvm.fpext.f80.f32(float [[TMP168]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP169:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV115:%.*]] = call nsz x86_fp80 @llvm.fpext.f80.f32(float [[TMP169]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP170:%.*]] = call nsz x86_fp80 @llvm.minnum.f80(x86_fp80 [[CONV114]], x86_fp80 [[CONV115]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP171:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP172:%.*]] = load double, ptr [[TMP171]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP173:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP174:%.*]] = load double, ptr [[TMP173]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP175:%.*]] = call double @llvm.maximumnum.f64(double [[TMP172]], double [[TMP174]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP176:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP177:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP178:%.*]] = call float @llvm.maximumnum.f32(float [[TMP176]], float [[TMP177]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP179:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP180:%.*]] = load x86_fp80, ptr [[TMP179]], align 16
+// HAS_MAYTRAP-NEXT:    [[TMP181:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP182:%.*]] = load x86_fp80, ptr [[TMP181]], align 16
+// HAS_MAYTRAP-NEXT:    [[TMP183:%.*]] = call x86_fp80 @llvm.maximumnum.f80(x86_fp80 [[TMP180]], x86_fp80 [[TMP182]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP184:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP185:%.*]] = load double, ptr [[TMP184]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP186:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP187:%.*]] = load double, ptr [[TMP186]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP188:%.*]] = call double @llvm.minimumnum.f64(double [[TMP185]], double [[TMP187]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP189:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP190:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP191:%.*]] = call float @llvm.minimumnum.f32(float [[TMP189]], float [[TMP190]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP192:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP193:%.*]] = load x86_fp80, ptr [[TMP192]], align 16
+// HAS_MAYTRAP-NEXT:    [[TMP194:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP195:%.*]] = load x86_fp80, ptr [[TMP194]], align 16
+// HAS_MAYTRAP-NEXT:    [[TMP196:%.*]] = call x86_fp80 @llvm.minimumnum.f80(x86_fp80 [[TMP193]], x86_fp80 [[TMP195]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP197:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV116:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP197]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP198:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV117:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP198]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL118:%.*]] = call double @hypot(double noundef [[CONV116]], double noundef [[CONV117]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP199:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP200:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL119:%.*]] = call float @hypotf(float noundef [[TMP199]], float noundef [[TMP200]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP201:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV120:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP201]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP202:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV121:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP202]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL122:%.*]] = call x86_fp80 @hypotl(x86_fp80 noundef [[CONV120]], x86_fp80 noundef [[CONV121]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP203:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV123:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP203]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL124:%.*]] = call i32 @ilogb(double noundef [[CONV123]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP204:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL125:%.*]] = call i32 @ilogbf(float noundef [[TMP204]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP205:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV126:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP205]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL127:%.*]] = call i32 @ilogbl(x86_fp80 noundef [[CONV126]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP206:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV128:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP206]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL129:%.*]] = call double @lgamma(double noundef [[CONV128]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP207:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL130:%.*]] = call float @lgammaf(float noundef [[TMP207]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP208:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV131:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP208]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL132:%.*]] = call x86_fp80 @lgammal(x86_fp80 noundef [[CONV131]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP209:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV133:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP209]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP210:%.*]] = call i64 @llvm.llrint.i64.f64(double [[CONV133]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP211:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP212:%.*]] = call i64 @llvm.llrint.i64.f32(float [[TMP211]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP213:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV134:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP213]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP214:%.*]] = call i64 @llvm.llrint.i64.f80(x86_fp80 [[CONV134]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP215:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV135:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP215]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP216:%.*]] = call i64 @llvm.llround.i64.f64(double [[CONV135]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP217:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP218:%.*]] = call i64 @llvm.llround.i64.f32(float [[TMP217]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP219:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV136:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP219]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP220:%.*]] = call i64 @llvm.llround.i64.f80(x86_fp80 [[CONV136]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP221:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV137:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP221]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP222:%.*]] = call double @llvm.log.f64(double [[CONV137]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP223:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP224:%.*]] = call float @llvm.log.f32(float [[TMP223]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP225:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV138:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP225]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP226:%.*]] = call x86_fp80 @llvm.log.f80(x86_fp80 [[CONV138]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP227:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV139:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP227]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP228:%.*]] = call double @llvm.log10.f64(double [[CONV139]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP229:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP230:%.*]] = call float @llvm.log10.f32(float [[TMP229]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP231:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV140:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP231]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP232:%.*]] = call x86_fp80 @llvm.log10.f80(x86_fp80 [[CONV140]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP233:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV141:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP233]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL142:%.*]] = call double @log1p(double noundef [[CONV141]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP234:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL143:%.*]] = call float @log1pf(float noundef [[TMP234]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP235:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV144:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP235]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL145:%.*]] = call x86_fp80 @log1pl(x86_fp80 noundef [[CONV144]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP236:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV146:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP236]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP237:%.*]] = call double @llvm.log2.f64(double [[CONV146]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP238:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP239:%.*]] = call float @llvm.log2.f32(float [[TMP238]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP240:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV147:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP240]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP241:%.*]] = call x86_fp80 @llvm.log2.f80(x86_fp80 [[CONV147]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP242:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV148:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP242]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL149:%.*]] = call double @logb(double noundef [[CONV148]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP243:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL150:%.*]] = call float @logbf(float noundef [[TMP243]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP244:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV151:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP244]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL152:%.*]] = call x86_fp80 @logbl(x86_fp80 noundef [[CONV151]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP245:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV153:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP245]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP246:%.*]] = call i64 @llvm.lrint.i64.f64(double [[CONV153]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP247:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP248:%.*]] = call i64 @llvm.lrint.i64.f32(float [[TMP247]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP249:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV154:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP249]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP250:%.*]] = call i64 @llvm.lrint.i64.f80(x86_fp80 [[CONV154]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP251:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV155:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP251]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP252:%.*]] = call i64 @llvm.lround.i64.f64(double [[CONV155]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP253:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP254:%.*]] = call i64 @llvm.lround.i64.f32(float [[TMP253]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP255:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV156:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP255]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP256:%.*]] = call i64 @llvm.lround.i64.f80(x86_fp80 [[CONV156]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP257:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV157:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP257]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP258:%.*]] = call double @llvm.nearbyint.f64(double [[CONV157]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP259:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP260:%.*]] = call float @llvm.nearbyint.f32(float [[TMP259]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP261:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV158:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP261]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP262:%.*]] = call x86_fp80 @llvm.nearbyint.f80(x86_fp80 [[CONV158]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP263:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV159:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP263]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP264:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV160:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP264]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL161:%.*]] = call double @nextafter(double noundef [[CONV159]], double noundef [[CONV160]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP265:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP266:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL162:%.*]] = call float @nextafterf(float noundef [[TMP265]], float noundef [[TMP266]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP267:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV163:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP267]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP268:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV164:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP268]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL165:%.*]] = call x86_fp80 @nextafterl(x86_fp80 noundef [[CONV163]], x86_fp80 noundef [[CONV164]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP269:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV166:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP269]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP270:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV167:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP270]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL168:%.*]] = call double @nexttoward(double noundef [[CONV166]], x86_fp80 noundef [[CONV167]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP271:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP272:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV169:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP272]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL170:%.*]] = call float @nexttowardf(float noundef [[TMP271]], x86_fp80 noundef [[CONV169]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP273:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV171:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP273]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP274:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV172:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP274]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL173:%.*]] = call x86_fp80 @nexttowardl(x86_fp80 noundef [[CONV171]], x86_fp80 noundef [[CONV172]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP275:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV174:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP275]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP276:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV175:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP276]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL176:%.*]] = call double @remainder(double noundef [[CONV174]], double noundef [[CONV175]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP277:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP278:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL177:%.*]] = call float @remainderf(float noundef [[TMP277]], float noundef [[TMP278]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP279:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV178:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP279]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP280:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV179:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP280]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL180:%.*]] = call x86_fp80 @remainderl(x86_fp80 noundef [[CONV178]], x86_fp80 noundef [[CONV179]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP281:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV181:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP281]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP282:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV182:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP282]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP283:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL183:%.*]] = call double @remquo(double noundef [[CONV181]], double noundef [[CONV182]], ptr noundef [[TMP283]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP284:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP285:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP286:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL184:%.*]] = call float @remquof(float noundef [[TMP284]], float noundef [[TMP285]], ptr noundef [[TMP286]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP287:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV185:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP287]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP288:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV186:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP288]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP289:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[CALL187:%.*]] = call x86_fp80 @remquol(x86_fp80 noundef [[CONV185]], x86_fp80 noundef [[CONV186]], ptr noundef [[TMP289]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP290:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV188:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP290]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP291:%.*]] = call double @llvm.rint.f64(double [[CONV188]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP292:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP293:%.*]] = call float @llvm.rint.f32(float [[TMP292]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP294:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV189:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP294]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP295:%.*]] = call x86_fp80 @llvm.rint.f80(x86_fp80 [[CONV189]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP296:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV190:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP296]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP297:%.*]] = call double @llvm.round.f64(double [[CONV190]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP298:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP299:%.*]] = call float @llvm.round.f32(float [[TMP298]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP300:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV191:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP300]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP301:%.*]] = call x86_fp80 @llvm.round.f80(x86_fp80 [[CONV191]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP302:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV192:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP302]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP303:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV193:%.*]] = call i64 @llvm.fptosi.i64.f32(float [[TMP303]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL194:%.*]] = call double @scalbln(double noundef [[CONV192]], i64 noundef [[CONV193]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP304:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP305:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV195:%.*]] = call i64 @llvm.fptosi.i64.f32(float [[TMP305]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL196:%.*]] = call float @scalblnf(float noundef [[TMP304]], i64 noundef [[CONV195]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP306:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV197:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP306]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP307:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV198:%.*]] = call i64 @llvm.fptosi.i64.f32(float [[TMP307]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL199:%.*]] = call x86_fp80 @scalblnl(x86_fp80 noundef [[CONV197]], i64 noundef [[CONV198]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP308:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV200:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP308]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP309:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV201:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP309]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL202:%.*]] = call double @scalbn(double noundef [[CONV200]], i32 noundef [[CONV201]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP310:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP311:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV203:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP311]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL204:%.*]] = call float @scalbnf(float noundef [[TMP310]], i32 noundef [[CONV203]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP312:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV205:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP312]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP313:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV206:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[TMP313]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL207:%.*]] = call x86_fp80 @scalbnl(x86_fp80 noundef [[CONV205]], i32 noundef [[CONV206]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP314:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV208:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP314]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP315:%.*]] = call double @llvm.sin.f64(double [[CONV208]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP316:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP317:%.*]] = call float @llvm.sin.f32(float [[TMP316]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP318:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV209:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP318]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP319:%.*]] = call x86_fp80 @llvm.sin.f80(x86_fp80 [[CONV209]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP320:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV210:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP320]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP321:%.*]] = call double @llvm.sinh.f64(double [[CONV210]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP322:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP323:%.*]] = call float @llvm.sinh.f32(float [[TMP322]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP324:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV211:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP324]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP325:%.*]] = call x86_fp80 @llvm.sinh.f80(x86_fp80 [[CONV211]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP326:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV212:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP326]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP327:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP328:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    call void @sincos(double noundef [[CONV212]], ptr noundef [[TMP327]], ptr noundef [[TMP328]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP329:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP330:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP331:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    call void @sincosf(float noundef [[TMP329]], ptr noundef [[TMP330]], ptr noundef [[TMP331]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP332:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV213:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP332]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP333:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    [[TMP334:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_MAYTRAP-NEXT:    call void @sincosl(x86_fp80 noundef [[CONV213]], ptr noundef [[TMP333]], ptr noundef [[TMP334]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP335:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV214:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP335]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP336:%.*]] = call double @llvm.sqrt.f64(double [[CONV214]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP337:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP338:%.*]] = call float @llvm.sqrt.f32(float [[TMP337]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP339:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV215:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP339]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP340:%.*]] = call x86_fp80 @llvm.sqrt.f80(x86_fp80 [[CONV215]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP341:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV216:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP341]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP342:%.*]] = call double @llvm.tan.f64(double [[CONV216]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP343:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP344:%.*]] = call float @llvm.tan.f32(float [[TMP343]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP345:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV217:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP345]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP346:%.*]] = call x86_fp80 @llvm.tan.f80(x86_fp80 [[CONV217]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP347:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV218:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP347]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP348:%.*]] = call double @llvm.tanh.f64(double [[CONV218]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP349:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP350:%.*]] = call float @llvm.tanh.f32(float [[TMP349]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP351:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV219:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP351]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP352:%.*]] = call x86_fp80 @llvm.tanh.f80(x86_fp80 [[CONV219]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP353:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV220:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP353]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL221:%.*]] = call double @tgamma(double noundef [[CONV220]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP354:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CALL222:%.*]] = call float @tgammaf(float noundef [[TMP354]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP355:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV223:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP355]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[CALL224:%.*]] = call x86_fp80 @tgammal(x86_fp80 noundef [[CONV223]]) #[[ATTR7]]
+// HAS_MAYTRAP-NEXT:    [[TMP356:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV225:%.*]] = call double @llvm.fpext.f64.f32(float [[TMP356]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP357:%.*]] = call double @llvm.trunc.f64(double [[CONV225]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP358:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[TMP359:%.*]] = call float @llvm.trunc.f32(float [[TMP358]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP360:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_MAYTRAP-NEXT:    [[CONV226:%.*]] = call x86_fp80 @llvm.fpext.f80.f32(float [[TMP360]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    [[TMP361:%.*]] = call x86_fp80 @llvm.trunc.f80(x86_fp80 [[CONV226]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// HAS_MAYTRAP-NEXT:    ret void
+//
+// HAS_ERRNO_GNU-LABEL: define dso_local void @foo(
+// HAS_ERRNO_GNU-SAME: ptr noundef [[D:%.*]], float noundef [[F:%.*]], ptr noundef [[FP:%.*]], ptr noundef [[L:%.*]], ptr noundef [[I:%.*]], ptr noundef [[C:%.*]]) #[[ATTR0:[0-9]+]] {
+// HAS_ERRNO_GNU-NEXT:  [[ENTRY:.*:]]
+// HAS_ERRNO_GNU-NEXT:    [[D_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_GNU-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// HAS_ERRNO_GNU-NEXT:    [[FP_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_GNU-NEXT:    [[L_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_GNU-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_GNU-NEXT:    [[C_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_GNU-NEXT:    store ptr [[D]], ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    store ptr [[FP]], ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    store ptr [[L]], ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    store ptr [[C]], ptr [[C_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP1:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL:%.*]] = call double @fmod(double noundef [[CONV]], double noundef [[CONV1]]) #[[ATTR6:[0-9]+]]
+// HAS_ERRNO_GNU-NEXT:    [[CONV2:%.*]] = fptrunc double [[CALL]] to float
+// HAS_ERRNO_GNU-NEXT:    store float [[CONV2]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP2:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP3:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL3:%.*]] = call float @fmodf(float noundef [[TMP2]], float noundef [[TMP3]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    store float [[CALL3]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP4:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV4:%.*]] = fpext float [[TMP4]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP5:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV5:%.*]] = fpext float [[TMP5]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL6:%.*]] = call x86_fp80 @fmodl(x86_fp80 noundef [[CONV4]], x86_fp80 noundef [[CONV5]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[CONV7:%.*]] = fptrunc x86_fp80 [[CALL6]] to float
+// HAS_ERRNO_GNU-NEXT:    store float [[CONV7]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP6:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV8:%.*]] = fpext float [[TMP6]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP7:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV9:%.*]] = fpext float [[TMP7]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL10:%.*]] = call double @atan2(double noundef [[CONV8]], double noundef [[CONV9]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP8:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP9:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL11:%.*]] = call float @atan2f(float noundef [[TMP8]], float noundef [[TMP9]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP10:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV12:%.*]] = fpext float [[TMP10]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP11:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV13:%.*]] = fpext float [[TMP11]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL14:%.*]] = call x86_fp80 @atan2l(x86_fp80 noundef [[CONV12]], x86_fp80 noundef [[CONV13]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP12:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV15:%.*]] = fpext float [[TMP12]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP13:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV16:%.*]] = fpext float [[TMP13]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP14:%.*]] = call double @llvm.copysign.f64(double [[CONV15]], double [[CONV16]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP15:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP16:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP17:%.*]] = call float @llvm.copysign.f32(float [[TMP15]], float [[TMP16]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP18:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV17:%.*]] = fpext float [[TMP18]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP19:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV18:%.*]] = fpext float [[TMP19]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP20:%.*]] = call x86_fp80 @llvm.copysign.f80(x86_fp80 [[CONV17]], x86_fp80 [[CONV18]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP21:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV19:%.*]] = fpext float [[TMP21]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP22:%.*]] = call double @llvm.fabs.f64(double [[CONV19]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP23:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP24:%.*]] = call float @llvm.fabs.f32(float [[TMP23]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP25:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV20:%.*]] = fpext float [[TMP25]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP26:%.*]] = call x86_fp80 @llvm.fabs.f80(x86_fp80 [[CONV20]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP27:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV21:%.*]] = fpext float [[TMP27]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP28:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL22:%.*]] = call double @frexp(double noundef [[CONV21]], ptr noundef [[TMP28]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP29:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP30:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL23:%.*]] = call float @frexpf(float noundef [[TMP29]], ptr noundef [[TMP30]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP31:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV24:%.*]] = fpext float [[TMP31]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP32:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL25:%.*]] = call x86_fp80 @frexpl(x86_fp80 noundef [[CONV24]], ptr noundef [[TMP32]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP33:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV26:%.*]] = fpext float [[TMP33]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP34:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV27:%.*]] = fptosi float [[TMP34]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL28:%.*]] = call double @ldexp(double noundef [[CONV26]], i32 noundef [[CONV27]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP35:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP36:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV29:%.*]] = fptosi float [[TMP36]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL30:%.*]] = call float @ldexpf(float noundef [[TMP35]], i32 noundef [[CONV29]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP37:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV31:%.*]] = fpext float [[TMP37]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP38:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV32:%.*]] = fptosi float [[TMP38]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL33:%.*]] = call x86_fp80 @ldexpl(x86_fp80 noundef [[CONV31]], i32 noundef [[CONV32]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP39:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV34:%.*]] = fpext float [[TMP39]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP40:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP41:%.*]] = call { double, double } @llvm.modf.f64(double [[CONV34]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP42:%.*]] = extractvalue { double, double } [[TMP41]], 0
+// HAS_ERRNO_GNU-NEXT:    [[TMP43:%.*]] = extractvalue { double, double } [[TMP41]], 1
+// HAS_ERRNO_GNU-NEXT:    store double [[TMP43]], ptr [[TMP40]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP44:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP45:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP46:%.*]] = call { float, float } @llvm.modf.f32(float [[TMP44]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP47:%.*]] = extractvalue { float, float } [[TMP46]], 0
+// HAS_ERRNO_GNU-NEXT:    [[TMP48:%.*]] = extractvalue { float, float } [[TMP46]], 1
+// HAS_ERRNO_GNU-NEXT:    store float [[TMP48]], ptr [[TMP45]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP49:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV35:%.*]] = fpext float [[TMP49]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP50:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP51:%.*]] = call { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 [[CONV35]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP52:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[TMP51]], 0
+// HAS_ERRNO_GNU-NEXT:    [[TMP53:%.*]] = extractvalue { x86_fp80, x86_fp80 } [[TMP51]], 1
+// HAS_ERRNO_GNU-NEXT:    store x86_fp80 [[TMP53]], ptr [[TMP50]], align 16
+// HAS_ERRNO_GNU-NEXT:    [[TMP54:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL36:%.*]] = call double @nan(ptr noundef [[TMP54]]) #[[ATTR7:[0-9]+]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP55:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL37:%.*]] = call float @nanf(ptr noundef [[TMP55]]) #[[ATTR7]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP56:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL38:%.*]] = call x86_fp80 @nanl(ptr noundef [[TMP56]]) #[[ATTR7]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP57:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV39:%.*]] = fpext float [[TMP57]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP58:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV40:%.*]] = fpext float [[TMP58]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL41:%.*]] = call double @pow(double noundef [[CONV39]], double noundef [[CONV40]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP59:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP60:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL42:%.*]] = call float @powf(float noundef [[TMP59]], float noundef [[TMP60]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP61:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV43:%.*]] = fpext float [[TMP61]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP62:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV44:%.*]] = fpext float [[TMP62]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL45:%.*]] = call x86_fp80 @powl(x86_fp80 noundef [[CONV43]], x86_fp80 noundef [[CONV44]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP63:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV46:%.*]] = fpext float [[TMP63]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL47:%.*]] = call double @acos(double noundef [[CONV46]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP64:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL48:%.*]] = call float @acosf(float noundef [[TMP64]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP65:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV49:%.*]] = fpext float [[TMP65]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL50:%.*]] = call x86_fp80 @acosl(x86_fp80 noundef [[CONV49]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP66:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV51:%.*]] = fpext float [[TMP66]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL52:%.*]] = call double @acosh(double noundef [[CONV51]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP67:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL53:%.*]] = call float @acoshf(float noundef [[TMP67]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP68:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV54:%.*]] = fpext float [[TMP68]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL55:%.*]] = call x86_fp80 @acoshl(x86_fp80 noundef [[CONV54]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP69:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV56:%.*]] = fpext float [[TMP69]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL57:%.*]] = call double @asin(double noundef [[CONV56]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP70:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL58:%.*]] = call float @asinf(float noundef [[TMP70]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP71:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV59:%.*]] = fpext float [[TMP71]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL60:%.*]] = call x86_fp80 @asinl(x86_fp80 noundef [[CONV59]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP72:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV61:%.*]] = fpext float [[TMP72]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL62:%.*]] = call double @asinh(double noundef [[CONV61]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP73:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL63:%.*]] = call float @asinhf(float noundef [[TMP73]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP74:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV64:%.*]] = fpext float [[TMP74]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL65:%.*]] = call x86_fp80 @asinhl(x86_fp80 noundef [[CONV64]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP75:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV66:%.*]] = fpext float [[TMP75]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL67:%.*]] = call double @atan(double noundef [[CONV66]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP76:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL68:%.*]] = call float @atanf(float noundef [[TMP76]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP77:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV69:%.*]] = fpext float [[TMP77]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL70:%.*]] = call x86_fp80 @atanl(x86_fp80 noundef [[CONV69]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP78:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV71:%.*]] = fpext float [[TMP78]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL72:%.*]] = call double @atanh(double noundef [[CONV71]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP79:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL73:%.*]] = call float @atanhf(float noundef [[TMP79]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP80:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV74:%.*]] = fpext float [[TMP80]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL75:%.*]] = call x86_fp80 @atanhl(x86_fp80 noundef [[CONV74]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP81:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV76:%.*]] = fpext float [[TMP81]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL77:%.*]] = call double @cbrt(double noundef [[CONV76]]) #[[ATTR8:[0-9]+]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP82:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL78:%.*]] = call float @cbrtf(float noundef [[TMP82]]) #[[ATTR8]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP83:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV79:%.*]] = fpext float [[TMP83]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL80:%.*]] = call x86_fp80 @cbrtl(x86_fp80 noundef [[CONV79]]) #[[ATTR8]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP84:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV81:%.*]] = fpext float [[TMP84]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP85:%.*]] = call double @llvm.ceil.f64(double [[CONV81]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP86:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP87:%.*]] = call float @llvm.ceil.f32(float [[TMP86]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP88:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV82:%.*]] = fpext float [[TMP88]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP89:%.*]] = call x86_fp80 @llvm.ceil.f80(x86_fp80 [[CONV82]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP90:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV83:%.*]] = fpext float [[TMP90]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL84:%.*]] = call double @cos(double noundef [[CONV83]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP91:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL85:%.*]] = call float @cosf(float noundef [[TMP91]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP92:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV86:%.*]] = fpext float [[TMP92]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL87:%.*]] = call x86_fp80 @cosl(x86_fp80 noundef [[CONV86]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP93:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV88:%.*]] = fpext float [[TMP93]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL89:%.*]] = call double @cosh(double noundef [[CONV88]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP94:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL90:%.*]] = call float @coshf(float noundef [[TMP94]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP95:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV91:%.*]] = fpext float [[TMP95]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL92:%.*]] = call x86_fp80 @coshl(x86_fp80 noundef [[CONV91]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP96:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV93:%.*]] = fpext float [[TMP96]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL94:%.*]] = call double @erf(double noundef [[CONV93]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP97:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL95:%.*]] = call float @erff(float noundef [[TMP97]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP98:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV96:%.*]] = fpext float [[TMP98]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL97:%.*]] = call x86_fp80 @erfl(x86_fp80 noundef [[CONV96]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP99:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV98:%.*]] = fpext float [[TMP99]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL99:%.*]] = call double @erfc(double noundef [[CONV98]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP100:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL100:%.*]] = call float @erfcf(float noundef [[TMP100]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP101:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV101:%.*]] = fpext float [[TMP101]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL102:%.*]] = call x86_fp80 @erfcl(x86_fp80 noundef [[CONV101]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP102:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV103:%.*]] = fpext float [[TMP102]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL104:%.*]] = call double @exp(double noundef [[CONV103]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP103:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL105:%.*]] = call float @expf(float noundef [[TMP103]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP104:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV106:%.*]] = fpext float [[TMP104]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL107:%.*]] = call x86_fp80 @expl(x86_fp80 noundef [[CONV106]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP105:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV108:%.*]] = fpext float [[TMP105]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL109:%.*]] = call double @exp2(double noundef [[CONV108]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP106:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL110:%.*]] = call float @exp2f(float noundef [[TMP106]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP107:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV111:%.*]] = fpext float [[TMP107]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL112:%.*]] = call x86_fp80 @exp2l(x86_fp80 noundef [[CONV111]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP108:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV113:%.*]] = fpext float [[TMP108]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL114:%.*]] = call double @expm1(double noundef [[CONV113]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP109:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL115:%.*]] = call float @expm1f(float noundef [[TMP109]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP110:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV116:%.*]] = fpext float [[TMP110]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL117:%.*]] = call x86_fp80 @expm1l(x86_fp80 noundef [[CONV116]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP111:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV118:%.*]] = fpext float [[TMP111]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP112:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV119:%.*]] = fpext float [[TMP112]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL120:%.*]] = call double @fdim(double noundef [[CONV118]], double noundef [[CONV119]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP113:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP114:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL121:%.*]] = call float @fdimf(float noundef [[TMP113]], float noundef [[TMP114]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP115:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV122:%.*]] = fpext float [[TMP115]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP116:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV123:%.*]] = fpext float [[TMP116]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL124:%.*]] = call x86_fp80 @fdiml(x86_fp80 noundef [[CONV122]], x86_fp80 noundef [[CONV123]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP117:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV125:%.*]] = fpext float [[TMP117]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP118:%.*]] = call double @llvm.floor.f64(double [[CONV125]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP119:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP120:%.*]] = call float @llvm.floor.f32(float [[TMP119]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP121:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV126:%.*]] = fpext float [[TMP121]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP122:%.*]] = call x86_fp80 @llvm.floor.f80(x86_fp80 [[CONV126]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP123:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV127:%.*]] = fpext float [[TMP123]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP124:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV128:%.*]] = fpext float [[TMP124]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP125:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV129:%.*]] = fpext float [[TMP125]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP126:%.*]] = call double @llvm.fma.f64(double [[CONV127]], double [[CONV128]], double [[CONV129]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP127:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP128:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP129:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP130:%.*]] = call float @llvm.fma.f32(float [[TMP127]], float [[TMP128]], float [[TMP129]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP131:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV130:%.*]] = fpext float [[TMP131]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP132:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV131:%.*]] = fpext float [[TMP132]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP133:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV132:%.*]] = fpext float [[TMP133]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP134:%.*]] = call x86_fp80 @llvm.fma.f80(x86_fp80 [[CONV130]], x86_fp80 [[CONV131]], x86_fp80 [[CONV132]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP135:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV133:%.*]] = fpext nsz float [[TMP135]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP136:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV134:%.*]] = fpext nsz float [[TMP136]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP137:%.*]] = call nsz double @llvm.maxnum.f64(double [[CONV133]], double [[CONV134]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP138:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP139:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP140:%.*]] = call nsz float @llvm.maxnum.f32(float [[TMP138]], float [[TMP139]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP141:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV135:%.*]] = fpext nsz float [[TMP141]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP142:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV136:%.*]] = fpext nsz float [[TMP142]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP143:%.*]] = call nsz x86_fp80 @llvm.maxnum.f80(x86_fp80 [[CONV135]], x86_fp80 [[CONV136]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP144:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV137:%.*]] = fpext nsz float [[TMP144]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP145:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV138:%.*]] = fpext nsz float [[TMP145]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP146:%.*]] = call nsz double @llvm.minnum.f64(double [[CONV137]], double [[CONV138]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP147:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP148:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP149:%.*]] = call nsz float @llvm.minnum.f32(float [[TMP147]], float [[TMP148]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP150:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV139:%.*]] = fpext nsz float [[TMP150]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP151:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV140:%.*]] = fpext nsz float [[TMP151]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP152:%.*]] = call nsz x86_fp80 @llvm.minnum.f80(x86_fp80 [[CONV139]], x86_fp80 [[CONV140]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP153:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP154:%.*]] = load double, ptr [[TMP153]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP155:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP156:%.*]] = load double, ptr [[TMP155]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP157:%.*]] = call double @llvm.maximumnum.f64(double [[TMP154]], double [[TMP156]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP158:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP159:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP160:%.*]] = call float @llvm.maximumnum.f32(float [[TMP158]], float [[TMP159]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP161:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP162:%.*]] = load x86_fp80, ptr [[TMP161]], align 16
+// HAS_ERRNO_GNU-NEXT:    [[TMP163:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP164:%.*]] = load x86_fp80, ptr [[TMP163]], align 16
+// HAS_ERRNO_GNU-NEXT:    [[TMP165:%.*]] = call x86_fp80 @llvm.maximumnum.f80(x86_fp80 [[TMP162]], x86_fp80 [[TMP164]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP166:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP167:%.*]] = load double, ptr [[TMP166]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP168:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP169:%.*]] = load double, ptr [[TMP168]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP170:%.*]] = call double @llvm.minimumnum.f64(double [[TMP167]], double [[TMP169]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP171:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP172:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP173:%.*]] = call float @llvm.minimumnum.f32(float [[TMP171]], float [[TMP172]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP174:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP175:%.*]] = load x86_fp80, ptr [[TMP174]], align 16
+// HAS_ERRNO_GNU-NEXT:    [[TMP176:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP177:%.*]] = load x86_fp80, ptr [[TMP176]], align 16
+// HAS_ERRNO_GNU-NEXT:    [[TMP178:%.*]] = call x86_fp80 @llvm.minimumnum.f80(x86_fp80 [[TMP175]], x86_fp80 [[TMP177]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP179:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV141:%.*]] = fpext float [[TMP179]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP180:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV142:%.*]] = fpext float [[TMP180]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL143:%.*]] = call double @hypot(double noundef [[CONV141]], double noundef [[CONV142]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP181:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP182:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL144:%.*]] = call float @hypotf(float noundef [[TMP181]], float noundef [[TMP182]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP183:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV145:%.*]] = fpext float [[TMP183]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP184:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV146:%.*]] = fpext float [[TMP184]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL147:%.*]] = call x86_fp80 @hypotl(x86_fp80 noundef [[CONV145]], x86_fp80 noundef [[CONV146]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP185:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV148:%.*]] = fpext float [[TMP185]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL149:%.*]] = call i32 @ilogb(double noundef [[CONV148]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP186:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL150:%.*]] = call i32 @ilogbf(float noundef [[TMP186]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP187:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV151:%.*]] = fpext float [[TMP187]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL152:%.*]] = call i32 @ilogbl(x86_fp80 noundef [[CONV151]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP188:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV153:%.*]] = fpext float [[TMP188]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL154:%.*]] = call double @lgamma(double noundef [[CONV153]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP189:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL155:%.*]] = call float @lgammaf(float noundef [[TMP189]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP190:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV156:%.*]] = fpext float [[TMP190]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL157:%.*]] = call x86_fp80 @lgammal(x86_fp80 noundef [[CONV156]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP191:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV158:%.*]] = fpext float [[TMP191]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL159:%.*]] = call i64 @llrint(double noundef [[CONV158]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP192:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL160:%.*]] = call i64 @llrintf(float noundef [[TMP192]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP193:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV161:%.*]] = fpext float [[TMP193]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL162:%.*]] = call i64 @llrintl(x86_fp80 noundef [[CONV161]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP194:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV163:%.*]] = fpext float [[TMP194]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL164:%.*]] = call i64 @llround(double noundef [[CONV163]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP195:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL165:%.*]] = call i64 @llroundf(float noundef [[TMP195]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP196:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV166:%.*]] = fpext float [[TMP196]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL167:%.*]] = call i64 @llroundl(x86_fp80 noundef [[CONV166]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP197:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV168:%.*]] = fpext float [[TMP197]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL169:%.*]] = call double @log(double noundef [[CONV168]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP198:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL170:%.*]] = call float @logf(float noundef [[TMP198]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP199:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV171:%.*]] = fpext float [[TMP199]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL172:%.*]] = call x86_fp80 @logl(x86_fp80 noundef [[CONV171]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP200:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV173:%.*]] = fpext float [[TMP200]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL174:%.*]] = call double @log10(double noundef [[CONV173]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP201:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL175:%.*]] = call float @log10f(float noundef [[TMP201]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP202:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV176:%.*]] = fpext float [[TMP202]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL177:%.*]] = call x86_fp80 @log10l(x86_fp80 noundef [[CONV176]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP203:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV178:%.*]] = fpext float [[TMP203]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL179:%.*]] = call double @log1p(double noundef [[CONV178]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP204:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL180:%.*]] = call float @log1pf(float noundef [[TMP204]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP205:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV181:%.*]] = fpext float [[TMP205]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL182:%.*]] = call x86_fp80 @log1pl(x86_fp80 noundef [[CONV181]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP206:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV183:%.*]] = fpext float [[TMP206]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL184:%.*]] = call double @log2(double noundef [[CONV183]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP207:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL185:%.*]] = call float @log2f(float noundef [[TMP207]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP208:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV186:%.*]] = fpext float [[TMP208]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL187:%.*]] = call x86_fp80 @log2l(x86_fp80 noundef [[CONV186]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP209:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV188:%.*]] = fpext float [[TMP209]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL189:%.*]] = call double @logb(double noundef [[CONV188]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP210:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL190:%.*]] = call float @logbf(float noundef [[TMP210]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP211:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV191:%.*]] = fpext float [[TMP211]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL192:%.*]] = call x86_fp80 @logbl(x86_fp80 noundef [[CONV191]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP212:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV193:%.*]] = fpext float [[TMP212]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL194:%.*]] = call i64 @lrint(double noundef [[CONV193]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP213:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL195:%.*]] = call i64 @lrintf(float noundef [[TMP213]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP214:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV196:%.*]] = fpext float [[TMP214]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL197:%.*]] = call i64 @lrintl(x86_fp80 noundef [[CONV196]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP215:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV198:%.*]] = fpext float [[TMP215]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL199:%.*]] = call i64 @lround(double noundef [[CONV198]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP216:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL200:%.*]] = call i64 @lroundf(float noundef [[TMP216]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP217:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV201:%.*]] = fpext float [[TMP217]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL202:%.*]] = call i64 @lroundl(x86_fp80 noundef [[CONV201]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP218:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV203:%.*]] = fpext float [[TMP218]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP219:%.*]] = call double @llvm.nearbyint.f64(double [[CONV203]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP220:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP221:%.*]] = call float @llvm.nearbyint.f32(float [[TMP220]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP222:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV204:%.*]] = fpext float [[TMP222]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP223:%.*]] = call x86_fp80 @llvm.nearbyint.f80(x86_fp80 [[CONV204]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP224:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV205:%.*]] = fpext float [[TMP224]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP225:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV206:%.*]] = fpext float [[TMP225]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL207:%.*]] = call double @nextafter(double noundef [[CONV205]], double noundef [[CONV206]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP226:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP227:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL208:%.*]] = call float @nextafterf(float noundef [[TMP226]], float noundef [[TMP227]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP228:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV209:%.*]] = fpext float [[TMP228]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP229:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV210:%.*]] = fpext float [[TMP229]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL211:%.*]] = call x86_fp80 @nextafterl(x86_fp80 noundef [[CONV209]], x86_fp80 noundef [[CONV210]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP230:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV212:%.*]] = fpext float [[TMP230]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP231:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV213:%.*]] = fpext float [[TMP231]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL214:%.*]] = call double @nexttoward(double noundef [[CONV212]], x86_fp80 noundef [[CONV213]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP232:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP233:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV215:%.*]] = fpext float [[TMP233]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL216:%.*]] = call float @nexttowardf(float noundef [[TMP232]], x86_fp80 noundef [[CONV215]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP234:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV217:%.*]] = fpext float [[TMP234]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP235:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV218:%.*]] = fpext float [[TMP235]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL219:%.*]] = call x86_fp80 @nexttowardl(x86_fp80 noundef [[CONV217]], x86_fp80 noundef [[CONV218]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP236:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV220:%.*]] = fpext float [[TMP236]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP237:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV221:%.*]] = fpext float [[TMP237]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL222:%.*]] = call double @remainder(double noundef [[CONV220]], double noundef [[CONV221]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP238:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP239:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL223:%.*]] = call float @remainderf(float noundef [[TMP238]], float noundef [[TMP239]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP240:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV224:%.*]] = fpext float [[TMP240]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP241:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV225:%.*]] = fpext float [[TMP241]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL226:%.*]] = call x86_fp80 @remainderl(x86_fp80 noundef [[CONV224]], x86_fp80 noundef [[CONV225]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP242:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV227:%.*]] = fpext float [[TMP242]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP243:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV228:%.*]] = fpext float [[TMP243]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP244:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL229:%.*]] = call double @remquo(double noundef [[CONV227]], double noundef [[CONV228]], ptr noundef [[TMP244]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP245:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP246:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP247:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL230:%.*]] = call float @remquof(float noundef [[TMP245]], float noundef [[TMP246]], ptr noundef [[TMP247]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP248:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV231:%.*]] = fpext float [[TMP248]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP249:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV232:%.*]] = fpext float [[TMP249]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP250:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[CALL233:%.*]] = call x86_fp80 @remquol(x86_fp80 noundef [[CONV231]], x86_fp80 noundef [[CONV232]], ptr noundef [[TMP250]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP251:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV234:%.*]] = fpext float [[TMP251]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP252:%.*]] = call double @llvm.rint.f64(double [[CONV234]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP253:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP254:%.*]] = call float @llvm.rint.f32(float [[TMP253]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP255:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV235:%.*]] = fpext float [[TMP255]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP256:%.*]] = call x86_fp80 @llvm.rint.f80(x86_fp80 [[CONV235]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP257:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV236:%.*]] = fpext float [[TMP257]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP258:%.*]] = call double @llvm.round.f64(double [[CONV236]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP259:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP260:%.*]] = call float @llvm.round.f32(float [[TMP259]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP261:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV237:%.*]] = fpext float [[TMP261]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP262:%.*]] = call x86_fp80 @llvm.round.f80(x86_fp80 [[CONV237]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP263:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV238:%.*]] = fpext float [[TMP263]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP264:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV239:%.*]] = fptosi float [[TMP264]] to i64
+// HAS_ERRNO_GNU-NEXT:    [[CALL240:%.*]] = call double @scalbln(double noundef [[CONV238]], i64 noundef [[CONV239]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP265:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP266:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV241:%.*]] = fptosi float [[TMP266]] to i64
+// HAS_ERRNO_GNU-NEXT:    [[CALL242:%.*]] = call float @scalblnf(float noundef [[TMP265]], i64 noundef [[CONV241]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP267:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV243:%.*]] = fpext float [[TMP267]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP268:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV244:%.*]] = fptosi float [[TMP268]] to i64
+// HAS_ERRNO_GNU-NEXT:    [[CALL245:%.*]] = call x86_fp80 @scalblnl(x86_fp80 noundef [[CONV243]], i64 noundef [[CONV244]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP269:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV246:%.*]] = fpext float [[TMP269]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP270:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV247:%.*]] = fptosi float [[TMP270]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL248:%.*]] = call double @scalbn(double noundef [[CONV246]], i32 noundef [[CONV247]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP271:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP272:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV249:%.*]] = fptosi float [[TMP272]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL250:%.*]] = call float @scalbnf(float noundef [[TMP271]], i32 noundef [[CONV249]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP273:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV251:%.*]] = fpext float [[TMP273]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP274:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV252:%.*]] = fptosi float [[TMP274]] to i32
+// HAS_ERRNO_GNU-NEXT:    [[CALL253:%.*]] = call x86_fp80 @scalbnl(x86_fp80 noundef [[CONV251]], i32 noundef [[CONV252]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP275:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV254:%.*]] = fpext float [[TMP275]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL255:%.*]] = call double @sin(double noundef [[CONV254]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP276:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL256:%.*]] = call float @sinf(float noundef [[TMP276]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP277:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV257:%.*]] = fpext float [[TMP277]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL258:%.*]] = call x86_fp80 @sinl(x86_fp80 noundef [[CONV257]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP278:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV259:%.*]] = fpext float [[TMP278]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL260:%.*]] = call double @sinh(double noundef [[CONV259]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP279:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL261:%.*]] = call float @sinhf(float noundef [[TMP279]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP280:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV262:%.*]] = fpext float [[TMP280]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL263:%.*]] = call x86_fp80 @sinhl(x86_fp80 noundef [[CONV262]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP281:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV264:%.*]] = fpext float [[TMP281]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP282:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP283:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    call void @sincos(double noundef [[CONV264]], ptr noundef [[TMP282]], ptr noundef [[TMP283]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP284:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP285:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP286:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    call void @sincosf(float noundef [[TMP284]], ptr noundef [[TMP285]], ptr noundef [[TMP286]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP287:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV265:%.*]] = fpext float [[TMP287]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP288:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    [[TMP289:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_GNU-NEXT:    call void @sincosl(x86_fp80 noundef [[CONV265]], ptr noundef [[TMP288]], ptr noundef [[TMP289]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP290:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV266:%.*]] = fpext float [[TMP290]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL267:%.*]] = call double @sqrt(double noundef [[CONV266]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP291:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL268:%.*]] = call float @sqrtf(float noundef [[TMP291]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP292:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV269:%.*]] = fpext float [[TMP292]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL270:%.*]] = call x86_fp80 @sqrtl(x86_fp80 noundef [[CONV269]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP293:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV271:%.*]] = fpext float [[TMP293]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL272:%.*]] = call double @tan(double noundef [[CONV271]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP294:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL273:%.*]] = call float @tanf(float noundef [[TMP294]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP295:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV274:%.*]] = fpext float [[TMP295]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL275:%.*]] = call x86_fp80 @tanl(x86_fp80 noundef [[CONV274]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP296:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV276:%.*]] = fpext float [[TMP296]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL277:%.*]] = call double @tanh(double noundef [[CONV276]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP297:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL278:%.*]] = call float @tanhf(float noundef [[TMP297]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP298:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV279:%.*]] = fpext float [[TMP298]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL280:%.*]] = call x86_fp80 @tanhl(x86_fp80 noundef [[CONV279]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP299:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV281:%.*]] = fpext float [[TMP299]] to double
+// HAS_ERRNO_GNU-NEXT:    [[CALL282:%.*]] = call double @tgamma(double noundef [[CONV281]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP300:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CALL283:%.*]] = call float @tgammaf(float noundef [[TMP300]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP301:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV284:%.*]] = fpext float [[TMP301]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[CALL285:%.*]] = call x86_fp80 @tgammal(x86_fp80 noundef [[CONV284]]) #[[ATTR6]]
+// HAS_ERRNO_GNU-NEXT:    [[TMP302:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV286:%.*]] = fpext float [[TMP302]] to double
+// HAS_ERRNO_GNU-NEXT:    [[TMP303:%.*]] = call double @llvm.trunc.f64(double [[CONV286]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP304:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[TMP305:%.*]] = call float @llvm.trunc.f32(float [[TMP304]])
+// HAS_ERRNO_GNU-NEXT:    [[TMP306:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_GNU-NEXT:    [[CONV287:%.*]] = fpext float [[TMP306]] to x86_fp80
+// HAS_ERRNO_GNU-NEXT:    [[TMP307:%.*]] = call x86_fp80 @llvm.trunc.f80(x86_fp80 [[CONV287]])
+// HAS_ERRNO_GNU-NEXT:    ret void
+//
+// HAS_ERRNO_WIN-LABEL: define dso_local void @foo(
+// HAS_ERRNO_WIN-SAME: ptr noundef [[D:%.*]], float noundef [[F:%.*]], ptr noundef [[FP:%.*]], ptr noundef [[L:%.*]], ptr noundef [[I:%.*]], ptr noundef [[C:%.*]]) #[[ATTR0:[0-9]+]] {
+// HAS_ERRNO_WIN-NEXT:  [[ENTRY:.*:]]
+// HAS_ERRNO_WIN-NEXT:    [[C_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_WIN-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_WIN-NEXT:    [[L_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_WIN-NEXT:    [[FP_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_WIN-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
+// HAS_ERRNO_WIN-NEXT:    [[D_ADDR:%.*]] = alloca ptr, align 8
+// HAS_ERRNO_WIN-NEXT:    store ptr [[C]], ptr [[C_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    store ptr [[L]], ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    store ptr [[FP]], ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    store float [[F]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    store ptr [[D]], ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV:%.*]] = fpext float [[TMP0]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP1:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV1:%.*]] = fpext float [[TMP1]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL:%.*]] = call double @fmod(double noundef [[CONV1]], double noundef [[CONV]]) #[[ATTR6:[0-9]+]]
+// HAS_ERRNO_WIN-NEXT:    [[CONV2:%.*]] = fptrunc double [[CALL]] to float
+// HAS_ERRNO_WIN-NEXT:    store float [[CONV2]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP2:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP3:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL3:%.*]] = call float @fmodf(float noundef [[TMP3]], float noundef [[TMP2]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    store float [[CALL3]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP4:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV4:%.*]] = fpext float [[TMP4]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP5:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV5:%.*]] = fpext float [[TMP5]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL6:%.*]] = call double @fmodl(double noundef [[CONV5]], double noundef [[CONV4]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[CONV7:%.*]] = fptrunc double [[CALL6]] to float
+// HAS_ERRNO_WIN-NEXT:    store float [[CONV7]], ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP6:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV8:%.*]] = fpext float [[TMP6]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP7:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV9:%.*]] = fpext float [[TMP7]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL10:%.*]] = call double @atan2(double noundef [[CONV9]], double noundef [[CONV8]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP8:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP9:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL11:%.*]] = call float @atan2f(float noundef [[TMP9]], float noundef [[TMP8]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP10:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV12:%.*]] = fpext float [[TMP10]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP11:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV13:%.*]] = fpext float [[TMP11]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL14:%.*]] = call double @atan2l(double noundef [[CONV13]], double noundef [[CONV12]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP12:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV15:%.*]] = fpext float [[TMP12]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP13:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV16:%.*]] = fpext float [[TMP13]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP14:%.*]] = call double @llvm.copysign.f64(double [[CONV15]], double [[CONV16]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP15:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP16:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP17:%.*]] = call float @llvm.copysign.f32(float [[TMP15]], float [[TMP16]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP18:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV17:%.*]] = fpext float [[TMP18]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP19:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV18:%.*]] = fpext float [[TMP19]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP20:%.*]] = call double @llvm.copysign.f64(double [[CONV17]], double [[CONV18]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP21:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV19:%.*]] = fpext float [[TMP21]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP22:%.*]] = call double @llvm.fabs.f64(double [[CONV19]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP23:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP24:%.*]] = call float @llvm.fabs.f32(float [[TMP23]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP25:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV20:%.*]] = fpext float [[TMP25]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP26:%.*]] = call double @llvm.fabs.f64(double [[CONV20]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP27:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP28:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV21:%.*]] = fpext float [[TMP28]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL22:%.*]] = call double @frexp(double noundef [[CONV21]], ptr noundef [[TMP27]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP29:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP30:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL23:%.*]] = call float @frexpf(float noundef [[TMP30]], ptr noundef [[TMP29]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP31:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP32:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV24:%.*]] = fpext float [[TMP32]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL25:%.*]] = call double @frexpl(double noundef [[CONV24]], ptr noundef [[TMP31]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP33:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV26:%.*]] = fptosi float [[TMP33]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP34:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV27:%.*]] = fpext float [[TMP34]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL28:%.*]] = call double @ldexp(double noundef [[CONV27]], i32 noundef [[CONV26]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP35:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV29:%.*]] = fptosi float [[TMP35]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP36:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL30:%.*]] = call float @ldexpf(float noundef [[TMP36]], i32 noundef [[CONV29]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP37:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV31:%.*]] = fptosi float [[TMP37]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP38:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV32:%.*]] = fpext float [[TMP38]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL33:%.*]] = call double @ldexpl(double noundef [[CONV32]], i32 noundef [[CONV31]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP39:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV34:%.*]] = fpext float [[TMP39]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP40:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP41:%.*]] = call { double, double } @llvm.modf.f64(double [[CONV34]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP42:%.*]] = extractvalue { double, double } [[TMP41]], 0
+// HAS_ERRNO_WIN-NEXT:    [[TMP43:%.*]] = extractvalue { double, double } [[TMP41]], 1
+// HAS_ERRNO_WIN-NEXT:    store double [[TMP43]], ptr [[TMP40]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP44:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP45:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP46:%.*]] = call { float, float } @llvm.modf.f32(float [[TMP44]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP47:%.*]] = extractvalue { float, float } [[TMP46]], 0
+// HAS_ERRNO_WIN-NEXT:    [[TMP48:%.*]] = extractvalue { float, float } [[TMP46]], 1
+// HAS_ERRNO_WIN-NEXT:    store float [[TMP48]], ptr [[TMP45]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP49:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV35:%.*]] = fpext float [[TMP49]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP50:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP51:%.*]] = call { double, double } @llvm.modf.f64(double [[CONV35]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP52:%.*]] = extractvalue { double, double } [[TMP51]], 0
+// HAS_ERRNO_WIN-NEXT:    [[TMP53:%.*]] = extractvalue { double, double } [[TMP51]], 1
+// HAS_ERRNO_WIN-NEXT:    store double [[TMP53]], ptr [[TMP50]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP54:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[CALL36:%.*]] = call double @nan(ptr noundef [[TMP54]]) #[[ATTR7:[0-9]+]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP55:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[CALL37:%.*]] = call float @nanf(ptr noundef [[TMP55]]) #[[ATTR7]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP56:%.*]] = load ptr, ptr [[C_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[CALL38:%.*]] = call double @nanl(ptr noundef [[TMP56]]) #[[ATTR7]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP57:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV39:%.*]] = fpext float [[TMP57]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP58:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV40:%.*]] = fpext float [[TMP58]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL41:%.*]] = call double @pow(double noundef [[CONV40]], double noundef [[CONV39]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP59:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP60:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL42:%.*]] = call float @powf(float noundef [[TMP60]], float noundef [[TMP59]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP61:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV43:%.*]] = fpext float [[TMP61]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP62:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV44:%.*]] = fpext float [[TMP62]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL45:%.*]] = call double @powl(double noundef [[CONV44]], double noundef [[CONV43]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP63:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV46:%.*]] = fpext float [[TMP63]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL47:%.*]] = call double @acos(double noundef [[CONV46]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP64:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL48:%.*]] = call float @acosf(float noundef [[TMP64]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP65:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV49:%.*]] = fpext float [[TMP65]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL50:%.*]] = call double @acosl(double noundef [[CONV49]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP66:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV51:%.*]] = fpext float [[TMP66]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL52:%.*]] = call double @acosh(double noundef [[CONV51]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP67:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL53:%.*]] = call float @acoshf(float noundef [[TMP67]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP68:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV54:%.*]] = fpext float [[TMP68]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL55:%.*]] = call double @acoshl(double noundef [[CONV54]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP69:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV56:%.*]] = fpext float [[TMP69]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL57:%.*]] = call double @asin(double noundef [[CONV56]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP70:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL58:%.*]] = call float @asinf(float noundef [[TMP70]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP71:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV59:%.*]] = fpext float [[TMP71]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL60:%.*]] = call double @asinl(double noundef [[CONV59]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP72:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV61:%.*]] = fpext float [[TMP72]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL62:%.*]] = call double @asinh(double noundef [[CONV61]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP73:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL63:%.*]] = call float @asinhf(float noundef [[TMP73]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP74:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV64:%.*]] = fpext float [[TMP74]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL65:%.*]] = call double @asinhl(double noundef [[CONV64]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP75:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV66:%.*]] = fpext float [[TMP75]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL67:%.*]] = call double @atan(double noundef [[CONV66]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP76:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL68:%.*]] = call float @atanf(float noundef [[TMP76]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP77:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV69:%.*]] = fpext float [[TMP77]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL70:%.*]] = call double @atanl(double noundef [[CONV69]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP78:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV71:%.*]] = fpext float [[TMP78]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL72:%.*]] = call double @atanh(double noundef [[CONV71]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP79:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL73:%.*]] = call float @atanhf(float noundef [[TMP79]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP80:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV74:%.*]] = fpext float [[TMP80]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL75:%.*]] = call double @atanhl(double noundef [[CONV74]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP81:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV76:%.*]] = fpext float [[TMP81]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL77:%.*]] = call double @cbrt(double noundef [[CONV76]]) #[[ATTR8:[0-9]+]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP82:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL78:%.*]] = call float @cbrtf(float noundef [[TMP82]]) #[[ATTR8]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP83:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV79:%.*]] = fpext float [[TMP83]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL80:%.*]] = call double @cbrtl(double noundef [[CONV79]]) #[[ATTR8]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP84:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV81:%.*]] = fpext float [[TMP84]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP85:%.*]] = call double @llvm.ceil.f64(double [[CONV81]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP86:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP87:%.*]] = call float @llvm.ceil.f32(float [[TMP86]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP88:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV82:%.*]] = fpext float [[TMP88]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP89:%.*]] = call double @llvm.ceil.f64(double [[CONV82]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP90:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV83:%.*]] = fpext float [[TMP90]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL84:%.*]] = call double @cos(double noundef [[CONV83]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP91:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL85:%.*]] = call float @cosf(float noundef [[TMP91]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP92:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV86:%.*]] = fpext float [[TMP92]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL87:%.*]] = call double @cosl(double noundef [[CONV86]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP93:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV88:%.*]] = fpext float [[TMP93]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL89:%.*]] = call double @cosh(double noundef [[CONV88]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP94:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL90:%.*]] = call float @coshf(float noundef [[TMP94]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP95:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV91:%.*]] = fpext float [[TMP95]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL92:%.*]] = call double @coshl(double noundef [[CONV91]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP96:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV93:%.*]] = fpext float [[TMP96]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL94:%.*]] = call double @erf(double noundef [[CONV93]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP97:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL95:%.*]] = call float @erff(float noundef [[TMP97]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP98:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV96:%.*]] = fpext float [[TMP98]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL97:%.*]] = call double @erfl(double noundef [[CONV96]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP99:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV98:%.*]] = fpext float [[TMP99]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL99:%.*]] = call double @erfc(double noundef [[CONV98]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP100:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL100:%.*]] = call float @erfcf(float noundef [[TMP100]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP101:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV101:%.*]] = fpext float [[TMP101]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL102:%.*]] = call double @erfcl(double noundef [[CONV101]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP102:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV103:%.*]] = fpext float [[TMP102]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL104:%.*]] = call double @exp(double noundef [[CONV103]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP103:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL105:%.*]] = call float @expf(float noundef [[TMP103]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP104:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV106:%.*]] = fpext float [[TMP104]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL107:%.*]] = call double @expl(double noundef [[CONV106]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP105:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV108:%.*]] = fpext float [[TMP105]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL109:%.*]] = call double @exp2(double noundef [[CONV108]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP106:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL110:%.*]] = call float @exp2f(float noundef [[TMP106]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP107:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV111:%.*]] = fpext float [[TMP107]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL112:%.*]] = call double @exp2l(double noundef [[CONV111]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP108:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV113:%.*]] = fpext float [[TMP108]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL114:%.*]] = call double @expm1(double noundef [[CONV113]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP109:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL115:%.*]] = call float @expm1f(float noundef [[TMP109]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP110:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV116:%.*]] = fpext float [[TMP110]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL117:%.*]] = call double @expm1l(double noundef [[CONV116]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP111:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV118:%.*]] = fpext float [[TMP111]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP112:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV119:%.*]] = fpext float [[TMP112]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL120:%.*]] = call double @fdim(double noundef [[CONV119]], double noundef [[CONV118]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP113:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP114:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL121:%.*]] = call float @fdimf(float noundef [[TMP114]], float noundef [[TMP113]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP115:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV122:%.*]] = fpext float [[TMP115]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP116:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV123:%.*]] = fpext float [[TMP116]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL124:%.*]] = call double @fdiml(double noundef [[CONV123]], double noundef [[CONV122]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP117:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV125:%.*]] = fpext float [[TMP117]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP118:%.*]] = call double @llvm.floor.f64(double [[CONV125]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP119:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP120:%.*]] = call float @llvm.floor.f32(float [[TMP119]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP121:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV126:%.*]] = fpext float [[TMP121]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP122:%.*]] = call double @llvm.floor.f64(double [[CONV126]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP123:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV127:%.*]] = fpext float [[TMP123]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP124:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV128:%.*]] = fpext float [[TMP124]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP125:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV129:%.*]] = fpext float [[TMP125]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP126:%.*]] = call double @llvm.fma.f64(double [[CONV127]], double [[CONV128]], double [[CONV129]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP127:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP128:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP129:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP130:%.*]] = call float @llvm.fma.f32(float [[TMP127]], float [[TMP128]], float [[TMP129]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP131:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV130:%.*]] = fpext float [[TMP131]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP132:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV131:%.*]] = fpext float [[TMP132]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP133:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV132:%.*]] = fpext float [[TMP133]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP134:%.*]] = call double @llvm.fma.f64(double [[CONV130]], double [[CONV131]], double [[CONV132]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP135:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV133:%.*]] = fpext nsz float [[TMP135]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP136:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV134:%.*]] = fpext nsz float [[TMP136]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP137:%.*]] = call nsz double @llvm.maxnum.f64(double [[CONV133]], double [[CONV134]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP138:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP139:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP140:%.*]] = call nsz float @llvm.maxnum.f32(float [[TMP138]], float [[TMP139]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP141:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV135:%.*]] = fpext nsz float [[TMP141]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP142:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV136:%.*]] = fpext nsz float [[TMP142]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP143:%.*]] = call nsz double @llvm.maxnum.f64(double [[CONV135]], double [[CONV136]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP144:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV137:%.*]] = fpext nsz float [[TMP144]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP145:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV138:%.*]] = fpext nsz float [[TMP145]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP146:%.*]] = call nsz double @llvm.minnum.f64(double [[CONV137]], double [[CONV138]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP147:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP148:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP149:%.*]] = call nsz float @llvm.minnum.f32(float [[TMP147]], float [[TMP148]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP150:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV139:%.*]] = fpext nsz float [[TMP150]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP151:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV140:%.*]] = fpext nsz float [[TMP151]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP152:%.*]] = call nsz double @llvm.minnum.f64(double [[CONV139]], double [[CONV140]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP153:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP154:%.*]] = load double, ptr [[TMP153]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP155:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP156:%.*]] = load double, ptr [[TMP155]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP157:%.*]] = call double @llvm.maximumnum.f64(double [[TMP154]], double [[TMP156]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP158:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP159:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP160:%.*]] = call float @llvm.maximumnum.f32(float [[TMP158]], float [[TMP159]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP161:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP162:%.*]] = load double, ptr [[TMP161]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP163:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP164:%.*]] = load double, ptr [[TMP163]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP165:%.*]] = call double @llvm.maximumnum.f64(double [[TMP162]], double [[TMP164]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP166:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP167:%.*]] = load double, ptr [[TMP166]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP168:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP169:%.*]] = load double, ptr [[TMP168]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP170:%.*]] = call double @llvm.minimumnum.f64(double [[TMP167]], double [[TMP169]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP171:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP172:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP173:%.*]] = call float @llvm.minimumnum.f32(float [[TMP171]], float [[TMP172]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP174:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP175:%.*]] = load double, ptr [[TMP174]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP176:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP177:%.*]] = load double, ptr [[TMP176]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP178:%.*]] = call double @llvm.minimumnum.f64(double [[TMP175]], double [[TMP177]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP179:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV141:%.*]] = fpext float [[TMP179]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP180:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV142:%.*]] = fpext float [[TMP180]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL143:%.*]] = call double @hypot(double noundef [[CONV142]], double noundef [[CONV141]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP181:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP182:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL144:%.*]] = call float @hypotf(float noundef [[TMP182]], float noundef [[TMP181]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP183:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV145:%.*]] = fpext float [[TMP183]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP184:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV146:%.*]] = fpext float [[TMP184]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL147:%.*]] = call double @hypotl(double noundef [[CONV146]], double noundef [[CONV145]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP185:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV148:%.*]] = fpext float [[TMP185]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL149:%.*]] = call i32 @ilogb(double noundef [[CONV148]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP186:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL150:%.*]] = call i32 @ilogbf(float noundef [[TMP186]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP187:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV151:%.*]] = fpext float [[TMP187]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL152:%.*]] = call i32 @ilogbl(double noundef [[CONV151]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP188:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV153:%.*]] = fpext float [[TMP188]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL154:%.*]] = call double @lgamma(double noundef [[CONV153]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP189:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL155:%.*]] = call float @lgammaf(float noundef [[TMP189]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP190:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV156:%.*]] = fpext float [[TMP190]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL157:%.*]] = call double @lgammal(double noundef [[CONV156]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP191:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV158:%.*]] = fpext float [[TMP191]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL159:%.*]] = call i64 @llrint(double noundef [[CONV158]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP192:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL160:%.*]] = call i64 @llrintf(float noundef [[TMP192]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP193:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV161:%.*]] = fpext float [[TMP193]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL162:%.*]] = call i64 @llrintl(double noundef [[CONV161]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP194:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV163:%.*]] = fpext float [[TMP194]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL164:%.*]] = call i64 @llround(double noundef [[CONV163]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP195:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL165:%.*]] = call i64 @llroundf(float noundef [[TMP195]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP196:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV166:%.*]] = fpext float [[TMP196]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL167:%.*]] = call i64 @llroundl(double noundef [[CONV166]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP197:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV168:%.*]] = fpext float [[TMP197]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL169:%.*]] = call double @log(double noundef [[CONV168]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP198:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL170:%.*]] = call float @logf(float noundef [[TMP198]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP199:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV171:%.*]] = fpext float [[TMP199]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL172:%.*]] = call double @logl(double noundef [[CONV171]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP200:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV173:%.*]] = fpext float [[TMP200]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL174:%.*]] = call double @log10(double noundef [[CONV173]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP201:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL175:%.*]] = call float @log10f(float noundef [[TMP201]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP202:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV176:%.*]] = fpext float [[TMP202]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL177:%.*]] = call double @log10l(double noundef [[CONV176]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP203:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV178:%.*]] = fpext float [[TMP203]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL179:%.*]] = call double @log1p(double noundef [[CONV178]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP204:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL180:%.*]] = call float @log1pf(float noundef [[TMP204]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP205:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV181:%.*]] = fpext float [[TMP205]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL182:%.*]] = call double @log1pl(double noundef [[CONV181]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP206:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV183:%.*]] = fpext float [[TMP206]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL184:%.*]] = call double @log2(double noundef [[CONV183]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP207:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL185:%.*]] = call float @log2f(float noundef [[TMP207]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP208:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV186:%.*]] = fpext float [[TMP208]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL187:%.*]] = call double @log2l(double noundef [[CONV186]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP209:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV188:%.*]] = fpext float [[TMP209]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL189:%.*]] = call double @logb(double noundef [[CONV188]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP210:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL190:%.*]] = call float @logbf(float noundef [[TMP210]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP211:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV191:%.*]] = fpext float [[TMP211]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL192:%.*]] = call double @logbl(double noundef [[CONV191]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP212:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV193:%.*]] = fpext float [[TMP212]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL194:%.*]] = call i32 @lrint(double noundef [[CONV193]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP213:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL195:%.*]] = call i32 @lrintf(float noundef [[TMP213]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP214:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV196:%.*]] = fpext float [[TMP214]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL197:%.*]] = call i32 @lrintl(double noundef [[CONV196]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP215:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV198:%.*]] = fpext float [[TMP215]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL199:%.*]] = call i32 @lround(double noundef [[CONV198]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP216:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL200:%.*]] = call i32 @lroundf(float noundef [[TMP216]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP217:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV201:%.*]] = fpext float [[TMP217]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL202:%.*]] = call i32 @lroundl(double noundef [[CONV201]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP218:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV203:%.*]] = fpext float [[TMP218]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP219:%.*]] = call double @llvm.nearbyint.f64(double [[CONV203]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP220:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP221:%.*]] = call float @llvm.nearbyint.f32(float [[TMP220]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP222:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV204:%.*]] = fpext float [[TMP222]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP223:%.*]] = call double @llvm.nearbyint.f64(double [[CONV204]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP224:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV205:%.*]] = fpext float [[TMP224]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP225:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV206:%.*]] = fpext float [[TMP225]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL207:%.*]] = call double @nextafter(double noundef [[CONV206]], double noundef [[CONV205]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP226:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP227:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL208:%.*]] = call float @nextafterf(float noundef [[TMP227]], float noundef [[TMP226]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP228:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV209:%.*]] = fpext float [[TMP228]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP229:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV210:%.*]] = fpext float [[TMP229]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL211:%.*]] = call double @nextafterl(double noundef [[CONV210]], double noundef [[CONV209]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP230:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV212:%.*]] = fpext float [[TMP230]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP231:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV213:%.*]] = fpext float [[TMP231]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL214:%.*]] = call double @nexttoward(double noundef [[CONV213]], double noundef [[CONV212]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP232:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV215:%.*]] = fpext float [[TMP232]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP233:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL216:%.*]] = call float @nexttowardf(float noundef [[TMP233]], double noundef [[CONV215]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP234:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV217:%.*]] = fpext float [[TMP234]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP235:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV218:%.*]] = fpext float [[TMP235]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL219:%.*]] = call double @nexttowardl(double noundef [[CONV218]], double noundef [[CONV217]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP236:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV220:%.*]] = fpext float [[TMP236]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP237:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV221:%.*]] = fpext float [[TMP237]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL222:%.*]] = call double @remainder(double noundef [[CONV221]], double noundef [[CONV220]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP238:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP239:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL223:%.*]] = call float @remainderf(float noundef [[TMP239]], float noundef [[TMP238]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP240:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV224:%.*]] = fpext float [[TMP240]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP241:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV225:%.*]] = fpext float [[TMP241]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL226:%.*]] = call double @remainderl(double noundef [[CONV225]], double noundef [[CONV224]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP242:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP243:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV227:%.*]] = fpext float [[TMP243]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP244:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV228:%.*]] = fpext float [[TMP244]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL229:%.*]] = call double @remquo(double noundef [[CONV228]], double noundef [[CONV227]], ptr noundef [[TMP242]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP245:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP246:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP247:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL230:%.*]] = call float @remquof(float noundef [[TMP247]], float noundef [[TMP246]], ptr noundef [[TMP245]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP248:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP249:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV231:%.*]] = fpext float [[TMP249]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP250:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV232:%.*]] = fpext float [[TMP250]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL233:%.*]] = call double @remquol(double noundef [[CONV232]], double noundef [[CONV231]], ptr noundef [[TMP248]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP251:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV234:%.*]] = fpext float [[TMP251]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP252:%.*]] = call double @llvm.rint.f64(double [[CONV234]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP253:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP254:%.*]] = call float @llvm.rint.f32(float [[TMP253]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP255:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV235:%.*]] = fpext float [[TMP255]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP256:%.*]] = call double @llvm.rint.f64(double [[CONV235]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP257:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV236:%.*]] = fpext float [[TMP257]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP258:%.*]] = call double @llvm.round.f64(double [[CONV236]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP259:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP260:%.*]] = call float @llvm.round.f32(float [[TMP259]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP261:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV237:%.*]] = fpext float [[TMP261]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP262:%.*]] = call double @llvm.round.f64(double [[CONV237]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP263:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV238:%.*]] = fptosi float [[TMP263]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP264:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV239:%.*]] = fpext float [[TMP264]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL240:%.*]] = call double @scalbln(double noundef [[CONV239]], i32 noundef [[CONV238]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP265:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV241:%.*]] = fptosi float [[TMP265]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP266:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL242:%.*]] = call float @scalblnf(float noundef [[TMP266]], i32 noundef [[CONV241]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP267:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV243:%.*]] = fptosi float [[TMP267]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP268:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV244:%.*]] = fpext float [[TMP268]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL245:%.*]] = call double @scalblnl(double noundef [[CONV244]], i32 noundef [[CONV243]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP269:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV246:%.*]] = fptosi float [[TMP269]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP270:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV247:%.*]] = fpext float [[TMP270]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL248:%.*]] = call double @scalbn(double noundef [[CONV247]], i32 noundef [[CONV246]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP271:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV249:%.*]] = fptosi float [[TMP271]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP272:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL250:%.*]] = call float @scalbnf(float noundef [[TMP272]], i32 noundef [[CONV249]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP273:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV251:%.*]] = fptosi float [[TMP273]] to i32
+// HAS_ERRNO_WIN-NEXT:    [[TMP274:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV252:%.*]] = fpext float [[TMP274]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL253:%.*]] = call double @scalbnl(double noundef [[CONV252]], i32 noundef [[CONV251]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP275:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV254:%.*]] = fpext float [[TMP275]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL255:%.*]] = call double @sin(double noundef [[CONV254]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP276:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL256:%.*]] = call float @sinf(float noundef [[TMP276]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP277:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV257:%.*]] = fpext float [[TMP277]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL258:%.*]] = call double @sinl(double noundef [[CONV257]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP278:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV259:%.*]] = fpext float [[TMP278]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL260:%.*]] = call double @sinh(double noundef [[CONV259]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP279:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL261:%.*]] = call float @sinhf(float noundef [[TMP279]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP280:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV262:%.*]] = fpext float [[TMP280]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL263:%.*]] = call double @sinhl(double noundef [[CONV262]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP281:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP282:%.*]] = load ptr, ptr [[D_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP283:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV264:%.*]] = fpext float [[TMP283]] to double
+// HAS_ERRNO_WIN-NEXT:    call void @sincos(double noundef [[CONV264]], ptr noundef [[TMP282]], ptr noundef [[TMP281]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP284:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP285:%.*]] = load ptr, ptr [[FP_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP286:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    call void @sincosf(float noundef [[TMP286]], ptr noundef [[TMP285]], ptr noundef [[TMP284]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP287:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP288:%.*]] = load ptr, ptr [[L_ADDR]], align 8
+// HAS_ERRNO_WIN-NEXT:    [[TMP289:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV265:%.*]] = fpext float [[TMP289]] to double
+// HAS_ERRNO_WIN-NEXT:    call void @sincosl(double noundef [[CONV265]], ptr noundef [[TMP288]], ptr noundef [[TMP287]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP290:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV266:%.*]] = fpext float [[TMP290]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL267:%.*]] = call double @sqrt(double noundef [[CONV266]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP291:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL268:%.*]] = call float @sqrtf(float noundef [[TMP291]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP292:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV269:%.*]] = fpext float [[TMP292]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL270:%.*]] = call double @sqrtl(double noundef [[CONV269]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP293:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV271:%.*]] = fpext float [[TMP293]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL272:%.*]] = call double @tan(double noundef [[CONV271]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP294:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL273:%.*]] = call float @tanf(float noundef [[TMP294]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP295:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV274:%.*]] = fpext float [[TMP295]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL275:%.*]] = call double @tanl(double noundef [[CONV274]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP296:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV276:%.*]] = fpext float [[TMP296]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL277:%.*]] = call double @tanh(double noundef [[CONV276]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP297:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL278:%.*]] = call float @tanhf(float noundef [[TMP297]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP298:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV279:%.*]] = fpext float [[TMP298]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL280:%.*]] = call double @tanhl(double noundef [[CONV279]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP299:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV281:%.*]] = fpext float [[TMP299]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL282:%.*]] = call double @tgamma(double noundef [[CONV281]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP300:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CALL283:%.*]] = call float @tgammaf(float noundef [[TMP300]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP301:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV284:%.*]] = fpext float [[TMP301]] to double
+// HAS_ERRNO_WIN-NEXT:    [[CALL285:%.*]] = call double @tgammal(double noundef [[CONV284]]) #[[ATTR6]]
+// HAS_ERRNO_WIN-NEXT:    [[TMP302:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV286:%.*]] = fpext float [[TMP302]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP303:%.*]] = call double @llvm.trunc.f64(double [[CONV286]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP304:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[TMP305:%.*]] = call float @llvm.trunc.f32(float [[TMP304]])
+// HAS_ERRNO_WIN-NEXT:    [[TMP306:%.*]] = load float, ptr [[F_ADDR]], align 4
+// HAS_ERRNO_WIN-NEXT:    [[CONV287:%.*]] = fpext float [[TMP306]] to double
+// HAS_ERRNO_WIN-NEXT:    [[TMP307:%.*]] = call double @llvm.trunc.f64(double [[CONV287]])
+// HAS_ERRNO_WIN-NEXT:    ret void
+//
 void foo(double *d, float f, float *fp, long double *l, int *i, const char *c) {
   f = fmod(f,f);     f = fmodf(f,f);    f = fmodl(f,f);
 
-  // NO__ERRNO: frem double
-  // NO__ERRNO: frem float
-  // NO__ERRNO: frem x86_fp80
-  // HAS_ERRNO: declare double @fmod(double noundef, double noundef) [[NOT_READNONE:#[0-9]+]]
-  // HAS_ERRNO: declare float @fmodf(float noundef, float noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare x86_fp80 @fmodl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare double @llvm.experimental.constrained.frem.f64(
-  // HAS_MAYTRAP: declare float @llvm.experimental.constrained.frem.f32(
-  // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.frem.f80(
 
   atan2(f,f);    atan2f(f,f) ;  atan2l(f, f);
 
-  // NO__ERRNO: declare double @llvm.atan2.f64(double, double) [[READNONE_INTRINSIC:#[0-9]+]]
-  // NO__ERRNO: declare float @llvm.atan2.f32(float, float) [[READNONE_INTRINSIC]]
-  // NO__ERRNO: declare x86_fp80 @llvm.atan2.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
-  // HAS_ERRNO: declare double @atan2(double noundef, double noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare float @atan2f(float noundef, float noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare x86_fp80 @atan2l(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare double @llvm.experimental.constrained.atan2.f64(
-  // HAS_MAYTRAP: declare float @llvm.experimental.constrained.atan2.f32(
-  // HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.atan2.f80(
 
   copysign(f,f); copysignf(f,f);copysignl(f,f);
 
-  // NO__ERRNO: declare double @llvm.copysign.f64(double, double) [[READNONE_INTRINSIC2:#[0-9]+]]
-  // NO__ERRNO: declare float @llvm.copysign.f32(float, float) [[READNONE_INTRINSIC2]]
-  // NO__ERRNO: declare x86_fp80 @llvm.copysign.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-  // HAS_ERRNO: declare double @llvm.copysign.f64(double, double) [[READNONE_INTRINSIC2:#[0-9]+]]
-  // HAS_ERRNO: declare float @llvm.copysign.f32(float, float) [[READNONE_INTRINSIC2]]
-  // HAS_ERRNO: declare x86_fp80 @llvm.copysign.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-  // HAS_MAYTRAP: declare double @llvm.copysign.f64(double, double) [[READNONE_INTRINSIC2:#[0-9]+]]
-  // HAS_MAYTRAP: declare float @llvm.copysign.f32(float, float) [[READNONE_INTRINSIC2]]
-  // HAS_MAYTRAP: declare x86_fp80 @llvm.copysign.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
 
   fabs(f);       fabsf(f);      fabsl(f);
 
-  // NO__ERRNO: declare double @llvm.fabs.f64(double) [[READNONE_INTRINSIC2]]
-  // NO__ERRNO: declare float @llvm.fabs.f32(float) [[READNONE_INTRINSIC2]]
-  // NO__ERRNO: declare x86_fp80 @llvm.fabs.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-  // HAS_ERRNO: declare double @llvm.fabs.f64(double) [[READNONE_INTRINSIC2]]
-  // HAS_ERRNO: declare float @llvm.fabs.f32(float) [[READNONE_INTRINSIC2]]
-  // HAS_ERRNO: declare x86_fp80 @llvm.fabs.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-  // HAS_MAYTRAP: declare double @llvm.fabs.f64(double) [[READNONE_INTRINSIC2]]
-  // HAS_MAYTRAP: declare float @llvm.fabs.f32(float) [[READNONE_INTRINSIC2]]
-  // HAS_MAYTRAP: declare x86_fp80 @llvm.fabs.f80(x86_fp80) [[READNONE_INTRINSIC2]]
 
   frexp(f,i);    frexpf(f,i);   frexpl(f,i);
 
-  // NO__ERRNO: declare double @frexp(double noundef, ptr noundef) [[NOT_READNONE:#[0-9]+]]
-  // NO__ERRNO: declare float @frexpf(float noundef, ptr noundef) [[NOT_READNONE]]
-  // NO__ERRNO: declare x86_fp80 @frexpl(x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare double @frexp(double noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare float @frexpf(float noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare x86_fp80 @frexpl(x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare double @frexp(double noundef, ptr noundef) [[NOT_READNONE:#[0-9]+]]
-  // HAS_MAYTRAP: declare float @frexpf(float noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare x86_fp80 @frexpl(x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
 
   ldexp(f,f);    ldexpf(f,f);   ldexpl(f,f);
 
-  // NO__ERRNO: declare double @ldexp(double noundef, i32 noundef) [[READNONE:#[0-9]+]]
-  // NO__ERRNO: declare float @ldexpf(float noundef, i32 noundef) [[READNONE]]
-  // NO__ERRNO: declare x86_fp80 @ldexpl(x86_fp80 noundef, i32 noundef) [[READNONE]]
-  // HAS_ERRNO: declare double @ldexp(double noundef, i32 noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare float @ldexpf(float noundef, i32 noundef) [[NOT_READNONE]]
-  // HAS_ERRNO: declare x86_fp80 @ldexpl(x86_fp80 noundef, i32 noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare double @ldexp(double noundef, i32 noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare float @ldexpf(float noundef, i32 noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare x86_fp80 @ldexpl(x86_fp80 noundef, i32 noundef) [[NOT_READNONE]]
 
   modf(f,d);       modff(f,fp);      modfl(f,l);
 
-  // NO__ERRNO: declare { double, double } @llvm.modf.f64(double) [[READNONE_INTRINSIC]]
-  // NO__ERRNO: declare { float, float } @llvm.modf.f32(float) [[READNONE_INTRINSIC]]
-  // NO__ERRNO: declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80) [[READNONE_INTRINSIC]]
-  // HAS_ERRNO: declare { double, double } @llvm.modf.f64(double) [[READNONE_INTRINSIC:#[0-9]+]]
-  // HAS_ERRNO: declare { float, float } @llvm.modf.f32(float) [[READNONE_INTRINSIC]]
-  // HAS_ERRNO: declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80) [[READNONE_INTRINSIC]]
-  // HAS_MAYTRAP: declare double @modf(double noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare float @modff(float noundef, ptr noundef) [[NOT_READNONE]]
-  // HAS_MAYTRAP: declare x86_fp80 @modfl(x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
 
   nan(c);        nanf(c);       nanl(c);
 
-// NO__ERRNO: declare double @nan(ptr noundef) [[READONLY:#[0-9]+]]
-// NO__ERRNO: declare float @nanf(ptr noundef) [[READONLY]]
-// NO__ERRNO: declare x86_fp80 @nanl(ptr noundef) [[READONLY]]
-// HAS_ERRNO: declare double @nan(ptr noundef) [[READONLY:#[0-9]+]]
-// HAS_ERRNO: declare float @nanf(ptr noundef) [[READONLY]]
-// HAS_ERRNO: declare x86_fp80 @nanl(ptr noundef) [[READONLY]]
-// HAS_MAYTRAP: declare double @nan(ptr noundef) [[READONLY:#[0-9]+]]
-// HAS_MAYTRAP: declare float @nanf(ptr noundef) [[READONLY]]
-// HAS_MAYTRAP: declare x86_fp80 @nanl(ptr noundef) [[READONLY]]
 
   pow(f,f);        powf(f,f);       powl(f,f);
 
-// NO__ERRNO: declare double @llvm.pow.f64(double, double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.pow.f32(float, float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.pow.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @pow(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @powf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @powl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.pow.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.pow.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.pow.f80({{.*}})
 
 
   /* math */
   acos(f);       acosf(f);      acosl(f);
 
-// NO__ERRNO: declare double @llvm.acos.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.acos.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.acos.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @acos(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @acosf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @acosl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.acos.f64(
-// HAS_MAYTRAP: declare float  @llvm.experimental.constrained.acos.f32(
-// HAS_MAYTRAP: declare x86_fp80  @llvm.experimental.constrained.acos.f80(
 
 
   acosh(f);      acoshf(f);     acoshl(f);
 
-// NO__ERRNO: declare double @acosh(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @acoshf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @acoshl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @acosh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @acoshf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @acoshl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @acosh(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @acoshf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @acoshl(x86_fp80 noundef) [[NOT_READNONE]]
 
   asin(f);       asinf(f);      asinl(f);
 
-// NO__ERRNO: declare double @llvm.asin.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.asin.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.asin.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @asin(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @asinf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @asinl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.asin.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.asin.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.asin.f80(
 
   asinh(f);      asinhf(f);     asinhl(f);
 
-// NO__ERRNO: declare double @asinh(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @asinhf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @asinhl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @asinh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @asinhf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @asinhl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @asinh(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @asinhf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @asinhl(x86_fp80 noundef) [[NOT_READNONE]]
 
   atan(f);       atanf(f);      atanl(f);
 
-// NO__ERRNO: declare double @llvm.atan.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.atan.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.atan.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @atan(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @atanf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @atanl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.atan.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.atan.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.atan.f80(
 
   atanh(f);      atanhf(f);     atanhl(f);
 
-// NO__ERRNO: declare double @atanh(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @atanhf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @atanhl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @atanh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @atanhf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @atanhl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @atanh(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @atanhf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @atanhl(x86_fp80 noundef) [[NOT_READNONE]]
 
   cbrt(f);       cbrtf(f);      cbrtl(f);
 
-// NO__ERRNO: declare double @cbrt(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @cbrtf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @cbrtl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @cbrt(double noundef) [[READNONE:#[0-9]+]]
-// HAS_ERRNO: declare float @cbrtf(float noundef) [[READNONE]]
-// HAS_ERRNO: declare x86_fp80 @cbrtl(x86_fp80 noundef) [[READNONE]]
-// HAS_MAYTRAP: declare double @cbrt(double noundef) [[READNONE:#[0-9]+]]
-// HAS_MAYTRAP: declare float @cbrtf(float noundef) [[READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @cbrtl(x86_fp80 noundef) [[READNONE]]
 
   ceil(f);       ceilf(f);      ceill(f);
 
-// NO__ERRNO: declare double @llvm.ceil.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.ceil.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.ceil.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.ceil.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.ceil.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.ceil.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.ceil.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.ceil.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.ceil.f80(
 
   cos(f);        cosf(f);       cosl(f);
 
-// NO__ERRNO: declare double @llvm.cos.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.cos.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.cos.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @cos(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @cosf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @cosl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.cos.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.cos.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.cos.f80(
 
   cosh(f);       coshf(f);      coshl(f);
 
-// NO__ERRNO: declare double @llvm.cosh.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.cosh.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.cosh.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @cosh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @coshf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @coshl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.cosh.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.cosh.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.cosh.f80(
 
   erf(f);        erff(f);       erfl(f);
 
-// NO__ERRNO: declare double @erf(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @erff(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @erfl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @erf(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @erff(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @erfl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @erf(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @erff(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @erfl(x86_fp80 noundef) [[NOT_READNONE]]
 
   erfc(f);       erfcf(f);      erfcl(f);
 
-// NO__ERRNO: declare double @erfc(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @erfcf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @erfcl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @erfc(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @erfcf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @erfcl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @erfc(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @erfcf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @erfcl(x86_fp80 noundef) [[NOT_READNONE]]
 
   exp(f);        expf(f);       expl(f);
 
-// NO__ERRNO: declare double @llvm.exp.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.exp.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.exp.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @exp(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @expf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @expl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.exp.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.exp.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.exp.f80(
 
   exp2(f);       exp2f(f);      exp2l(f);
 
-// NO__ERRNO: declare double @llvm.exp2.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.exp2.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.exp2.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @exp2(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @exp2f(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @exp2l(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.exp2.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.exp2.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.exp2.f80(
 
   expm1(f);      expm1f(f);     expm1l(f);
 
-// NO__ERRNO: declare double @expm1(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @expm1f(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @expm1l(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @expm1(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @expm1f(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @expm1l(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @expm1(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @expm1f(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @expm1l(x86_fp80 noundef) [[NOT_READNONE]]
 
   fdim(f,f);       fdimf(f,f);      fdiml(f,f);
 
-// NO__ERRNO: declare double @fdim(double noundef, double noundef) [[READNONE]]
-// NO__ERRNO: declare float @fdimf(float noundef, float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @fdiml(x86_fp80 noundef, x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @fdim(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @fdimf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @fdiml(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @fdim(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @fdimf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @fdiml(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
   floor(f);      floorf(f);     floorl(f);
 
-// NO__ERRNO: declare double @llvm.floor.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.floor.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.floor.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.floor.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.floor.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.floor.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.floor.f64
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.floor.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.floor.f80(
 
   fma(f,f,f);        fmaf(f,f,f);       fmal(f,f,f);
 
-// NO__ERRNO: declare double @llvm.fma.f64(double, double, double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.fma.f32(float, float, float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.fma.f80(x86_fp80, x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @fma(double noundef, double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @fmaf(float noundef, float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @fmal(x86_fp80 noundef, x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
 // On GNU or Win, fma never sets errno, so we can convert to the intrinsic.
 
-// HAS_ERRNO_GNU: declare double @llvm.fma.f64(double, double, double) [[READNONE_INTRINSIC:#[0-9]+]]
-// HAS_ERRNO_GNU: declare float @llvm.fma.f32(float, float, float) [[READNONE_INTRINSIC]]
-// HAS_ERRNO_GNU: declare x86_fp80 @llvm.fma.f80(x86_fp80, x86_fp80, x86_fp80) [[READNONE_INTRINSIC]]
 
-// HAS_ERRNO_WIN: declare double @llvm.fma.f64(double, double, double) [[READNONE_INTRINSIC:#[0-9]+]]
-// HAS_ERRNO_WIN: declare float @llvm.fma.f32(float, float, float) [[READNONE_INTRINSIC]]
 // Long double is just double on win, so no f80 use/declaration.
-// HAS_ERRNO_WIN-NOT: declare x86_fp80 @llvm.fma.f80(x86_fp80, x86_fp80, x86_fp80)
 
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.fma.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.fma.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.fma.f80(
 
   fmax(f,f);       fmaxf(f,f);      fmaxl(f,f);
 
-// NO__ERRNO: declare double @llvm.maxnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.maxnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.maxnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.maxnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.maxnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.maxnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.maxnum.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.maxnum.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.maxnum.f80(
 
   fmin(f,f);       fminf(f,f);      fminl(f,f);
 
-// NO__ERRNO: declare double @llvm.minnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.minnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.minnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.minnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.minnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.minnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.minnum.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.minnum.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.minnum.f80(
 
   fmaximum_num(*d,*d);       fmaximum_numf(f,f);      fmaximum_numl(*l,*l);
 
-// COMMON: declare double @llvm.maximumnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// COMMON: declare float @llvm.maximumnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// COMMON: declare x86_fp80 @llvm.maximumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
 
   fminimum_num(*d,*d);       fminimum_numf(f,f);      fminimum_numl(*l,*l);
 
-// COMMON: declare double @llvm.minimumnum.f64(double, double) [[READNONE_INTRINSIC2]]
-// COMMON: declare float @llvm.minimumnum.f32(float, float) [[READNONE_INTRINSIC2]]
-// COMMON: declare x86_fp80 @llvm.minimumnum.f80(x86_fp80, x86_fp80) [[READNONE_INTRINSIC2]]
 
   hypot(f,f);      hypotf(f,f);     hypotl(f,f);
 
-// NO__ERRNO: declare double @hypot(double noundef, double noundef) [[READNONE]]
-// NO__ERRNO: declare float @hypotf(float noundef, float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @hypotl(x86_fp80 noundef, x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @hypot(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @hypotf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @hypotl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @hypot(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @hypotf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @hypotl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
   ilogb(f);      ilogbf(f);     ilogbl(f);
 
-// NO__ERRNO: declare i32 @ilogb(double noundef) [[READNONE]]
-// NO__ERRNO: declare i32 @ilogbf(float noundef) [[READNONE]]
-// NO__ERRNO: declare i32 @ilogbl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare i32 @ilogb(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i32 @ilogbf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i32 @ilogbl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i32 @ilogb(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i32 @ilogbf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i32 @ilogbl(x86_fp80 noundef) [[NOT_READNONE]]
 
   lgamma(f);     lgammaf(f);    lgammal(f);
 
-// NO__ERRNO: declare double @lgamma(double noundef) [[NOT_READNONE]]
-// NO__ERRNO: declare float @lgammaf(float noundef) [[NOT_READNONE]]
-// NO__ERRNO: declare x86_fp80 @lgammal(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare double @lgamma(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @lgammaf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @lgammal(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @lgamma(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @lgammaf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @lgammal(x86_fp80 noundef) [[NOT_READNONE]]
 
   llrint(f);     llrintf(f);    llrintl(f);
 
-// NO__ERRNO: declare i64 @llvm.llrint.i64.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.llrint.i64.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.llrint.i64.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare i64 @llrint(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @llrintf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @llrintl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llrint.i64.f64(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llrint.i64.f32(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llrint.i64.f80(
 
   llround(f);    llroundf(f);   llroundl(f);
 
-// NO__ERRNO: declare i64 @llvm.llround.i64.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.llround.i64.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.llround.i64.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare i64 @llround(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @llroundf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @llroundl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llround.i64.f64(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llround.i64.f32(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.llround.i64.f80(
 
   log(f);        logf(f);       logl(f);
 
-// NO__ERRNO: declare double @llvm.log.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.log.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.log.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @log(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @logf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @logl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.log.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.log.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.log.f80(
 
   log10(f);      log10f(f);     log10l(f);
 
-// NO__ERRNO: declare double @llvm.log10.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.log10.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.log10.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @log10(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @log10f(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @log10l(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.log10.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.log10.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.log10.f80(
 
   log1p(f);      log1pf(f);     log1pl(f);
 
-// NO__ERRNO: declare double @log1p(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @log1pf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @log1pl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @log1p(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @log1pf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @log1pl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @log1p(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @log1pf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @log1pl(x86_fp80 noundef) [[NOT_READNONE]]
 
   log2(f);       log2f(f);      log2l(f);
 
-// NO__ERRNO: declare double @llvm.log2.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.log2.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.log2.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @log2(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @log2f(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @log2l(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.log2.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.log2.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.log2.f80(
 
   logb(f);       logbf(f);      logbl(f);
 
-// NO__ERRNO: declare double @logb(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @logbf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @logbl(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @logb(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @logbf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @logbl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @logb(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @logbf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @logbl(x86_fp80 noundef) [[NOT_READNONE]]
 
   lrint(f);      lrintf(f);     lrintl(f);
 
-// NO__ERRNO: declare i64 @llvm.lrint.i64.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.lrint.i64.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.lrint.i64.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare i64 @lrint(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @lrintf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @lrintl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lrint.i64.f64(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lrint.i64.f32(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lrint.i64.f80(
 
   lround(f);     lroundf(f);    lroundl(f);
 
-// NO__ERRNO: declare i64 @llvm.lround.i64.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.lround.i64.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare i64 @llvm.lround.i64.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare i64 @lround(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @lroundf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare i64 @lroundl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lround.i64.f64(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lround.i64.f32(
-// HAS_MAYTRAP: declare i64 @llvm.experimental.constrained.lround.i64.f80(
 
   nearbyint(f);  nearbyintf(f); nearbyintl(f);
 
-// NO__ERRNO: declare double @llvm.nearbyint.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.nearbyint.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.nearbyint.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.nearbyint.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.nearbyint.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.nearbyint.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.nearbyint.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.nearbyint.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.nearbyint.f80(
 
   nextafter(f,f);  nextafterf(f,f); nextafterl(f,f);
 
-// NO__ERRNO: declare double @nextafter(double noundef, double noundef) [[READNONE]]
-// NO__ERRNO: declare float @nextafterf(float noundef, float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @nextafterl(x86_fp80 noundef, x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @nextafter(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @nextafterf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @nextafterl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @nextafter(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @nextafterf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @nextafterl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
   nexttoward(f,f); nexttowardf(f,f);nexttowardl(f,f);
 
-// NO__ERRNO: declare double @nexttoward(double noundef, x86_fp80 noundef) [[READNONE]]
-// NO__ERRNO: declare float @nexttowardf(float noundef, x86_fp80 noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @nexttowardl(x86_fp80 noundef, x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @nexttoward(double noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @nexttowardf(float noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @nexttowardl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @nexttoward(double noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @nexttowardf(float noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @nexttowardl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
   remainder(f,f);  remainderf(f,f); remainderl(f,f);
 
-// NO__ERRNO: declare double @remainder(double noundef, double noundef) [[READNONE]]
-// NO__ERRNO: declare float @remainderf(float noundef, float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @remainderl(x86_fp80 noundef, x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @remainder(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @remainderf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @remainderl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @remainder(double noundef, double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @remainderf(float noundef, float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @remainderl(x86_fp80 noundef, x86_fp80 noundef) [[NOT_READNONE]]
 
   remquo(f,f,i);  remquof(f,f,i); remquol(f,f,i);
 
-// NO__ERRNO: declare double @remquo(double noundef, double noundef, ptr noundef) [[NOT_READNONE]]
-// NO__ERRNO: declare float @remquof(float noundef, float noundef, ptr noundef) [[NOT_READNONE]]
-// NO__ERRNO: declare x86_fp80 @remquol(x86_fp80 noundef, x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare double @remquo(double noundef, double noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @remquof(float noundef, float noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @remquol(x86_fp80 noundef, x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @remquo(double noundef, double noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @remquof(float noundef, float noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @remquol(x86_fp80 noundef, x86_fp80 noundef, ptr noundef) [[NOT_READNONE]]
 
   rint(f);       rintf(f);      rintl(f);
 
-// NO__ERRNO: declare double @llvm.rint.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.rint.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.rint.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.rint.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.rint.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.rint.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.rint.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.rint.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.rint.f80(
 
   round(f);      roundf(f);     roundl(f);
 
-// NO__ERRNO: declare double @llvm.round.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.round.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.round.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.round.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.round.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.round.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.round.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.round.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.round.f80(
 
   scalbln(f,f);    scalblnf(f,f);   scalblnl(f,f);
 
-// NO__ERRNO: declare double @scalbln(double noundef, i64 noundef) [[READNONE]]
-// NO__ERRNO: declare float @scalblnf(float noundef, i64 noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @scalblnl(x86_fp80 noundef, i64 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @scalbln(double noundef, i64 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @scalblnf(float noundef, i64 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @scalblnl(x86_fp80 noundef, i64 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @scalbln(double noundef, i64 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @scalblnf(float noundef, i64 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @scalblnl(x86_fp80 noundef, i64 noundef) [[NOT_READNONE]]
 
   scalbn(f,f);     scalbnf(f,f);    scalbnl(f,f);
 
-// NO__ERRNO: declare double @scalbn(double noundef, i32 noundef) [[READNONE]]
-// NO__ERRNO: declare float @scalbnf(float noundef, i32 noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @scalbnl(x86_fp80 noundef, i32 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @scalbn(double noundef, i32 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @scalbnf(float noundef, i32 noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @scalbnl(x86_fp80 noundef, i32 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @scalbn(double noundef, i32 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @scalbnf(float noundef, i32 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @scalbnl(x86_fp80 noundef, i32 noundef) [[NOT_READNONE]]
 
   sin(f);        sinf(f);       sinl(f);
 
-// NO__ERRNO: declare double @llvm.sin.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.sin.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.sin.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @sin(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @sinf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @sinl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.sin.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.sin.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.sin.f80(
 
   sinh(f);       sinhf(f);      sinhl(f);
 
-// NO__ERRNO: declare double @llvm.sinh.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.sinh.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.sinh.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @sinh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @sinhf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @sinhl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.sinh.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.sinh.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.sinh.f80(
 
 sincos(f, d, d);       sincosf(f, fp, fp);        sincosl(f, l, l);
 
-// NO__ERRNO: declare { double, double } @llvm.sincos.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare { float, float } @llvm.sincos.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare void @sincos(double noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare void @sincosf(float noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare void @sincosl(x86_fp80 noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare void @sincos(double noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare void @sincosf(float noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare void @sincosl(x86_fp80 noundef, ptr noundef, ptr noundef) [[NOT_READNONE]]
 
   sqrt(f);       sqrtf(f);      sqrtl(f);
 
-// NO__ERRNO: declare double @llvm.sqrt.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.sqrt.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.sqrt.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @sqrt(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @sqrtf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @sqrtl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.sqrt.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.sqrt.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(
 
   tan(f);        tanf(f);       tanl(f);
 
-// NO__ERRNO: declare double @llvm.tan.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.tan.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.tan.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @tan(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @tanf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @tanl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.tan.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.tan.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.tan.f80(
 
   tanh(f);       tanhf(f);      tanhl(f);
 
-// NO__ERRNO: declare double @llvm.tanh.f64(double) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare float @llvm.tanh.f32(float) [[READNONE_INTRINSIC]]
-// NO__ERRNO: declare x86_fp80 @llvm.tanh.f80(x86_fp80) [[READNONE_INTRINSIC]]
-// HAS_ERRNO: declare double @tanh(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @tanhf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @tanhl(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @llvm.experimental.constrained.tanh.f64(
-// HAS_MAYTRAP: declare float @llvm.experimental.constrained.tanh.f32(
-// HAS_MAYTRAP: declare x86_fp80 @llvm.experimental.constrained.tanh.f80(
 
   tgamma(f);     tgammaf(f);    tgammal(f);
 
-// NO__ERRNO: declare double @tgamma(double noundef) [[READNONE]]
-// NO__ERRNO: declare float @tgammaf(float noundef) [[READNONE]]
-// NO__ERRNO: declare x86_fp80 @tgammal(x86_fp80 noundef) [[READNONE]]
-// HAS_ERRNO: declare double @tgamma(double noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare float @tgammaf(float noundef) [[NOT_READNONE]]
-// HAS_ERRNO: declare x86_fp80 @tgammal(x86_fp80 noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare double @tgamma(double noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare float @tgammaf(float noundef) [[NOT_READNONE]]
-// HAS_MAYTRAP: declare x86_fp80 @tgammal(x86_fp80 noundef) [[NOT_READNONE]]
 
   trunc(f);      truncf(f);     truncl(f);
 
-// NO__ERRNO: declare double @llvm.trunc.f64(double) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare float @llvm.trunc.f32(float) [[READNONE_INTRINSIC2]]
-// NO__ERRNO: declare x86_fp80 @llvm.trunc.f80(x86_fp80) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare double @llvm.trunc.f64(double) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare float @llvm.trunc.f32(float) [[READNONE_INTRINSIC2]]
-// HAS_ERRNO: declare x86_fp80 @llvm.trunc.f80(x86_fp80) [[READNONE_INTRINSIC2]]
 };
 
-// NO__ERRNO: attributes [[READNONE_INTRINSIC]] = { {{.*}}memory(none){{.*}} }
-// NO__ERRNO: attributes [[READNONE_INTRINSIC2]] = { {{.*}}memory(none){{.*}} }
-// NO__ERRNO: attributes [[NOT_READNONE]] = { nounwind {{.*}} }
-// NO__ERRNO: attributes [[READNONE]] = { {{.*}}memory(none){{.*}} }
-// NO__ERRNO: attributes [[READONLY]] = { {{.*}}memory(read){{.*}} }
 
-// HAS_ERRNO: attributes [[NOT_READNONE]] = { nounwind {{.*}} }
-// HAS_ERRNO: attributes [[READNONE_INTRINSIC2]] = { {{.*}}memory(none){{.*}} }
-// HAS_ERRNO: attributes [[READNONE_INTRINSIC]] = { {{.*}}memory(none){{.*}} }
-// HAS_ERRNO: attributes [[READONLY]] = { {{.*}}memory(read){{.*}} }
-// HAS_ERRNO: attributes [[READNONE]] = { {{.*}}memory(none){{.*}} }
 
-// HAS_MAYTRAP: attributes [[NOT_READNONE]] = { nounwind {{.*}} }
-// HAS_MAYTRAP: attributes [[READNONE]] = { {{.*}}memory(none){{.*}} }
 
-// HAS_ERRNO_GNU: attributes [[READNONE_INTRINSIC]] = { {{.*}}memory(none){{.*}} }
-// HAS_ERRNO_WIN: attributes [[READNONE_INTRINSIC]] = { {{.*}}memory(none){{.*}} }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// COMMON: {{.*}}
+// HAS_ERRNO: {{.*}}
+// NO__ERRNO: {{.*}}
diff --git a/clang/test/CodeGen/pragma-fenv_access.c b/clang/test/CodeGen/pragma-fenv_access.c
index 76c38f957d632..e09f3a536916b 100644
--- a/clang/test/CodeGen/pragma-fenv_access.c
+++ b/clang/test/CodeGen/pragma-fenv_access.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -ffp-exception-behavior=strict -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefixes=CHECK,STRICT %s
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -frounding-math -ffp-exception-behavior=strict -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefixes=CHECK,STRICT-RND %s
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -ffp-exception-behavior=strict -triple %itanium_abi_triple -emit-llvm %s -o - -fms-extensions -DMS | FileCheck --check-prefixes=CHECK,STRICT %s
@@ -5,14 +6,57 @@
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefixes=CHECK,DEFAULT %s
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -frounding-math -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefixes=CHECK,DEFAULT-RND %s
 
+// STRICT-LABEL: define dso_local float @func_00(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_00(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_00(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_00(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7:[0-9]+]] [ "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_00(float x, float y) {
   return x + y;
 }
-// CHECK-LABEL: @func_00
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT-RND: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// DEFAULT-RND: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.ignore")
-// DEFAULT: fadd float
 
 
 #ifdef MS
@@ -21,27 +65,165 @@ float func_00(float x, float y) {
 #pragma STDC FENV_ACCESS ON
 #endif
 
+// STRICT-LABEL: define dso_local float @func_01(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_01(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_01(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_01(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_01(float x, float y) {
   return x + y;
 }
-// CHECK-LABEL: @func_01
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_02(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_02(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_02(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_02(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_02(float x, float y) {
   #pragma float_control(except, off)
   #pragma STDC FENV_ACCESS OFF
   return x + y;
 }
-// CHECK-LABEL: @func_02
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
 
+// STRICT-LABEL: define dso_local float @func_03(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_03(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_03(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_03(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_03(float x, float y) {
   return x + y;
 }
-// CHECK-LABEL: @func_03
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 
 #ifdef MS
@@ -50,40 +232,318 @@ float func_03(float x, float y) {
 #pragma STDC FENV_ACCESS OFF
 #endif
 
+// STRICT-LABEL: define dso_local float @func_04(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_04(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_04(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_04(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR2:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_04(float x, float y) {
   #pragma float_control(except, off)
   return x + y;
 }
-// CHECK-LABEL: @func_04
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// DEFAULT: fadd float
 
 
+// STRICT-LABEL: define dso_local float @func_04a(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_04a(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_04a(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_04a(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_04a(float x, float y) {
   #pragma float_control(except, on)
   return x + y;
 }
-// CHECK-LABEL: @func_04a
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_05(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_05(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_05(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_05(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_05(float x, float y) {
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_05
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_06(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_06(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_06(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_06(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR2]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_06(float x, float y) {
   #pragma float_control(except, off)
   return x + y;
 }
-// CHECK-LABEL: @func_06
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// DEFAULT: fadd float
 
 
+// STRICT-LABEL: define dso_local float @func_07(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"une") #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// STRICT:       [[IF_THEN]]:
+// STRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[MUL:%.*]] = fmul float [[TMP3]], 2.000000e+00
+// STRICT-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    br label %[[IF_END]]
+// STRICT:       [[IF_END]]:
+// STRICT-NEXT:    [[TMP4:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float 4.000000e+00) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_07(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"une") #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// STRICT-RND:       [[IF_THEN]]:
+// STRICT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[MUL:%.*]] = fmul float [[TMP3]], 2.000000e+00
+// STRICT-RND-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    br label %[[IF_END]]
+// STRICT-RND:       [[IF_END]]:
+// STRICT-RND-NEXT:    [[TMP4:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float 4.000000e+00) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_07(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"une") #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// DEFAULT:       [[IF_THEN]]:
+// DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[MUL:%.*]] = fmul float [[TMP3]], 2.000000e+00
+// DEFAULT-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    br label %[[IF_END]]
+// DEFAULT:       [[IF_END]]:
+// DEFAULT-NEXT:    [[TMP4:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float 4.000000e+00) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_07(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP2]], float 0.000000e+00, metadata !"une") #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// DEFAULT-RND:       [[IF_THEN]]:
+// DEFAULT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[MUL:%.*]] = fmul float [[TMP3]], 2.000000e+00
+// DEFAULT-RND-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    br label %[[IF_END]]
+// DEFAULT-RND:       [[IF_END]]:
+// DEFAULT-RND-NEXT:    [[TMP4:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP4]], float 4.000000e+00) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_07(float x, float y) {
   x -= y;
   if (x) {
@@ -92,73 +552,417 @@ float func_07(float x, float y) {
   }
   return y + 4.0F;
 }
-// CHECK-LABEL: @func_07
-// STRICT: call float @llvm.experimental.constrained.fsub.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fsub.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// DEFAULT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
 
+// STRICT-LABEL: define dso_local float @func_08(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_08(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_08(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_08(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_08(float x, float y) {
   #pragma STDC FENV_ROUND FE_UPWARD
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_08
-// CHECK:  call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.upward", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_09(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_09(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_09(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_09(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_09(float x, float y) {
   #pragma STDC FENV_ROUND FE_TONEAREST
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_09
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_10(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_10(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_10(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_10(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_10(float x, float y) {
   #pragma STDC FENV_ROUND FE_TONEAREST
   #pragma clang fp exceptions(ignore)
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_10
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
 
+// STRICT-LABEL: define dso_local float @func_11(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_11(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_11(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_11(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR2]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_11(float x, float y) {
   #pragma STDC FENV_ROUND FE_TONEAREST
   #pragma clang fp exceptions(ignore)
   #pragma STDC FENV_ACCESS OFF
   return x + y;
 }
-// CHECK-LABEL: @func_11
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// DEFAULT: fadd float
 
 
+// STRICT-LABEL: define dso_local float @func_12(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.except"(metadata !"maytrap") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_12(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.except"(metadata !"maytrap") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_12(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.except"(metadata !"maytrap") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_12(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.except"(metadata !"maytrap") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_12(float x, float y) {
   #pragma clang fp exceptions(maytrap)
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_12
-// CHECK:  call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
 
 
+// STRICT-LABEL: define dso_local float @func_13(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"maytrap") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_13(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"maytrap") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_13(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"maytrap") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_13(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"maytrap") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_13(float x, float y) {
   #pragma clang fp exceptions(maytrap)
   #pragma STDC FENV_ROUND FE_UPWARD
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_13
-// CHECK:  call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.upward", metadata !"fpexcept.maytrap")
 
 
+// STRICT-LABEL: define dso_local float @func_14(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[RES:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// STRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// STRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_14(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[RES:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// STRICT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// STRICT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_14(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[RES:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_14(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[RES:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// DEFAULT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// DEFAULT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_14(float x, float y, float z) {
   #pragma STDC FENV_ACCESS ON
   float res = x * y;
@@ -167,13 +971,84 @@ float func_14(float x, float y, float z) {
     return res + z;
   }
 }
-// CHECK-LABEL: @func_14
-// STRICT:  call float @llvm.experimental.constrained.fmul.f32({{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// STRICT:  call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fmul.f32({{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
 
 
+// STRICT-LABEL: define dso_local float @func_15(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[RES:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// STRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// STRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_15(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[RES:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-RND-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// STRICT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// STRICT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_15(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[RES:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz") ]
+// DEFAULT-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_15(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[RES:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz") ]
+// DEFAULT-RND-NEXT:    store float [[MUL]], ptr [[RES]], align 4
+// DEFAULT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[RES]], align 4
+// DEFAULT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP2]], float [[TMP3]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_15(float x, float y, float z) {
   #pragma STDC FENV_ROUND FE_TOWARDZERO
   #pragma STDC FENV_ACCESS ON
@@ -183,13 +1058,80 @@ float func_15(float x, float y, float z) {
     return res + z;
   }
 }
-// CHECK-LABEL: @func_15
-// STRICT:  call float @llvm.experimental.constrained.fmul.f32({{.*}}, metadata !"round.towardzero", metadata !"fpexcept.strict")
-// STRICT:  call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.towardzero", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fmul.f32({{.*}}, metadata !"round.towardzero", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fadd.f32({{.*}}, metadata !"round.towardzero", metadata !"fpexcept.ignore")
 
 
+// STRICT-LABEL: define dso_local float @func_16(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float 2.000000e+00) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP3]], 4.000000e+00
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_16(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float 2.000000e+00) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP3]], 4.000000e+00
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_16(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float 2.000000e+00) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP3]], 4.000000e+00
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_16(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[TMP1]], float [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    store float [[SUB]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP2:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[TMP2]], float 2.000000e+00) #[[ATTR7]] [ "fp.control"(metadata !"rte") ]
+// DEFAULT-RND-NEXT:    store float [[MUL]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP3:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP3]], 4.000000e+00
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_16(float x, float y) {
   x -= y;
   {
@@ -202,88 +1144,475 @@ float func_16(float x, float y) {
     return y + 4.0F;
   }
 }
-// CHECK-LABEL: @func_16
-// STRICT: call float @llvm.experimental.constrained.fsub.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fsub.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// DEFAULT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// DEFAULT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_17(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_17(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_17(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_17(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_17(float x, float y) {
   #pragma STDC FENV_ROUND FE_DYNAMIC
   #pragma STDC FENV_ACCESS ON
   return x + y;
 }
-// CHECK-LABEL: @func_17
-// CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 
+// STRICT-LABEL: define dso_local float @func_18(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_18(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_18(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_18(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR2]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_18(float x, float y) {
   #pragma STDC FENV_ROUND FE_DYNAMIC
   return x + y;
 }
-// CHECK-LABEL: @func_18
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// DEFAULT: fadd float
 
 #pragma STDC FENV_ACCESS ON
+// STRICT-LABEL: define dso_local float @func_19(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_19(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_19(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR1]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_19(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_19(float x, float y) {
   return x + y;
 }
-// CHECK-LABEL: @func_19
-// STRICT:  call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.dynamic", metadata !"fpexcept.strict")
 
 #pragma STDC FENV_ACCESS OFF
+// STRICT-LABEL: define dso_local float @func_20(
+// STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-NEXT:    ret float [[ADD]]
+//
+// STRICT-RND-LABEL: define dso_local float @func_20(
+// STRICT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// STRICT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// STRICT-RND-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte") ]
+// STRICT-RND-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-LABEL: define dso_local float @func_20(
+// DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-NEXT:    ret float [[ADD]]
+//
+// DEFAULT-RND-LABEL: define dso_local float @func_20(
+// DEFAULT-RND-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]]) #[[ATTR2]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// DEFAULT-RND-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// DEFAULT-RND-NEXT:    [[ADD:%.*]] = fadd float [[TMP0]], [[TMP1]]
+// DEFAULT-RND-NEXT:    ret float [[ADD]]
+//
 float func_20(float x, float y) {
   return x + y;
 }
-// CHECK-LABEL: @func_20
-// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// DEFAULT: fadd float
 
 typedef double vector4double __attribute__((__vector_size__(32)));
 typedef float  vector4float  __attribute__((__vector_size__(16)));
+// STRICT-LABEL: define dso_local <4 x float> @func_21(
+// STRICT-SAME: ptr noundef byval(<4 x double>) align 32 [[TMP0:%.*]]) #[[ATTR2:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca <4 x double>, align 32
+// STRICT-NEXT:    [[X:%.*]] = load <4 x double>, ptr [[TMP0]], align 32
+// STRICT-NEXT:    store <4 x double> [[X]], ptr [[X_ADDR]], align 32
+// STRICT-NEXT:    [[TMP1:%.*]] = load <4 x double>, ptr [[X_ADDR]], align 32
+// STRICT-NEXT:    [[CONV:%.*]] = call <4 x float> @llvm.fptrunc.v4f32.v4f64(<4 x double> [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp") ]
+// STRICT-NEXT:    ret <4 x float> [[CONV]]
+//
+// STRICT-RND-LABEL: define dso_local <4 x float> @func_21(
+// STRICT-RND-SAME: ptr noundef byval(<4 x double>) align 32 [[TMP0:%.*]]) #[[ATTR2:[0-9]+]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <4 x double>, align 32
+// STRICT-RND-NEXT:    [[X:%.*]] = load <4 x double>, ptr [[TMP0]], align 32
+// STRICT-RND-NEXT:    store <4 x double> [[X]], ptr [[X_ADDR]], align 32
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load <4 x double>, ptr [[X_ADDR]], align 32
+// STRICT-RND-NEXT:    [[CONV:%.*]] = call <4 x float> @llvm.fptrunc.v4f32.v4f64(<4 x double> [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtp") ]
+// STRICT-RND-NEXT:    ret <4 x float> [[CONV]]
+//
+// DEFAULT-LABEL: define dso_local <4 x float> @func_21(
+// DEFAULT-SAME: ptr noundef byval(<4 x double>) align 32 [[TMP0:%.*]]) #[[ATTR3:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca <4 x double>, align 32
+// DEFAULT-NEXT:    [[X:%.*]] = load <4 x double>, ptr [[TMP0]], align 32
+// DEFAULT-NEXT:    store <4 x double> [[X]], ptr [[X_ADDR]], align 32
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load <4 x double>, ptr [[X_ADDR]], align 32
+// DEFAULT-NEXT:    [[CONV:%.*]] = call <4 x float> @llvm.fptrunc.v4f32.v4f64(<4 x double> [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret <4 x float> [[CONV]]
+//
+// DEFAULT-RND-LABEL: define dso_local <4 x float> @func_21(
+// DEFAULT-RND-SAME: ptr noundef byval(<4 x double>) align 32 [[TMP0:%.*]]) #[[ATTR3:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <4 x double>, align 32
+// DEFAULT-RND-NEXT:    [[X:%.*]] = load <4 x double>, ptr [[TMP0]], align 32
+// DEFAULT-RND-NEXT:    store <4 x double> [[X]], ptr [[X_ADDR]], align 32
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load <4 x double>, ptr [[X_ADDR]], align 32
+// DEFAULT-RND-NEXT:    [[CONV:%.*]] = call <4 x float> @llvm.fptrunc.v4f32.v4f64(<4 x double> [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret <4 x float> [[CONV]]
+//
 vector4float func_21(vector4double x) {
   #pragma STDC FENV_ROUND FE_UPWARD
   return __builtin_convertvector(x, vector4float);
 }
-// CHECK-LABEL: @func_21
-// STRICT: call <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(<4 x double> {{.*}}, metadata !"round.upward", metadata !"fpexcept.strict")
 
 typedef short vector8short __attribute__((__vector_size__(16)));
 typedef double vector8double __attribute__((__vector_size__(64)));
+// STRICT-LABEL: define dso_local <8 x double> @func_24(
+// STRICT-SAME: <8 x i16> noundef [[X:%.*]]) #[[ATTR3:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca <8 x i16>, align 16
+// STRICT-NEXT:    store <8 x i16> [[X]], ptr [[X_ADDR]], align 16
+// STRICT-NEXT:    [[TMP0:%.*]] = load <8 x i16>, ptr [[X_ADDR]], align 16
+// STRICT-NEXT:    [[CONV:%.*]] = call <8 x double> @llvm.sitofp.v8f64.v8i16(<8 x i16> [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-NEXT:    ret <8 x double> [[CONV]]
+//
+// STRICT-RND-LABEL: define dso_local <8 x double> @func_24(
+// STRICT-RND-SAME: <8 x i16> noundef [[X:%.*]]) #[[ATTR3:[0-9]+]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <8 x i16>, align 16
+// STRICT-RND-NEXT:    store <8 x i16> [[X]], ptr [[X_ADDR]], align 16
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load <8 x i16>, ptr [[X_ADDR]], align 16
+// STRICT-RND-NEXT:    [[CONV:%.*]] = call <8 x double> @llvm.sitofp.v8f64.v8i16(<8 x i16> [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rtz") ]
+// STRICT-RND-NEXT:    ret <8 x double> [[CONV]]
+//
+// DEFAULT-LABEL: define dso_local <8 x double> @func_24(
+// DEFAULT-SAME: <8 x i16> noundef [[X:%.*]]) #[[ATTR4:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca <8 x i16>, align 16
+// DEFAULT-NEXT:    store <8 x i16> [[X]], ptr [[X_ADDR]], align 16
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load <8 x i16>, ptr [[X_ADDR]], align 16
+// DEFAULT-NEXT:    [[CONV:%.*]] = call <8 x double> @llvm.sitofp.v8f64.v8i16(<8 x i16> [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret <8 x double> [[CONV]]
+//
+// DEFAULT-RND-LABEL: define dso_local <8 x double> @func_24(
+// DEFAULT-RND-SAME: <8 x i16> noundef [[X:%.*]]) #[[ATTR4:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <8 x i16>, align 16
+// DEFAULT-RND-NEXT:    store <8 x i16> [[X]], ptr [[X_ADDR]], align 16
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load <8 x i16>, ptr [[X_ADDR]], align 16
+// DEFAULT-RND-NEXT:    [[CONV:%.*]] = call <8 x double> @llvm.sitofp.v8f64.v8i16(<8 x i16> [[TMP0]]) #[[ATTR7]] [ "fp.control"(metadata !"rtz"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret <8 x double> [[CONV]]
+//
 vector8double func_24(vector8short x) {
   #pragma STDC FENV_ROUND FE_TOWARDZERO
   return __builtin_convertvector(x, vector8double);
 }
-// CHECK-LABEL: @func_24
-// STRICT: call <8 x double> @llvm.experimental.constrained.sitofp.v8f64.v8i16(<8 x i16> {{.*}}, metadata !"round.towardzero", metadata !"fpexcept.strict")
 
 typedef unsigned int vector16uint __attribute__((__vector_size__(64)));
 typedef double vector16double __attribute__((__vector_size__(128)));
+// STRICT-LABEL: define dso_local <16 x double> @func_25(
+// STRICT-SAME: ptr noundef byval(<16 x i32>) align 64 [[TMP0:%.*]]) #[[ATTR4:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca <16 x i32>, align 64
+// STRICT-NEXT:    [[X:%.*]] = load <16 x i32>, ptr [[TMP0]], align 64
+// STRICT-NEXT:    store <16 x i32> [[X]], ptr [[X_ADDR]], align 64
+// STRICT-NEXT:    [[TMP1:%.*]] = load <16 x i32>, ptr [[X_ADDR]], align 64
+// STRICT-NEXT:    [[CONV:%.*]] = call <16 x double> @llvm.uitofp.v16f64.v16i32(<16 x i32> [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtn") ]
+// STRICT-NEXT:    ret <16 x double> [[CONV]]
+//
+// STRICT-RND-LABEL: define dso_local <16 x double> @func_25(
+// STRICT-RND-SAME: ptr noundef byval(<16 x i32>) align 64 [[TMP0:%.*]]) #[[ATTR4:[0-9]+]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <16 x i32>, align 64
+// STRICT-RND-NEXT:    [[X:%.*]] = load <16 x i32>, ptr [[TMP0]], align 64
+// STRICT-RND-NEXT:    store <16 x i32> [[X]], ptr [[X_ADDR]], align 64
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load <16 x i32>, ptr [[X_ADDR]], align 64
+// STRICT-RND-NEXT:    [[CONV:%.*]] = call <16 x double> @llvm.uitofp.v16f64.v16i32(<16 x i32> [[TMP1]]) #[[ATTR6]] [ "fp.control"(metadata !"rtn") ]
+// STRICT-RND-NEXT:    ret <16 x double> [[CONV]]
+//
+// DEFAULT-LABEL: define dso_local <16 x double> @func_25(
+// DEFAULT-SAME: ptr noundef byval(<16 x i32>) align 64 [[TMP0:%.*]]) #[[ATTR5:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca <16 x i32>, align 64
+// DEFAULT-NEXT:    [[X:%.*]] = load <16 x i32>, ptr [[TMP0]], align 64
+// DEFAULT-NEXT:    store <16 x i32> [[X]], ptr [[X_ADDR]], align 64
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load <16 x i32>, ptr [[X_ADDR]], align 64
+// DEFAULT-NEXT:    [[CONV:%.*]] = call <16 x double> @llvm.uitofp.v16f64.v16i32(<16 x i32> [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-NEXT:    ret <16 x double> [[CONV]]
+//
+// DEFAULT-RND-LABEL: define dso_local <16 x double> @func_25(
+// DEFAULT-RND-SAME: ptr noundef byval(<16 x i32>) align 64 [[TMP0:%.*]]) #[[ATTR5:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <16 x i32>, align 64
+// DEFAULT-RND-NEXT:    [[X:%.*]] = load <16 x i32>, ptr [[TMP0]], align 64
+// DEFAULT-RND-NEXT:    store <16 x i32> [[X]], ptr [[X_ADDR]], align 64
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load <16 x i32>, ptr [[X_ADDR]], align 64
+// DEFAULT-RND-NEXT:    [[CONV:%.*]] = call <16 x double> @llvm.uitofp.v16f64.v16i32(<16 x i32> [[TMP1]]) #[[ATTR7]] [ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
+// DEFAULT-RND-NEXT:    ret <16 x double> [[CONV]]
+//
 vector16double func_25(vector16uint x) {
   #pragma STDC FENV_ROUND FE_DOWNWARD
   return __builtin_convertvector(x, vector16double);
 }
-// CHECK-LABEL: @func_25
-// STRICT: call <16 x double> @llvm.experimental.constrained.uitofp.v16f64.v16i32(<16 x i32> {{.*}}, metadata !"round.downward", metadata !"fpexcept.strict")
 
 typedef float vector2float __attribute__((__vector_size__(8)));
 typedef char vector2char __attribute__((__vector_size__(2)));
+// STRICT-LABEL: define dso_local i16 @func_22(
+// STRICT-SAME: double noundef [[X_COERCE:%.*]]) #[[ATTR0]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[RETVAL:%.*]] = alloca <2 x i8>, align 2
+// STRICT-NEXT:    [[X:%.*]] = alloca <2 x float>, align 8
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca <2 x float>, align 8
+// STRICT-NEXT:    store double [[X_COERCE]], ptr [[X]], align 8
+// STRICT-NEXT:    [[X1:%.*]] = load <2 x float>, ptr [[X]], align 8
+// STRICT-NEXT:    store <2 x float> [[X1]], ptr [[X_ADDR]], align 8
+// STRICT-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[X_ADDR]], align 8
+// STRICT-NEXT:    [[CONV:%.*]] = call <2 x i8> @llvm.fptosi.v2i8.v2f32(<2 x float> [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    store <2 x i8> [[CONV]], ptr [[RETVAL]], align 2
+// STRICT-NEXT:    [[TMP1:%.*]] = load i16, ptr [[RETVAL]], align 2
+// STRICT-NEXT:    ret i16 [[TMP1]]
+//
+// STRICT-RND-LABEL: define dso_local i16 @func_22(
+// STRICT-RND-SAME: double noundef [[X_COERCE:%.*]]) #[[ATTR0]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[RETVAL:%.*]] = alloca <2 x i8>, align 2
+// STRICT-RND-NEXT:    [[X:%.*]] = alloca <2 x float>, align 8
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <2 x float>, align 8
+// STRICT-RND-NEXT:    store double [[X_COERCE]], ptr [[X]], align 8
+// STRICT-RND-NEXT:    [[X1:%.*]] = load <2 x float>, ptr [[X]], align 8
+// STRICT-RND-NEXT:    store <2 x float> [[X1]], ptr [[X_ADDR]], align 8
+// STRICT-RND-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[X_ADDR]], align 8
+// STRICT-RND-NEXT:    [[CONV:%.*]] = call <2 x i8> @llvm.fptosi.v2i8.v2f32(<2 x float> [[TMP0]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    store <2 x i8> [[CONV]], ptr [[RETVAL]], align 2
+// STRICT-RND-NEXT:    [[TMP1:%.*]] = load i16, ptr [[RETVAL]], align 2
+// STRICT-RND-NEXT:    ret i16 [[TMP1]]
+//
+// DEFAULT-LABEL: define dso_local i16 @func_22(
+// DEFAULT-SAME: double noundef [[X_COERCE:%.*]]) #[[ATTR0]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[RETVAL:%.*]] = alloca <2 x i8>, align 2
+// DEFAULT-NEXT:    [[X:%.*]] = alloca <2 x float>, align 8
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca <2 x float>, align 8
+// DEFAULT-NEXT:    store double [[X_COERCE]], ptr [[X]], align 8
+// DEFAULT-NEXT:    [[X1:%.*]] = load <2 x float>, ptr [[X]], align 8
+// DEFAULT-NEXT:    store <2 x float> [[X1]], ptr [[X_ADDR]], align 8
+// DEFAULT-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[X_ADDR]], align 8
+// DEFAULT-NEXT:    [[CONV:%.*]] = fptosi <2 x float> [[TMP0]] to <2 x i8>
+// DEFAULT-NEXT:    store <2 x i8> [[CONV]], ptr [[RETVAL]], align 2
+// DEFAULT-NEXT:    [[TMP1:%.*]] = load i16, ptr [[RETVAL]], align 2
+// DEFAULT-NEXT:    ret i16 [[TMP1]]
+//
+// DEFAULT-RND-LABEL: define dso_local i16 @func_22(
+// DEFAULT-RND-SAME: double noundef [[X_COERCE:%.*]]) #[[ATTR2]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[RETVAL:%.*]] = alloca <2 x i8>, align 2
+// DEFAULT-RND-NEXT:    [[X:%.*]] = alloca <2 x float>, align 8
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <2 x float>, align 8
+// DEFAULT-RND-NEXT:    store double [[X_COERCE]], ptr [[X]], align 8
+// DEFAULT-RND-NEXT:    [[X1:%.*]] = load <2 x float>, ptr [[X]], align 8
+// DEFAULT-RND-NEXT:    store <2 x float> [[X1]], ptr [[X_ADDR]], align 8
+// DEFAULT-RND-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[X_ADDR]], align 8
+// DEFAULT-RND-NEXT:    [[CONV:%.*]] = fptosi <2 x float> [[TMP0]] to <2 x i8>
+// DEFAULT-RND-NEXT:    store <2 x i8> [[CONV]], ptr [[RETVAL]], align 2
+// DEFAULT-RND-NEXT:    [[TMP1:%.*]] = load i16, ptr [[RETVAL]], align 2
+// DEFAULT-RND-NEXT:    ret i16 [[TMP1]]
+//
 vector2char func_22(vector2float x) {
   #pragma float_control(except, off)
   return __builtin_convertvector(x, vector2char);
 }
-// CHECK-LABEL: @func_22
-// STRICT: call <2 x i8> @llvm.experimental.constrained.fptosi.v2i8.v2f32(<2 x float> {{.*}}, metadata !"fpexcept.ignore")
 
 typedef float vector3float __attribute__((__vector_size__(12)));
 typedef unsigned long long vector3ulong __attribute__((__vector_size__(24)));
+// STRICT-LABEL: define dso_local <3 x i64> @func_23(
+// STRICT-SAME: <3 x float> noundef [[X:%.*]]) #[[ATTR5:[0-9]+]] {
+// STRICT-NEXT:  [[ENTRY:.*:]]
+// STRICT-NEXT:    [[X_ADDR:%.*]] = alloca <3 x float>, align 16
+// STRICT-NEXT:    [[EXTRACTVEC:%.*]] = shufflevector <3 x float> [[X]], <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// STRICT-NEXT:    store <4 x float> [[EXTRACTVEC]], ptr [[X_ADDR]], align 16
+// STRICT-NEXT:    [[LOADVECN:%.*]] = load <4 x float>, ptr [[X_ADDR]], align 16
+// STRICT-NEXT:    [[EXTRACTVEC1:%.*]] = shufflevector <4 x float> [[LOADVECN]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// STRICT-NEXT:    [[CONV:%.*]] = call <3 x i64> @llvm.fptoui.v3i64.v3f32(<3 x float> [[EXTRACTVEC1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-NEXT:    ret <3 x i64> [[CONV]]
+//
+// STRICT-RND-LABEL: define dso_local <3 x i64> @func_23(
+// STRICT-RND-SAME: <3 x float> noundef [[X:%.*]]) #[[ATTR5:[0-9]+]] {
+// STRICT-RND-NEXT:  [[ENTRY:.*:]]
+// STRICT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <3 x float>, align 16
+// STRICT-RND-NEXT:    [[EXTRACTVEC:%.*]] = shufflevector <3 x float> [[X]], <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// STRICT-RND-NEXT:    store <4 x float> [[EXTRACTVEC]], ptr [[X_ADDR]], align 16
+// STRICT-RND-NEXT:    [[LOADVECN:%.*]] = load <4 x float>, ptr [[X_ADDR]], align 16
+// STRICT-RND-NEXT:    [[EXTRACTVEC1:%.*]] = shufflevector <4 x float> [[LOADVECN]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// STRICT-RND-NEXT:    [[CONV:%.*]] = call <3 x i64> @llvm.fptoui.v3i64.v3f32(<3 x float> [[EXTRACTVEC1]]) #[[ATTR6]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// STRICT-RND-NEXT:    ret <3 x i64> [[CONV]]
+//
+// DEFAULT-LABEL: define dso_local <3 x i64> @func_23(
+// DEFAULT-SAME: <3 x float> noundef [[X:%.*]]) #[[ATTR6:[0-9]+]] {
+// DEFAULT-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca <3 x float>, align 16
+// DEFAULT-NEXT:    [[EXTRACTVEC:%.*]] = shufflevector <3 x float> [[X]], <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// DEFAULT-NEXT:    store <4 x float> [[EXTRACTVEC]], ptr [[X_ADDR]], align 16
+// DEFAULT-NEXT:    [[LOADVECN:%.*]] = load <4 x float>, ptr [[X_ADDR]], align 16
+// DEFAULT-NEXT:    [[EXTRACTVEC1:%.*]] = shufflevector <4 x float> [[LOADVECN]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// DEFAULT-NEXT:    [[CONV:%.*]] = fptoui <3 x float> [[EXTRACTVEC1]] to <3 x i64>
+// DEFAULT-NEXT:    ret <3 x i64> [[CONV]]
+//
+// DEFAULT-RND-LABEL: define dso_local <3 x i64> @func_23(
+// DEFAULT-RND-SAME: <3 x float> noundef [[X:%.*]]) #[[ATTR6:[0-9]+]] {
+// DEFAULT-RND-NEXT:  [[ENTRY:.*:]]
+// DEFAULT-RND-NEXT:    [[X_ADDR:%.*]] = alloca <3 x float>, align 16
+// DEFAULT-RND-NEXT:    [[EXTRACTVEC:%.*]] = shufflevector <3 x float> [[X]], <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// DEFAULT-RND-NEXT:    store <4 x float> [[EXTRACTVEC]], ptr [[X_ADDR]], align 16
+// DEFAULT-RND-NEXT:    [[LOADVECN:%.*]] = load <4 x float>, ptr [[X_ADDR]], align 16
+// DEFAULT-RND-NEXT:    [[EXTRACTVEC1:%.*]] = shufflevector <4 x float> [[LOADVECN]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
+// DEFAULT-RND-NEXT:    [[CONV:%.*]] = fptoui <3 x float> [[EXTRACTVEC1]] to <3 x i64>
+// DEFAULT-RND-NEXT:    ret <3 x i64> [[CONV]]
+//
 vector3ulong func_23(vector3float x) {
   #pragma float_control(except, off)
   return __builtin_convertvector(x, vector3ulong);
 }
-// CHECK-LABEL: @func_23
-// STRICT: call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f32(<3 x float> {{.*}}, metadata !"fpexcept.ignore")
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/strictfp-elementwise-builtins.cpp b/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
index 6453d50f044aa..8bafa396e2a71 100644
--- a/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
+++ b/clang/test/CodeGen/strictfp-elementwise-builtins.cpp
@@ -10,7 +10,7 @@ typedef float float4 __attribute__((ext_vector_type(4)));
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z11strict_faddDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ADD:%.*]] = tail call <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4:[0-9]+]]
+// CHECK-NEXT:    [[ADD:%.*]] = fadd <4 x float> [[A]], [[B]]
 // CHECK-NEXT:    ret <4 x float> [[ADD]]
 //
 float4 strict_fadd(float4 a, float4 b) {
@@ -18,9 +18,9 @@ float4 strict_fadd(float4 a, float4 b) {
 }
 
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_absDv4_f
-// CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR2:[0-9]+]] {
+// CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_ABS:%.*]] = tail call <4 x float> @llvm.fabs.v4f32(<4 x float> [[A]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_ABS:%.*]] = tail call <4 x float> @llvm.fabs.v4f32(<4 x float> [[A]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    ret <4 x float> [[ELT_ABS]]
 //
 float4 strict_elementwise_abs(float4 a) {
@@ -30,7 +30,7 @@ float4 strict_elementwise_abs(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_maxDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MAX:%.*]] = tail call <4 x float> @llvm.experimental.constrained.maxnum.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MAX:%.*]] = tail call <4 x float> @llvm.maxnum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[ELT_MAX]]
 //
 float4 strict_elementwise_max(float4 a, float4 b) {
@@ -40,7 +40,7 @@ float4 strict_elementwise_max(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_minDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MIN:%.*]] = tail call <4 x float> @llvm.experimental.constrained.minnum.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MIN:%.*]] = tail call <4 x float> @llvm.minnum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[ELT_MIN]]
 //
 float4 strict_elementwise_min(float4 a, float4 b) {
@@ -48,9 +48,9 @@ float4 strict_elementwise_min(float4 a, float4 b) {
 }
 
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_maximumDv4_fS_
-// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
+// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MAXIMUM:%.*]] = tail call <4 x float> @llvm.maximum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[ELT_MAXIMUM]]
 //
 float4 strict_elementwise_maximum(float4 a, float4 b) {
@@ -58,9 +58,9 @@ float4 strict_elementwise_maximum(float4 a, float4 b) {
 }
 
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z26strict_elementwise_minimumDv4_fS_
-// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR2]] {
+// CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR4]]
+// CHECK-NEXT:    [[ELT_MINIMUM:%.*]] = tail call <4 x float> @llvm.minimum.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[ELT_MINIMUM]]
 //
 float4 strict_elementwise_minimum(float4 a, float4 b) {
@@ -70,7 +70,7 @@ float4 strict_elementwise_minimum(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_ceilDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> [[A]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.ceil.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_ceil(float4 a) {
@@ -80,7 +80,7 @@ float4 strict_elementwise_ceil(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_acosDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.acos.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.acos.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_acos(float4 a) {
@@ -90,7 +90,7 @@ float4 strict_elementwise_acos(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_cosDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.cos.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.cos.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_cos(float4 a) {
@@ -100,7 +100,7 @@ float4 strict_elementwise_cos(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_coshDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.cosh.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.cosh.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_cosh(float4 a) {
@@ -110,7 +110,7 @@ float4 strict_elementwise_cosh(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_expDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.exp.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.exp.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_exp(float4 a) {
@@ -120,7 +120,7 @@ float4 strict_elementwise_exp(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_exp2Dv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.exp2.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.exp2.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_exp2(float4 a) {
@@ -130,7 +130,7 @@ float4 strict_elementwise_exp2(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z24strict_elementwise_floorDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> [[A]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.floor.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_floor(float4 a) {
@@ -140,7 +140,7 @@ float4 strict_elementwise_floor(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_logDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.log.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.log.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_log(float4 a) {
@@ -150,7 +150,7 @@ float4 strict_elementwise_log(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_log2Dv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.log2.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.log2.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_log2(float4 a) {
@@ -160,7 +160,7 @@ float4 strict_elementwise_log2(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z24strict_elementwise_log10Dv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.log2.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.log2.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_log10(float4 a) {
@@ -170,7 +170,7 @@ float4 strict_elementwise_log10(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z28strict_elementwise_roundevenDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.roundeven.v4f32(<4 x float> [[A]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.roundeven.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_roundeven(float4 a) {
@@ -180,7 +180,7 @@ float4 strict_elementwise_roundeven(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z24strict_elementwise_roundDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.round.v4f32(<4 x float> [[A]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.round.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_round(float4 a) {
@@ -190,7 +190,7 @@ float4 strict_elementwise_round(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_rintDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.rint.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.rint.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_rint(float4 a) {
@@ -200,7 +200,7 @@ float4 strict_elementwise_rint(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z28strict_elementwise_nearbyintDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.nearbyint.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_nearbyint(float4 a) {
@@ -210,7 +210,7 @@ float4 strict_elementwise_nearbyint(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_asinDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.asin.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.asin.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_asin(float4 a) {
@@ -220,7 +220,7 @@ float4 strict_elementwise_asin(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_sinDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.sin.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.sin.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_sin(float4 a) {
@@ -230,7 +230,7 @@ float4 strict_elementwise_sin(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_sinhDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.sinh.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.sinh.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_sinh(float4 a) {
@@ -240,7 +240,7 @@ float4 strict_elementwise_sinh(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_sqrtDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.sqrt.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_sqrt(float4 a) {
@@ -250,7 +250,7 @@ float4 strict_elementwise_sqrt(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_atanDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.atan.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.atan.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_atan(float4 a) {
@@ -260,7 +260,7 @@ float4 strict_elementwise_atan(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_tanDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.tan.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.tan.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_tan(float4 a) {
@@ -270,7 +270,7 @@ float4 strict_elementwise_tan(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_tanhDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.tanh.v4f32(<4 x float> [[A]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.tanh.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_tanh(float4 a) {
@@ -280,7 +280,7 @@ float4 strict_elementwise_tanh(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z24strict_elementwise_atan2Dv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.atan2.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.atan2.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_atan2(float4 a, float4 b) {
@@ -290,7 +290,7 @@ float4 strict_elementwise_atan2(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z24strict_elementwise_truncDv4_f
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> [[A]], metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.trunc.v4f32(<4 x float> [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_trunc(float4 a) {
@@ -300,7 +300,7 @@ float4 strict_elementwise_trunc(float4 a) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_fmaDv4_fS_S_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]], <4 x float> noundef [[C:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> [[A]], <4 x float> [[B]], <4 x float> [[C]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.fma.v4f32(<4 x float> [[A]], <4 x float> [[B]], <4 x float> [[C]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_fma(float4 a, float4 b, float4 c) {
@@ -310,7 +310,7 @@ float4 strict_elementwise_fma(float4 a, float4 b, float4 c) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z22strict_elementwise_powDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.pow.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.pow.v4f32(<4 x float> [[A]], <4 x float> [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    ret <4 x float> [[TMP0]]
 //
 float4 strict_elementwise_pow(float4 a, float4 b) {
@@ -320,8 +320,8 @@ float4 strict_elementwise_pow(float4 a, float4 b) {
 // CHECK-LABEL: define dso_local noundef <4 x float> @_Z23strict_elementwise_fmodDv4_fS_
 // CHECK-SAME: (<4 x float> noundef [[A:%.*]], <4 x float> noundef [[B:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <4 x float> @llvm.experimental.constrained.frem.v4f32(<4 x float> [[A]], <4 x float> [[B]], metadata !"round.dynamic", metadata !"fpexcept.strict") #[[ATTR4]]
-// CHECK-NEXT:    ret <4 x float> [[TMP0]]
+// CHECK-NEXT:    [[FMOD:%.*]] = frem <4 x float> [[A]], [[B]]
+// CHECK-NEXT:    ret <4 x float> [[FMOD]]
 //
 float4 strict_elementwise_fmod(float4 a, float4 b) {
   return __builtin_elementwise_fmod(a, b);
diff --git a/clang/test/CodeGen/strictfp_builtins.c b/clang/test/CodeGen/strictfp_builtins.c
index 58815c7de4fa9..7063f607ffa30 100644
--- a/clang/test/CodeGen/strictfp_builtins.c
+++ b/clang/test/CodeGen/strictfp_builtins.c
@@ -17,7 +17,7 @@ int printf(const char *, ...);
 // CHECK-NEXT:    store i32 [[X:%.*]], ptr [[X_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[STR_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[X_ADDR]], align 4
-// CHECK-NEXT:    [[CALL:%.*]] = call i32 (ptr, ...) @printf(ptr noundef @.str, ptr noundef [[TMP0]], i32 noundef [[TMP1]]) #[[ATTR4:[0-9]+]]
+// CHECK-NEXT:    [[CALL:%.*]] = call i32 (ptr, ...) @printf(ptr noundef @.str, ptr noundef [[TMP0]], i32 noundef [[TMP1]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    ret void
 //
 void p(char *str, int x) {
@@ -31,21 +31,21 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq") #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    br i1 [[ISZERO]], label [[FPCLASSIFY_END:%.*]], label [[FPCLASSIFY_NOT_ZERO:%.*]]
 // CHECK:       fpclassify_end:
 // CHECK-NEXT:    [[FPCLASSIFY_RESULT:%.*]] = phi i32 [ 4, [[ENTRY:%.*]] ], [ 0, [[FPCLASSIFY_NOT_ZERO]] ], [ 1, [[FPCLASSIFY_NOT_NAN:%.*]] ], [ [[TMP2:%.*]], [[FPCLASSIFY_NOT_INF:%.*]] ]
-// CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[FPCLASSIFY_RESULT]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[FPCLASSIFY_RESULT]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 // CHECK:       fpclassify_not_zero:
-// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno") #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_NAN]]
 // CHECK:       fpclassify_not_nan:
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5:[0-9]+]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq") #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    br i1 [[ISINF]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_INF]]
 // CHECK:       fpclassify_not_inf:
-// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge") #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2]] = select i1 [[ISNORMAL]], i32 2, i32 3
 // CHECK-NEXT:    br label [[FPCLASSIFY_END]]
 //
@@ -60,9 +60,9 @@ void test_fpclassify(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_fp16_isinf(_Float16 h) {
@@ -76,9 +76,9 @@ void test_fp16_isinf(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_float_isinf(float f) {
@@ -92,9 +92,9 @@ void test_float_isinf(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.4, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.4, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_double_isinf(double d) {
@@ -108,9 +108,9 @@ void test_double_isinf(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.5, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.5, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_fp16_isfinite(_Float16 h) {
@@ -124,9 +124,9 @@ void test_fp16_isfinite(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.6, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.6, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_float_isfinite(float f) {
@@ -140,9 +140,9 @@ void test_float_isfinite(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.7, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.7, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_double_isfinite(double d) {
@@ -156,13 +156,13 @@ void test_double_isfinite(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq") #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = bitcast double [[TMP0]] to i64
 // CHECK-NEXT:    [[TMP3:%.*]] = icmp slt i64 [[TMP2]], 0
 // CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP3]], i32 -1, i32 1
 // CHECK-NEXT:    [[TMP5:%.*]] = select i1 [[ISINF]], i32 [[TMP4]], i32 0
-// CHECK-NEXT:    call void @p(ptr noundef @.str.8, i32 noundef [[TMP5]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.8, i32 noundef [[TMP5]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_isinf_sign(double d) {
@@ -176,9 +176,9 @@ void test_isinf_sign(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.9, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.9, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_fp16_isnan(_Float16 h) {
@@ -192,9 +192,9 @@ void test_fp16_isnan(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.10, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.10, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_float_isnan(float f) {
@@ -208,9 +208,9 @@ void test_float_isnan(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.11, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.11, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_double_isnan(double d) {
@@ -224,9 +224,9 @@ void test_double_isnan(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
-// CHECK-NEXT:    call void @p(ptr noundef @.str.12, i32 noundef [[TMP2]]) #[[ATTR4]]
+// CHECK-NEXT:    call void @p(ptr noundef @.str.12, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_isnormal(double d) {

>From fb77509d7a0bae104f28bd9822d066a3bf551170 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Tue, 14 Apr 2026 00:30:13 -0700
Subject: [PATCH 10/12] [Clang][PCH][OpenCL] Update constrained FP CHECK
 patterns for operand bundle IR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update CHECK lines in PCH and OpenCL tests to match the new operand bundle
IR format produced by the constrained FP → operand bundle migration.

PCH/pragma-floatcontrol.c: update @llvm.experimental.constrained.fmul/fadd/
fmuladd to @llvm.fmul/fadd/fmuladd with fp.control operand bundles.

CodeGenOpenCL/builtin-store-half-rounding-constrained.cl: update
@llvm.experimental.constrained.fptrunc patterns to @llvm.fptrunc with
fp.control + fp.except operand bundles and new rounding mode names
(rtp/rtn instead of round.upward/round.downward).
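
As a rough illustration of the pattern change, here is a sketch assembled
from the CHECK lines in this patch. The value names %x, %h.old and %h.new
are hypothetical, and the two calls are a before/after pair rather than one
function body:

```llvm
; Before: rounding mode and exception behavior are trailing metadata operands.
%h.old = call half @llvm.experimental.constrained.fptrunc.f16.f32(float %x, metadata !"round.upward", metadata !"fpexcept.ignore")

; After: the same settings are carried as operand bundles on the call,
; using the new rounding mode name (rtp instead of round.upward).
%h.new = call half @llvm.fptrunc.f16.f32(float %x) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
```

The updated CHECK lines put {{.*}} between the closing parenthesis and the
bundle list so that an attribute group reference such as #N can appear there.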

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 .../builtin-store-half-rounding-constrained.cl              | 4 ++--
 clang/test/PCH/pragma-floatcontrol.c                        | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/clang/test/CodeGenOpenCL/builtin-store-half-rounding-constrained.cl b/clang/test/CodeGenOpenCL/builtin-store-half-rounding-constrained.cl
index ba3b3165e325a..190366800222a 100644
--- a/clang/test/CodeGenOpenCL/builtin-store-half-rounding-constrained.cl
+++ b/clang/test/CodeGenOpenCL/builtin-store-half-rounding-constrained.cl
@@ -1,7 +1,7 @@
 // RUN: %clang_cc1 %s -cl-std=cl3.0 -triple x86_64-unknown-unknown -disable-llvm-passes -emit-llvm -o - | FileCheck %s
 
 // CHECK-LABEL: @test_store_float(
-// CHECK:         [[TMP0:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float {{.*}}, metadata !"round.upward", metadata !"fpexcept.ignore")
+// CHECK:         [[TMP0:%.*]] = call half @llvm.fptrunc.f16.f32(float {{.*}}) {{.*}}[ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
 // CHECK-NEXT:    store half [[TMP0]], ptr {{.*}}, align 2
 // CHECK-NEXT:    ret void
 //
@@ -11,7 +11,7 @@ __kernel void test_store_float(float foo, __global half* bar) {
 }
 
 // CHECK-LABEL: @test_store_double(
-// CHECK:         [[TMP0:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f64(double {{.*}}, metadata !"round.downward", metadata !"fpexcept.ignore")
+// CHECK:         [[TMP0:%.*]] = call half @llvm.fptrunc.f16.f64(double {{.*}}) {{.*}}[ "fp.control"(metadata !"rtn"), "fp.except"(metadata !"ignore") ]
 // CHECK-NEXT:    store half [[TMP0]], ptr {{.*}}, align 2
 // CHECK-NEXT:    ret void
 //
diff --git a/clang/test/PCH/pragma-floatcontrol.c b/clang/test/PCH/pragma-floatcontrol.c
index 95f587a7b52b8..49f35cfc71d15 100644
--- a/clang/test/PCH/pragma-floatcontrol.c
+++ b/clang/test/PCH/pragma-floatcontrol.c
@@ -46,9 +46,9 @@
 #ifdef SET
 float fun(float a, float b) {
   // CHECK-LABEL: define float @fun{{.*}}
-  //CHECK-EBSTRICT: llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
-  //CHECK-EBSTRICT: llvm.experimental.constrained.fadd{{.*}}tonearest{{.*}}strict
-  //CHECK-CONTRACT: llvm.experimental.constrained.fmuladd{{.*}}tonearest{{.*}}strict
+  //CHECK-EBSTRICT: call float @llvm.fmul.f32(float %{{.*}}, float %{{.*}}) {{.*}}[ "fp.control"(metadata !"rte") ]
+  //CHECK-EBSTRICT: call float @llvm.fadd.f32(float %{{.*}}, float %{{.*}}) {{.*}}[ "fp.control"(metadata !"rte") ]
+  //CHECK-CONTRACT: call float @llvm.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}) {{.*}}[ "fp.control"(metadata !"rte") ]
   return a * b + 2;
 }
 #pragma float_control(pop) // expected-warning {{#pragma float_control(pop, ...) failed: stack empty}}

>From 7c75d374538a9d1e402810d42a0ff251cd46f3e7 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Tue, 14 Apr 2026 01:04:05 -0700
Subject: [PATCH 11/12] Fix code_formatter CI failures: clang-format and undef
 deprecator

Fixes two issues flagged by the code_formatter CI check:

1. Clang-format: Rewrap overlong comment in
   getFloatingPointMemoryEffects() to stay within 80 columns.

2. Undef deprecator: Remove test cases that added net new `float undef`
   operand uses to the diff:
   - fp-undef-poison-strictfp.ll: Drop the _maytrap and _upward undef
     variants (26 functions). These tested pass-through behavior that
     is not specific to undef semantics; the _strict and _defaultfp
     variants that actually simplify undef to NaN are retained.
   - ldexp.ll: Remove the float-undef sub-tests from the NaN-strictfp
     functions, drop the `i32 undef` exponent sub-test from
     ldexp_f32_0_strictfp, and slim ldexp_f32_undef_strictfp down to
     poison-only cases (renaming it ldexp_f32_poison_strictfp). The
     undef-ldexp behavior is already covered by ldexp_f32_undef_undef et al.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 llvm/lib/IR/Instructions.cpp                  |   6 +-
 .../InstSimplify/fp-undef-poison-strictfp.ll  | 257 ------------------
 llvm/test/Transforms/InstSimplify/ldexp.ll    |  41 +--
 3 files changed, 8 insertions(+), 296 deletions(-)

diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 1482e7e240a1a..1541b3b61da6f 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -782,9 +782,9 @@ MemoryEffects CallBase::getFloatingPointMemoryEffects() const {
                 return MemoryEffects::none();
               // Dynamic rounding mode: the operation reads the current rounding
               // mode from the FP environment (e.g. MXCSR). Use
-              // inaccessibleMemOnly (not just Ref) so that EarlyCSE conservatively
-              // treats these as writes and prevents CSE across arbitrary function
-              // calls that might change the rounding mode.
+              // inaccessibleMemOnly (not just Ref) so that EarlyCSE
+              // conservatively treats these as writes and prevents CSE across
+              // arbitrary function calls that might change the rounding mode.
             }
             return MemoryEffects::inaccessibleMemOnly();
           }
diff --git a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
index 8974e0fde8984..183451f049169 100644
--- a/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
+++ b/llvm/test/Transforms/InstSimplify/fp-undef-poison-strictfp.ll
@@ -16,26 +16,6 @@ define float @fadd_undef_op0_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fadd_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fadd_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fadd_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: define float @fadd_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fadd.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fadd_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fadd_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -91,26 +71,6 @@ define float @fadd_undef_op1_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fadd_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fadd_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fadd_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: define float @fadd_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fadd.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fadd.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fadd_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fadd_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -170,26 +130,6 @@ define float @fsub_undef_op0_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fsub_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fsub_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fsub_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: define float @fsub_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fsub.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fsub_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fsub_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -245,26 +185,6 @@ define float @fsub_undef_op1_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fsub_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fsub_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fsub_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: define float @fsub_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fsub.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fsub.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fsub_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fsub_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -324,26 +244,6 @@ define float @fmul_undef_op0_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fmul_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fmul_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fmul_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: define float @fmul_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fmul.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fmul_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fmul_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -399,26 +299,6 @@ define float @fmul_undef_op1_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fmul_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fmul_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fmul_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: define float @fmul_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fmul.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fmul.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fmul_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fmul_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -478,26 +358,6 @@ define float @fdiv_undef_op0_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fdiv_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fdiv_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fdiv_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: define float @fdiv_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fdiv.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fdiv_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fdiv_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -553,26 +413,6 @@ define float @fdiv_undef_op1_strict(float %x) #0 {
   ret float %r
 }
 
-define float @fdiv_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @fdiv_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fdiv_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: define float @fdiv_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.fdiv.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.fdiv.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fdiv_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @fdiv_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -632,26 +472,6 @@ define float @frem_undef_op0_strict(float %x) #0 {
   ret float %r
 }
 
-define float @frem_undef_op0_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @frem_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @frem_undef_op0_upward(float %x) #0 {
-; CHECK-LABEL: define float @frem_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float undef, float [[X]]) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.frem.f32(float undef, float %x, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @frem_undef_op0_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @frem_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -707,26 +527,6 @@ define float @frem_undef_op1_strict(float %x) #0 {
   ret float %r
 }
 
-define float @frem_undef_op1_maytrap(float %x) #0 {
-; CHECK-LABEL: define float @frem_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @frem_undef_op1_upward(float %x) #0 {
-; CHECK-LABEL: define float @frem_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R:%.*]] = call float @llvm.frem.f32(float [[X]], float undef) [ "fp.control"(metadata !"rtp"), "fp.except"(metadata !"ignore") ]
-; CHECK-NEXT:    ret float [[R]]
-;
-  %r = call float @llvm.experimental.constrained.frem.f32(float %x, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @frem_undef_op1_defaultfp(float %x) #0 {
 ; CHECK-LABEL: define float @frem_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]]) #[[ATTR0]] {
@@ -787,25 +587,6 @@ define float @fma_undef_op0_strict(float %x, float %y) #0 {
   ret float %r
 }
 
-define float @fma_undef_op0_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op0_maytrap(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float undef, float [[X]], float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fma_undef_op0_upward(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op0_upward(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float undef, float %x, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fma_undef_op0_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: define float @fma_undef_op0_defaultfp(
 ; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
@@ -863,25 +644,6 @@ define float @fma_undef_op1_strict(float %x, float %y) #0 {
   ret float %r
 }
 
-define float @fma_undef_op1_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op1_maytrap(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float undef, float [[Y]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fma_undef_op1_upward(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op1_upward(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float %x, float undef, float %y, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fma_undef_op1_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: define float @fma_undef_op1_defaultfp(
 ; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
@@ -939,25 +701,6 @@ define float @fma_undef_op2_strict(float %x, float %y) #0 {
   ret float %r
 }
 
-define float @fma_undef_op2_maytrap(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op2_maytrap(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[R1:%.*]] = call float @llvm.fma.f32(float [[X]], float [[Y]], float undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap")
-  ret float %r
-}
-
-define float @fma_undef_op2_upward(float %x, float %y) #0 {
-; CHECK-LABEL: define float @fma_undef_op2_upward(
-; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    ret float 0x7FF8000000000000
-;
-  %r = call float @llvm.experimental.constrained.fma.f32(float %x, float %y, float undef, metadata !"round.upward", metadata !"fpexcept.ignore")
-  ret float %r
-}
-
 define float @fma_undef_op2_defaultfp(float %x, float %y) #0 {
 ; CHECK-LABEL: define float @fma_undef_op2_defaultfp(
 ; CHECK-SAME: float [[X:%.*]], float [[Y:%.*]]) #[[ATTR0]] {
diff --git a/llvm/test/Transforms/InstSimplify/ldexp.ll b/llvm/test/Transforms/InstSimplify/ldexp.ll
index 2c87532f908d3..4fe49454cc365 100644
--- a/llvm/test/Transforms/InstSimplify/ldexp.ll
+++ b/llvm/test/Transforms/InstSimplify/ldexp.ll
@@ -150,8 +150,6 @@ define void @ldexp_f32_val_nan_strictfp_maytrap(i32 %y) #0 {
 ; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[NEG_SNAN4:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0xFFFFFFFFE0000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
   %plus.qnan = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7ff0001000000000, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
@@ -166,9 +164,6 @@ define void @ldexp_f32_val_nan_strictfp_maytrap(i32 %y) #0 {
   %neg.snan = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
   store volatile float %neg.snan, ptr addrspace(1) undef
 
-  %undef = call float @llvm.experimental.constrained.ldexp.f32.i32(float undef, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %undef, ptr addrspace(1) undef
-
   ret void
 }
 
@@ -183,8 +178,6 @@ define void @ldexp_f32_val_nan_strictfp_strict(i32 %y) #0 {
 ; CHECK-NEXT:    store volatile float 0x7FF8000020000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[NEG_SNAN4:%.*]] = call float @llvm.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 [[Y]])
 ; CHECK-NEXT:    store volatile float 0xFFFFFFFFE0000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y]])
-; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
   %plus.qnan = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x7ff0001000000000, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
@@ -199,9 +192,6 @@ define void @ldexp_f32_val_nan_strictfp_strict(i32 %y) #0 {
   %neg.snan = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0xFFF7FFFFE0000000, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
   store volatile float %neg.snan, ptr addrspace(1) undef
 
-  %undef = call float @llvm.experimental.constrained.ldexp.f32.i32(float undef, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
-  store volatile float %undef, ptr addrspace(1) undef
-
   ret void
 }
 
@@ -240,34 +230,18 @@ define void @ldexp_f32_0() {
   ret void
 }
 
-define void @ldexp_f32_undef_strictfp(float %x, i32 %y) #0 {
-; CHECK-LABEL: @ldexp_f32_undef_strictfp(
-; CHECK-NEXT:    [[UNDEF_EXP1:%.*]] = call float @llvm.ldexp.f32.i32(float [[UNDEF_EXP:%.*]], i32 undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float [[UNDEF_EXP]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[POISON_EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float [[UNDEF_EXP]], i32 poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float [[UNDEF_EXP]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNDEF_VAL3:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float 0x7FF8000000000000, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[POISON_VAL4:%.*]] = call float @llvm.ldexp.f32.i32(float poison, i32 [[Y]]) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float poison, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[POISON_UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float poison, i32 undef) [ "fp.except"(metadata !"maytrap") ]
+define void @ldexp_f32_poison_strictfp(float %x, i32 %y) #0 {
+; CHECK-LABEL: @ldexp_f32_poison_strictfp(
+; CHECK-NEXT:    [[POISON_EXP1:%.*]] = call float @llvm.ldexp.f32.i32(float [[POISON_EXP:%.*]], i32 poison) [ "fp.except"(metadata !"maytrap") ]
+; CHECK-NEXT:    store volatile float [[POISON_EXP]], ptr addrspace(1) undef, align 4
+; CHECK-NEXT:    [[POISON_VAL2:%.*]] = call float @llvm.ldexp.f32.i32(float poison, i32 [[Y:%.*]]) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float poison, ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNDEF_POISON6:%.*]] = call float @llvm.ldexp.f32.i32(float undef, i32 poison) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float undef, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    ret void
 ;
-  %undef.exp = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %undef.exp, ptr addrspace(1) undef
   %poison.exp = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
   store volatile float %poison.exp, ptr addrspace(1) undef
-  %undef.val = call float @llvm.experimental.constrained.ldexp.f32.i32(float undef, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %undef.val, ptr addrspace(1) undef
   %poison.val = call float @llvm.experimental.constrained.ldexp.f32.i32(float poison, i32 %y, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
   store volatile float %poison.val, ptr addrspace(1) undef
-  %poison.undef = call float @llvm.experimental.constrained.ldexp.f32.i32(float poison, i32 undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %poison.undef, ptr addrspace(1) undef
-  %undef.poison = call float @llvm.experimental.constrained.ldexp.f32.i32(float undef, i32 poison, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %undef.poison, ptr addrspace(1) undef
   ret void
 }
 
@@ -282,8 +256,6 @@ define void @ldexp_f32_0_strictfp(float %x) #0 {
 ; CHECK-NEXT:    store volatile float 0.000000e+00, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[UNKNOWN_ZERO4:%.*]] = call float @llvm.ldexp.f32.i32(float [[DENORMAL_0:%.*]], i32 0) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float [[DENORMAL_0]], ptr addrspace(1) undef, align 4
-; CHECK-NEXT:    [[UNKNOWN_UNDEF5:%.*]] = call float @llvm.ldexp.f32.i32(float [[DENORMAL_0]], i32 undef) [ "fp.except"(metadata !"maytrap") ]
-; CHECK-NEXT:    store volatile float [[DENORMAL_0]], ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[DENORMAL_06:%.*]] = call float @llvm.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 0) [ "fp.except"(metadata !"maytrap") ]
 ; CHECK-NEXT:    store volatile float 0x380FFFFFC0000000, ptr addrspace(1) undef, align 4
 ; CHECK-NEXT:    [[DENORMAL_17:%.*]] = call float @llvm.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 1) [ "fp.except"(metadata !"maytrap") ]
@@ -302,9 +274,6 @@ define void @ldexp_f32_0_strictfp(float %x) #0 {
   %unknown.zero = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 0, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
   store volatile float %unknown.zero, ptr addrspace(1) undef
 
-  %unknown.undef = call float @llvm.experimental.constrained.ldexp.f32.i32(float %x, i32 undef, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
-  store volatile float %unknown.undef, ptr addrspace(1) undef
-
   %denormal.0 = call float @llvm.experimental.constrained.ldexp.f32.i32(float 0x380FFFFFC0000000, i32 0, metadata !"round.dynamic", metadata !"fpexcept.maytrap") #0
   store volatile float %denormal.0, ptr addrspace(1) undef
 

>From a39aee89a1e6a68f5c70467da8e5f7bd2310d2a0 Mon Sep 17 00:00:00 2001
From: Princeton Ferro <pferro at nvidia.com>
Date: Wed, 15 Apr 2026 03:17:27 -0700
Subject: [PATCH 12/12] [Clang] Update remaining constrained FP test CHECK
 patterns for operand bundle IR

Update CHECK patterns in 27 Clang CodeGen test files: target-specific tests
for X86 (SSE/AVX/AVX512), AArch64/ARM (NEON), SystemZ (zvector), and PowerPC,
plus ffp-contract-option.c, ffp-model.c, fp16-ops-strictfp.c, and
pragma-fp-exc.cpp. Patterns now match the new operand bundle IR format:
  @llvm.OP.TYPE(args) #N [ "fp.control"(metadata !"rte") ]
instead of:
  @llvm.experimental.constrained.OP.TYPE(args, metadata !"round.X", metadata !"fpexcept.Y")

Also update the SystemZ zvector assembly patterns: fcmps was consolidated
into fcmp, so signaling compare instructions are no longer generated in
constrained mode. The PowerPC xvnmsub patterns now use ppc.fnmsub for both
constrained and unconstrained codegen.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
---
 .../AArch64/neon-intrinsics-constrained.c     |  192 +-
 .../CodeGen/AArch64/neon-misc-constrained.c   |    8 +-
 .../neon-scalar-x-indexed-elem-constrained.c  |   16 +-
 .../v8.2a-fp16-intrinsics-constrained.c       |  591 ++++--
 .../v8.2a-neon-intrinsics-constrained.c       |   68 +-
 .../PowerPC/builtins-ppc-fpconstrained.c      |   44 +-
 .../builtins-systemz-vector-constrained.c     |  104 +-
 .../builtins-systemz-vector2-constrained.c    |   36 +-
 .../builtins-systemz-zvector-constrained.c    |  100 +-
 .../builtins-systemz-zvector2-constrained.c   |  108 +-
 .../builtins-systemz-zvector3-constrained.c   |   16 +-
 .../CodeGen/X86/avx-builtins-constrained.c    |    8 +-
 .../X86/avx512dq-builtins-constrained.c       |   24 +-
 .../X86/avx512f-builtins-constrained.c        |   26 +-
 .../X86/avx512fp16-builtins-constrained.c     |   12 +-
 .../X86/avx512vl-builtins-constrained.c       |  293 ++-
 .../X86/avx512vlfp16-builtins-constrained.c   |    4 +-
 .../CodeGen/X86/f16c-builtins-constrained.c   |   31 +-
 .../CodeGen/X86/fma-builtins-constrained.c    |   48 +-
 .../CodeGen/X86/sse-builtins-constrained.c    |    4 +-
 .../CodeGen/X86/sse2-builtins-constrained.c   |    4 +-
 .../arm-neon-directed-rounding-constrained.c  |    4 +-
 clang/test/CodeGen/arm64-vrnd-constrained.c   |   10 +-
 clang/test/CodeGen/ffp-contract-option.c      |  203 ++-
 clang/test/CodeGen/ffp-model.c                |  647 +++++--
 clang/test/CodeGen/fp16-ops-strictfp.c        | 1617 ++++++++++++-----
 clang/test/CodeGen/pragma-fp-exc.cpp          |   49 +-
 27 files changed, 2980 insertions(+), 1287 deletions(-)

diff --git a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
index ba32cfb7f3bae..d0a96aefe1ac4 100644
--- a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
@@ -24,7 +24,7 @@
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vadd_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0:[0-9]+]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3:[0-9]+]]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[V1]], <2 x float> [[V2]]) #[[ATTR3:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[ADD_I]]
 //
 float32x2_t test_vadd_f32(float32x2_t v1, float32x2_t v2) {
@@ -40,7 +40,7 @@ float32x2_t test_vadd_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vaddq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <4 x float> @llvm.fadd.v4f32(<4 x float> [[V1]], <4 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[ADD_I]]
 //
 float32x4_t test_vaddq_f32(float32x4_t v1, float32x4_t v2) {
@@ -56,7 +56,7 @@ float32x4_t test_vaddq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vsub_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[V1]], <2 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[SUB_I]]
 //
 float32x2_t test_vsub_f32(float32x2_t v1, float32x2_t v2) {
@@ -72,7 +72,7 @@ float32x2_t test_vsub_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vsubq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fsub.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <4 x float> @llvm.fsub.v4f32(<4 x float> [[V1]], <4 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[SUB_I]]
 //
 float32x4_t test_vsubq_f32(float32x4_t v1, float32x4_t v2) {
@@ -88,7 +88,7 @@ float32x4_t test_vsubq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x double> @test_vsubq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fsub.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x double> @llvm.fsub.v2f64(<2 x double> [[V1]], <2 x double> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[SUB_I]]
 //
 float64x2_t test_vsubq_f64(float64x2_t v1, float64x2_t v2) {
@@ -104,7 +104,7 @@ float64x2_t test_vsubq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vmul_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.fmul.v2f32(<2 x float> [[V1]], <2 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[MUL_I]]
 //
 float32x2_t test_vmul_f32(float32x2_t v1, float32x2_t v2) {
@@ -120,7 +120,7 @@ float32x2_t test_vmul_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vmulq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.fmul.v4f32(<4 x float> [[V1]], <4 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[MUL_I]]
 //
 float32x4_t test_vmulq_f32(float32x4_t v1, float32x4_t v2) {
@@ -136,7 +136,7 @@ float32x4_t test_vmulq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x double> @test_vmulq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> [[V1]], <2 x double> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[MUL_I]]
 //
 float64x2_t test_vmulq_f64(float64x2_t v1, float64x2_t v2) {
@@ -153,8 +153,8 @@ float64x2_t test_vmulq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vmla_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]], <2 x float> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> [[V2]], <2 x float> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> [[V1]], <2 x float> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.fmul.v2f32(<2 x float> [[V2]], <2 x float> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x float> @llvm.fadd.v2f32(<2 x float> [[V1]], <2 x float> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[ADD_I]]
 //
 float32x2_t test_vmla_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
@@ -171,8 +171,8 @@ float32x2_t test_vmla_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vmlaq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]], <4 x float> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> [[V2]], <4 x float> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float> [[V1]], <4 x float> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.fmul.v4f32(<4 x float> [[V2]], <4 x float> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <4 x float> @llvm.fadd.v4f32(<4 x float> [[V1]], <4 x float> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[ADD_I]]
 //
 float32x4_t test_vmlaq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
@@ -189,8 +189,8 @@ float32x4_t test_vmlaq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
 // CONSTRAINED-LABEL: define dso_local <2 x double> @test_vmlaq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]], <2 x double> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[V2]], <2 x double> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fadd.v2f64(<2 x double> [[V1]], <2 x double> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> [[V2]], <2 x double> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <2 x double> @llvm.fadd.v2f64(<2 x double> [[V1]], <2 x double> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[ADD_I]]
 //
 float64x2_t test_vmlaq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
@@ -207,8 +207,8 @@ float64x2_t test_vmlaq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vmls_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]], <2 x float> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> [[V2]], <2 x float> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fsub.v2f32(<2 x float> [[V1]], <2 x float> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x float> @llvm.fmul.v2f32(<2 x float> [[V2]], <2 x float> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x float> @llvm.fsub.v2f32(<2 x float> [[V1]], <2 x float> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[SUB_I]]
 //
 float32x2_t test_vmls_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
@@ -225,8 +225,8 @@ float32x2_t test_vmls_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vmlsq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]], <4 x float> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> [[V2]], <4 x float> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fsub.v4f32(<4 x float> [[V1]], <4 x float> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <4 x float> @llvm.fmul.v4f32(<4 x float> [[V2]], <4 x float> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <4 x float> @llvm.fsub.v4f32(<4 x float> [[V1]], <4 x float> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[SUB_I]]
 //
 float32x4_t test_vmlsq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
@@ -243,8 +243,8 @@ float32x4_t test_vmlsq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
 // CONSTRAINED-LABEL: define dso_local <2 x double> @test_vmlsq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]], <2 x double> noundef [[V3:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[V2]], <2 x double> [[V3]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fsub.v2f64(<2 x double> [[V1]], <2 x double> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> [[V2]], <2 x double> [[V3]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <2 x double> @llvm.fsub.v2f64(<2 x double> [[V1]], <2 x double> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[SUB_I]]
 //
 float64x2_t test_vmlsq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
@@ -278,7 +278,7 @@ float64x2_t test_vmlsq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <2 x float>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x float>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <2 x float>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x float> @llvm.experimental.constrained.fma.v2f32(<2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[TMP9]]
 //
 float32x2_t test_vfma_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
@@ -312,7 +312,7 @@ float32x2_t test_vfma_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <4 x float>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x float>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <4 x float>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x float> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x float> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[TMP9]]
 //
 float32x4_t test_vfmaq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
@@ -346,7 +346,7 @@ float32x4_t test_vfmaq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <2 x double>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <2 x double>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x double> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x double> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[TMP9]]
 //
 float64x2_t test_vfmaq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
@@ -382,7 +382,7 @@ float64x2_t test_vfmaq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <2 x float>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <2 x float>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <2 x float>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x float> @llvm.experimental.constrained.fma.v2f32(<2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x float> @llvm.fma.v2f32(<2 x float> [[TMP7]], <2 x float> [[TMP8]], <2 x float> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[TMP9]]
 //
 float32x2_t test_vfms_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
@@ -418,7 +418,7 @@ float32x2_t test_vfms_f32(float32x2_t v1, float32x2_t v2, float32x2_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <4 x float>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <4 x float>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <4 x float>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x float> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x float> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[TMP9]]
 //
 float32x4_t test_vfmsq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
@@ -454,7 +454,7 @@ float32x4_t test_vfmsq_f32(float32x4_t v1, float32x4_t v2, float32x4_t v3) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <2 x double>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <2 x double>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <2 x double>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x double> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x double> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[TMP9]]
 //
 float64x2_t test_vfmsq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
@@ -470,7 +470,7 @@ float64x2_t test_vfmsq_f64(float64x2_t v1, float64x2_t v2, float64x2_t v3) {
 // CONSTRAINED-LABEL: define dso_local <2 x double> @test_vdivq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <2 x double> @llvm.experimental.constrained.fdiv.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <2 x double> @llvm.fdiv.v2f64(<2 x double> [[V1]], <2 x double> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[DIV_I]]
 //
 float64x2_t test_vdivq_f64(float64x2_t v1, float64x2_t v2) {
@@ -486,7 +486,7 @@ float64x2_t test_vdivq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <4 x float> @test_vdivq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <4 x float> @llvm.experimental.constrained.fdiv.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <4 x float> @llvm.fdiv.v4f32(<4 x float> [[V1]], <4 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[DIV_I]]
 //
 float32x4_t test_vdivq_f32(float32x4_t v1, float32x4_t v2) {
@@ -502,7 +502,7 @@ float32x4_t test_vdivq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x float> @test_vdiv_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <2 x float> @llvm.experimental.constrained.fdiv.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <2 x float> @llvm.fdiv.v2f32(<2 x float> [[V1]], <2 x float> [[V2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x float> [[DIV_I]]
 //
 float32x2_t test_vdiv_f32(float32x2_t v1, float32x2_t v2) {
@@ -519,7 +519,7 @@ float32x2_t test_vdiv_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i32> @test_vceq_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i32>
 // CONSTRAINED-NEXT:    ret <2 x i32> [[SEXT_I]]
 //
@@ -537,7 +537,7 @@ uint32x2_t test_vceq_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <1 x i64> @test_vceq_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.experimental.constrained.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
 // CONSTRAINED-NEXT:    ret <1 x i64> [[SEXT_I]]
 //
@@ -555,7 +555,7 @@ uint64x1_t test_vceq_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <4 x i32> @test_vceqq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
 // CONSTRAINED-NEXT:    ret <4 x i32> [[SEXT_I]]
 //
@@ -573,7 +573,7 @@ uint32x4_t test_vceqq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i64> @test_vceqq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
 // CONSTRAINED-NEXT:    ret <2 x i64> [[SEXT_I]]
 //
@@ -591,7 +591,7 @@ uint64x2_t test_vceqq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i32> @test_vcge_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i32>
 // CONSTRAINED-NEXT:    ret <2 x i32> [[SEXT_I]]
 //
@@ -609,7 +609,7 @@ uint32x2_t test_vcge_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <1 x i64> @test_vcge_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.experimental.constrained.fcmps.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
 // CONSTRAINED-NEXT:    ret <1 x i64> [[SEXT_I]]
 //
@@ -627,7 +627,7 @@ uint64x1_t test_vcge_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <4 x i32> @test_vcgeq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
 // CONSTRAINED-NEXT:    ret <4 x i32> [[SEXT_I]]
 //
@@ -645,7 +645,7 @@ uint32x4_t test_vcgeq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i64> @test_vcgeq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
 // CONSTRAINED-NEXT:    ret <2 x i64> [[SEXT_I]]
 //
@@ -663,7 +663,7 @@ uint64x2_t test_vcgeq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i32> @test_vcle_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i32>
 // CONSTRAINED-NEXT:    ret <2 x i32> [[SEXT_I]]
 //
@@ -681,7 +681,7 @@ uint32x2_t test_vcle_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <1 x i64> @test_vcle_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.experimental.constrained.fcmps.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
 // CONSTRAINED-NEXT:    ret <1 x i64> [[SEXT_I]]
 //
@@ -699,7 +699,7 @@ uint64x1_t test_vcle_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <4 x i32> @test_vcleq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
 // CONSTRAINED-NEXT:    ret <4 x i32> [[SEXT_I]]
 //
@@ -717,7 +717,7 @@ uint32x4_t test_vcleq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i64> @test_vcleq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
 // CONSTRAINED-NEXT:    ret <2 x i64> [[SEXT_I]]
 //
@@ -735,7 +735,7 @@ uint64x2_t test_vcleq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i32> @test_vcgt_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i32>
 // CONSTRAINED-NEXT:    ret <2 x i32> [[SEXT_I]]
 //
@@ -753,7 +753,7 @@ uint32x2_t test_vcgt_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <1 x i64> @test_vcgt_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.experimental.constrained.fcmps.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
 // CONSTRAINED-NEXT:    ret <1 x i64> [[SEXT_I]]
 //
@@ -771,7 +771,7 @@ uint64x1_t test_vcgt_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <4 x i32> @test_vcgtq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
 // CONSTRAINED-NEXT:    ret <4 x i32> [[SEXT_I]]
 //
@@ -789,7 +789,7 @@ uint32x4_t test_vcgtq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i64> @test_vcgtq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
 // CONSTRAINED-NEXT:    ret <2 x i64> [[SEXT_I]]
 //
@@ -807,7 +807,7 @@ uint64x2_t test_vcgtq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i32> @test_vclt_f32(
 // CONSTRAINED-SAME: <2 x float> noundef [[V1:%.*]], <2 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f32(<2 x float> [[V1]], <2 x float> [[V2]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i32>
 // CONSTRAINED-NEXT:    ret <2 x i32> [[SEXT_I]]
 //
@@ -825,7 +825,7 @@ uint32x2_t test_vclt_f32(float32x2_t v1, float32x2_t v2) {
 // CONSTRAINED-LABEL: define dso_local <1 x i64> @test_vclt_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.experimental.constrained.fcmps.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <1 x i1> @llvm.fcmp.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <1 x i1> [[CMP_I]] to <1 x i64>
 // CONSTRAINED-NEXT:    ret <1 x i64> [[SEXT_I]]
 //
@@ -843,7 +843,7 @@ uint64x1_t test_vclt_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <4 x i32> @test_vcltq_f32(
 // CONSTRAINED-SAME: <4 x float> noundef [[V1:%.*]], <4 x float> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <4 x i1> @llvm.fcmp.v4f32(<4 x float> [[V1]], <4 x float> [[V2]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <4 x i1> [[CMP_I]] to <4 x i32>
 // CONSTRAINED-NEXT:    ret <4 x i32> [[SEXT_I]]
 //
@@ -861,7 +861,7 @@ uint32x4_t test_vcltq_f32(float32x4_t v1, float32x4_t v2) {
 // CONSTRAINED-LABEL: define dso_local <2 x i64> @test_vcltq_f64(
 // CONSTRAINED-SAME: <2 x double> noundef [[V1:%.*]], <2 x double> noundef [[V2:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[CMP_I:%.*]] = call <2 x i1> @llvm.fcmp.v2f64(<2 x double> [[V1]], <2 x double> [[V2]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[SEXT_I:%.*]] = sext <2 x i1> [[CMP_I]] to <2 x i64>
 // CONSTRAINED-NEXT:    ret <2 x i64> [[SEXT_I]]
 //
@@ -882,7 +882,7 @@ uint64x2_t test_vcltq_f64(float64x2_t v1, float64x2_t v2) {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[LANE0_I:%.*]] = extractelement <2 x float> [[A]], i64 0
 // CONSTRAINED-NEXT:    [[LANE1_I:%.*]] = extractelement <2 x float> [[A]], i64 1
-// CONSTRAINED-NEXT:    [[VPADDD_I:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float [[LANE0_I]], float [[LANE1_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VPADDD_I:%.*]] = call float @llvm.fadd.f32(float [[LANE0_I]], float [[LANE1_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret float [[VPADDD_I]]
 //
 float32_t test_vpadds_f32(float32x2_t a) {
@@ -902,7 +902,7 @@ float32_t test_vpadds_f32(float32x2_t a) {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[LANE0_I:%.*]] = extractelement <2 x double> [[A]], i64 0
 // CONSTRAINED-NEXT:    [[LANE1_I:%.*]] = extractelement <2 x double> [[A]], i64 1
-// CONSTRAINED-NEXT:    [[VPADDD_I:%.*]] = call double @llvm.experimental.constrained.fadd.f64(double [[LANE0_I]], double [[LANE1_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VPADDD_I:%.*]] = call double @llvm.fadd.f64(double [[LANE0_I]], double [[LANE1_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret double [[VPADDD_I]]
 //
 float64_t test_vpaddd_f64(float64x2_t a) {
@@ -918,7 +918,7 @@ float64_t test_vpaddd_f64(float64x2_t a) {
 // CONSTRAINED-LABEL: define dso_local float @test_vcvts_f32_s32(
 // CONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.sitofp.f32.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret float [[TMP0]]
 //
 float32_t test_vcvts_f32_s32(int32_t a) {
@@ -934,7 +934,7 @@ float32_t test_vcvts_f32_s32(int32_t a) {
 // CONSTRAINED-LABEL: define dso_local double @test_vcvtd_f64_s64(
 // CONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.sitofp.f64.i64(i64 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.sitofp.f64.i64(i64 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret double [[TMP0]]
 //
 float64_t test_vcvtd_f64_s64(int64_t a) {
@@ -950,7 +950,7 @@ float64_t test_vcvtd_f64_s64(int64_t a) {
 // CONSTRAINED-LABEL: define dso_local float @test_vcvts_f32_u32(
 // CONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret float [[TMP0]]
 //
 float32_t test_vcvts_f32_u32(uint32_t a) {
@@ -967,7 +967,7 @@ float32_t test_vcvts_f32_u32(uint32_t a) {
 // CONSTRAINED-LABEL: define dso_local double @test_vcvtd_f64_u64(
 // CONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.uitofp.f64.i64(i64 [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.uitofp.f64.i64(i64 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret double [[TMP0]]
 //
 float64_t test_vcvtd_f64_u64(uint64_t a) {
@@ -984,7 +984,7 @@ float64_t test_vcvtd_f64_u64(uint64_t a) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vceqs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[A]], float [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float [[B]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCMPD_I]]
 //
@@ -1002,7 +1002,7 @@ uint32_t test_vceqs_f32(float32_t a, float32_t b) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vceqd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A]], double [[B]], metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCMPD_I]]
 //
@@ -1020,7 +1020,7 @@ uint64_t test_vceqd_f64(float64_t a, float64_t b) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vceqzs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCEQZ_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCEQZ_I]]
 //
@@ -1038,7 +1038,7 @@ uint32_t test_vceqzs_f32(float32_t a) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vceqzd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCEQZ_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCEQZ_I]]
 //
@@ -1056,7 +1056,7 @@ uint64_t test_vceqzd_f64(float64_t a) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcges_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float [[B]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float [[B]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCMPD_I]]
 //
@@ -1074,7 +1074,7 @@ uint32_t test_vcges_f32(float32_t a, float32_t b) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcged_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCMPD_I]]
 //
@@ -1092,7 +1092,7 @@ uint64_t test_vcged_f64(float64_t a, float64_t b) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcgezs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float 0.000000e+00, metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCGEZ_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCGEZ_I]]
 //
@@ -1110,7 +1110,7 @@ uint32_t test_vcgezs_f32(float32_t a) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcgezd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double 0.000000e+00, metadata !"oge", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCGEZ_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCGEZ_I]]
 //
@@ -1128,7 +1128,7 @@ uint64_t test_vcgezd_f64(float64_t a) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcgts_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float [[B]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float [[B]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCMPD_I]]
 //
@@ -1146,7 +1146,7 @@ uint32_t test_vcgts_f32(float32_t a, float32_t b) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcgtd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCMPD_I]]
 //
@@ -1164,7 +1164,7 @@ uint64_t test_vcgtd_f64(float64_t a, float64_t b) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcgtzs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float 0.000000e+00, metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCGTZ_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCGTZ_I]]
 //
@@ -1182,7 +1182,7 @@ uint32_t test_vcgtzs_f32(float32_t a) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcgtzd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double 0.000000e+00, metadata !"ogt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCGTZ_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCGTZ_I]]
 //
@@ -1200,7 +1200,7 @@ uint64_t test_vcgtzd_f64(float64_t a) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcles_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float [[B]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float [[B]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCMPD_I]]
 //
@@ -1218,7 +1218,7 @@ uint32_t test_vcles_f32(float32_t a, float32_t b) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcled_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCMPD_I]]
 //
@@ -1236,7 +1236,7 @@ uint64_t test_vcled_f64(float64_t a, float64_t b) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vclezs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCLEZ_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCLEZ_I]]
 //
@@ -1254,7 +1254,7 @@ uint32_t test_vclezs_f32(float32_t a) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vclezd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double 0.000000e+00, metadata !"ole", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCLEZ_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCLEZ_I]]
 //
@@ -1272,7 +1272,7 @@ uint64_t test_vclezd_f64(float64_t a) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vclts_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float [[B]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float [[B]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCMPD_I]]
 //
@@ -1290,7 +1290,7 @@ uint32_t test_vclts_f32(float32_t a, float32_t b) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcltd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double [[B]], metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double [[B]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCMPD_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCMPD_I]]
 //
@@ -1308,7 +1308,7 @@ uint64_t test_vcltd_f64(float64_t a, float64_t b) {
 // CONSTRAINED-LABEL: define dso_local i32 @test_vcltzs_f32(
 // CONSTRAINED-SAME: float noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f32(float [[A]], float 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f32(float [[A]], float 0.000000e+00, metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCLTZ_I:%.*]] = sext i1 [[TMP0]] to i32
 // CONSTRAINED-NEXT:    ret i32 [[VCLTZ_I]]
 //
@@ -1326,7 +1326,7 @@ uint32_t test_vcltzs_f32(float32_t a) {
 // CONSTRAINED-LABEL: define dso_local i64 @test_vcltzd_f64(
 // CONSTRAINED-SAME: double noundef [[A:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f64(double [[A]], double 0.000000e+00, metadata !"olt", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f64(double [[A]], double 0.000000e+00, metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[VCLTZ_I:%.*]] = sext i1 [[TMP0]] to i64
 // CONSTRAINED-NEXT:    ret i64 [[VCLTZ_I]]
 //
@@ -1343,7 +1343,7 @@ uint64_t test_vcltzd_f64(float64_t a) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vadd_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fadd.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <1 x double> @llvm.fadd.v1f64(<1 x double> [[A]], <1 x double> [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[ADD_I]]
 //
 float64x1_t test_vadd_f64(float64x1_t a, float64x1_t b) {
@@ -1359,7 +1359,7 @@ float64x1_t test_vadd_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vmul_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fmul.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.fmul.v1f64(<1 x double> [[A]], <1 x double> [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[MUL_I]]
 //
 float64x1_t test_vmul_f64(float64x1_t a, float64x1_t b) {
@@ -1375,7 +1375,7 @@ float64x1_t test_vmul_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vdiv_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fdiv.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[DIV_I:%.*]] = call <1 x double> @llvm.fdiv.v1f64(<1 x double> [[A]], <1 x double> [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[DIV_I]]
 //
 float64x1_t test_vdiv_f64(float64x1_t a, float64x1_t b) {
@@ -1392,8 +1392,8 @@ float64x1_t test_vdiv_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vmla_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]], <1 x double> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fmul.v1f64(<1 x double> [[B]], <1 x double> [[C]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fadd.v1f64(<1 x double> [[A]], <1 x double> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.fmul.v1f64(<1 x double> [[B]], <1 x double> [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[ADD_I:%.*]] = call <1 x double> @llvm.fadd.v1f64(<1 x double> [[A]], <1 x double> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[ADD_I]]
 //
 float64x1_t test_vmla_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
@@ -1410,8 +1410,8 @@ float64x1_t test_vmla_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vmls_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]], <1 x double> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fmul.v1f64(<1 x double> [[B]], <1 x double> [[C]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fsub.v1f64(<1 x double> [[A]], <1 x double> [[MUL_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[MUL_I:%.*]] = call <1 x double> @llvm.fmul.v1f64(<1 x double> [[B]], <1 x double> [[C]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <1 x double> @llvm.fsub.v1f64(<1 x double> [[A]], <1 x double> [[MUL_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[SUB_I]]
 //
 float64x1_t test_vmls_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
@@ -1451,7 +1451,7 @@ float64x1_t test_vmls_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <1 x double>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <1 x double>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <1 x double> @llvm.experimental.constrained.fma.v1f64(<1 x double> [[TMP7]], <1 x double> [[TMP8]], <1 x double> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[TMP7]], <1 x double> [[TMP8]], <1 x double> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[TMP9]]
 //
 float64x1_t test_vfma_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
@@ -1493,7 +1493,7 @@ float64x1_t test_vfma_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <1 x double>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <1 x double>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <1 x double> @llvm.experimental.constrained.fma.v1f64(<1 x double> [[TMP7]], <1 x double> [[TMP8]], <1 x double> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[TMP7]], <1 x double> [[TMP8]], <1 x double> [[TMP6]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[TMP9]]
 //
 float64x1_t test_vfms_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
@@ -1509,7 +1509,7 @@ float64x1_t test_vfms_f64(float64x1_t a, float64x1_t b, float64x1_t c) {
 // CONSTRAINED-LABEL: define dso_local <1 x double> @test_vsub_f64(
 // CONSTRAINED-SAME: <1 x double> noundef [[A:%.*]], <1 x double> noundef [[B:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <1 x double> @llvm.experimental.constrained.fsub.v1f64(<1 x double> [[A]], <1 x double> [[B]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[SUB_I:%.*]] = call <1 x double> @llvm.fsub.v1f64(<1 x double> [[A]], <1 x double> [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[SUB_I]]
 //
 float64x1_t test_vsub_f64(float64x1_t a, float64x1_t b) {
@@ -1533,7 +1533,7 @@ float64x1_t test_vsub_f64(float64x1_t a, float64x1_t b) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VCVTZ_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VCVTZ1_I:%.*]] = call <1 x i64> @llvm.aarch64.neon.fcvtzs.v1i64.v1f64(<1 x double> [[VCVTZ_I]]) #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VCVTZ1_I:%.*]] = call <1 x i64> @llvm.aarch64.neon.fcvtzs.v1i64.v1f64(<1 x double> [[VCVTZ_I]]) #[[ATTR4:[0-9]+]]
 // CONSTRAINED-NEXT:    ret <1 x i64> [[VCVTZ1_I]]
 //
 int64x1_t test_vcvt_s64_f64(float64x1_t a) {
@@ -1557,7 +1557,7 @@ int64x1_t test_vcvt_s64_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VCVTZ_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VCVTZ1_I:%.*]] = call <1 x i64> @llvm.aarch64.neon.fcvtzu.v1i64.v1f64(<1 x double> [[VCVTZ_I]]) #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VCVTZ1_I:%.*]] = call <1 x i64> @llvm.aarch64.neon.fcvtzu.v1i64.v1f64(<1 x double> [[VCVTZ_I]]) #[[ATTR4]]
 // CONSTRAINED-NEXT:    ret <1 x i64> [[VCVTZ1_I]]
 //
 uint64x1_t test_vcvt_u64_f64(float64x1_t a) {
@@ -1577,7 +1577,7 @@ uint64x1_t test_vcvt_u64_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <1 x i64> [[A]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
-// CONSTRAINED-NEXT:    [[VCVT_I:%.*]] = call <1 x double> @llvm.experimental.constrained.sitofp.v1f64.v1i64(<1 x i64> [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VCVT_I:%.*]] = call <1 x double> @llvm.sitofp.v1f64.v1i64(<1 x i64> [[TMP1]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VCVT_I]]
 //
 float64x1_t test_vcvt_f64_s64(int64x1_t a) {
@@ -1597,7 +1597,7 @@ float64x1_t test_vcvt_f64_s64(int64x1_t a) {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <1 x i64> [[A]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <1 x i64>
-// CONSTRAINED-NEXT:    [[VCVT_I:%.*]] = call <1 x double> @llvm.experimental.constrained.uitofp.v1f64.v1i64(<1 x i64> [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VCVT_I:%.*]] = call <1 x double> @llvm.uitofp.v1f64.v1i64(<1 x i64> [[TMP1]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VCVT_I]]
 //
 float64x1_t test_vcvt_f64_u64(uint64x1_t a) {
@@ -1621,7 +1621,7 @@ float64x1_t test_vcvt_f64_u64(uint64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDA_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.round.v1f64(<1 x double> [[VRNDA_I]], metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <1 x double> @llvm.round.v1f64(<1 x double> [[VRNDA_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDA1_I]]
 //
 float64x1_t test_vrnda_f64(float64x1_t a) {
@@ -1645,7 +1645,7 @@ float64x1_t test_vrnda_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDP_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.ceil.v1f64(<1 x double> [[VRNDP_I]], metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <1 x double> @llvm.ceil.v1f64(<1 x double> [[VRNDP_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDP1_I]]
 //
 float64x1_t test_vrndp_f64(float64x1_t a) {
@@ -1669,7 +1669,7 @@ float64x1_t test_vrndp_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDM_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDM1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.floor.v1f64(<1 x double> [[VRNDM_I]], metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDM1_I:%.*]] = call <1 x double> @llvm.floor.v1f64(<1 x double> [[VRNDM_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDM1_I]]
 //
 float64x1_t test_vrndm_f64(float64x1_t a) {
@@ -1693,7 +1693,7 @@ float64x1_t test_vrndm_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDX_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDX1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.rint.v1f64(<1 x double> [[VRNDX_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDX1_I:%.*]] = call <1 x double> @llvm.rint.v1f64(<1 x double> [[VRNDX_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDX1_I]]
 //
 float64x1_t test_vrndx_f64(float64x1_t a) {
@@ -1717,7 +1717,7 @@ float64x1_t test_vrndx_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDZ_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.trunc.v1f64(<1 x double> [[VRNDZ_I]], metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.trunc.v1f64(<1 x double> [[VRNDZ_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDZ1_I]]
 //
 float64x1_t test_vrnd_f64(float64x1_t a) {
@@ -1741,7 +1741,7 @@ float64x1_t test_vrnd_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[VRNDI_V_I:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VRNDI_V1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.nearbyint.v1f64(<1 x double> [[VRNDI_V_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VRNDI_V1_I:%.*]] = call <1 x double> @llvm.nearbyint.v1f64(<1 x double> [[VRNDI_V_I]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VRNDI_V1_I]]
 //
 float64x1_t test_vrndi_f64(float64x1_t a) {
@@ -1765,7 +1765,7 @@ float64x1_t test_vrndi_f64(float64x1_t a) {
 // CONSTRAINED-NEXT:    [[__P0_ADDR_I_SROA_0_0_VEC_INSERT:%.*]] = insertelement <1 x i64> undef, i64 [[TMP0]], i32 0
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <1 x i64> [[__P0_ADDR_I_SROA_0_0_VEC_INSERT]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <1 x double>
-// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <1 x double> @llvm.experimental.constrained.sqrt.v1f64(<1 x double> [[TMP2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR3]]
+// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <1 x double> @llvm.sqrt.v1f64(<1 x double> [[TMP2]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[VSQRT_I]]
 //
 float64x1_t test_vsqrt_f64(float64x1_t a) {
diff --git a/clang/test/CodeGen/AArch64/neon-misc-constrained.c b/clang/test/CodeGen/AArch64/neon-misc-constrained.c
index 06ecfd91252a1..6983fcbca758b 100644
--- a/clang/test/CodeGen/AArch64/neon-misc-constrained.c
+++ b/clang/test/CodeGen/AArch64/neon-misc-constrained.c
@@ -28,7 +28,7 @@
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <2 x double> [[A]] to <2 x i64>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <2 x i64> [[TMP0]] to <16 x i8>
 // CONSTRAINED-NEXT:    [[VRNDA_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double> [[VRNDA_I]], metadata !"fpexcept.strict") #[[ATTR2:[0-9]+]]
+// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <2 x double> @llvm.round.v2f64(<2 x double> [[VRNDA_I]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[VRNDA1_I]]
 //
 float64x2_t test_vrndaq_f64(float64x2_t a) {
@@ -51,7 +51,7 @@ float64x2_t test_vrndaq_f64(float64x2_t a) {
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <2 x double> [[A]] to <2 x i64>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <2 x i64> [[TMP0]] to <16 x i8>
 // CONSTRAINED-NEXT:    [[VRNDP_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> [[VRNDP_I]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[VRNDP_I]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[VRNDP1_I]]
 //
 float64x2_t test_vrndpq_f64(float64x2_t a) {
@@ -74,7 +74,7 @@ float64x2_t test_vrndpq_f64(float64x2_t a) {
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[A]] to <4 x i32>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <4 x i32> [[TMP0]] to <16 x i8>
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
-// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <4 x float> @llvm.experimental.constrained.sqrt.v4f32(<4 x float> [[TMP2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <4 x float> @llvm.sqrt.v4f32(<4 x float> [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <4 x float> [[VSQRT_I]]
 //
 float32x4_t test_vsqrtq_f32(float32x4_t a) {
@@ -97,7 +97,7 @@ float32x4_t test_vsqrtq_f32(float32x4_t a) {
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <2 x double> [[A]] to <2 x i64>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <2 x i64> [[TMP0]] to <16 x i8>
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <2 x double> @llvm.experimental.constrained.sqrt.v2f64(<2 x double> [[TMP2]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <2 x double> [[VSQRT_I]]
 //
 float64x2_t test_vsqrtq_f64(float64x2_t a) {
diff --git a/clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c b/clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c
index d56dc193d7f1e..04056d3553eee 100644
--- a/clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c
+++ b/clang/test/CodeGen/AArch64/neon-scalar-x-indexed-elem-constrained.c
@@ -24,7 +24,7 @@
 // CONSTRAINED-SAME: float noundef [[A:%.*]], float noundef [[B:%.*]], <2 x float> noundef [[C:%.*]]) #[[ATTR0:[0-9]+]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x float> [[C]], i32 1
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[B]], float [[EXTRACT]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2:[0-9]+]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.fma.f32(float [[B]], float [[EXTRACT]], float [[A]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret float [[TMP0]]
 //
 float32_t test_vfmas_lane_f32(float32_t a, float32_t b, float32x2_t c) {
@@ -42,7 +42,7 @@ float32_t test_vfmas_lane_f32(float32_t a, float32_t b, float32x2_t c) {
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]], <1 x double> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <1 x double> [[C]], i32 0
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fma.f64(double [[B]], double [[EXTRACT]], double [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.fma.f64(double [[B]], double [[EXTRACT]], double [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret double [[TMP0]]
 //
 float64_t test_vfmad_lane_f64(float64_t a, float64_t b, float64x1_t c) {
@@ -60,7 +60,7 @@ float64_t test_vfmad_lane_f64(float64_t a, float64_t b, float64x1_t c) {
 // CONSTRAINED-SAME: double noundef [[A:%.*]], double noundef [[B:%.*]], <2 x double> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x double> [[C]], i32 1
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.experimental.constrained.fma.f64(double [[B]], double [[EXTRACT]], double [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call double @llvm.fma.f64(double [[B]], double [[EXTRACT]], double [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret double [[TMP0]]
 //
 float64_t test_vfmad_laneq_f64(float64_t a, float64_t b, float64x2_t c) {
@@ -80,7 +80,7 @@ float64_t test_vfmad_laneq_f64(float64_t a, float64_t b, float64x2_t c) {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[FNEG:%.*]] = fneg float [[B]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x float> [[C]], i32 1
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.experimental.constrained.fma.f32(float [[FNEG]], float [[EXTRACT]], float [[A]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call float @llvm.fma.f32(float [[FNEG]], float [[EXTRACT]], float [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret float [[TMP0]]
 //
 float32_t test_vfmss_lane_f32(float32_t a, float32_t b, float32x2_t c) {
@@ -122,7 +122,7 @@ float32_t test_vfmss_lane_f32(float32_t a, float32_t b, float32x2_t c) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <1 x double> [[TMP6]], <1 x double> [[TMP6]], <1 x i32> zeroinitializer
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP3]] to <1 x double>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <1 x double> @llvm.experimental.constrained.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[FMLA2]]
 //
 float64x1_t test_vfma_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {
@@ -166,7 +166,7 @@ float64x1_t test_vfma_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <1 x double> [[TMP6]], <1 x double> [[TMP6]], <1 x i32> zeroinitializer
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <8 x i8> [[TMP4]] to <1 x double>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP3]] to <1 x double>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <1 x double> @llvm.experimental.constrained.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <1 x double> @llvm.fma.v1f64(<1 x double> [[FMLA]], <1 x double> [[LANE]], <1 x double> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    ret <1 x double> [[FMLA2]]
 //
 float64x1_t test_vfms_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {
@@ -207,7 +207,7 @@ float64x1_t test_vfms_lane_f64(float64x1_t a, float64x1_t b, float64x1_t v) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to double
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <2 x double>
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP8]], i32 0
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fma.f64(double [[TMP7]], double [[EXTRACT]], double [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call double @llvm.fma.f64(double [[TMP7]], double [[EXTRACT]], double [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[TMP10:%.*]] = bitcast double [[TMP9]] to <1 x double>
 // CONSTRAINED-NEXT:    ret <1 x double> [[TMP10]]
 //
@@ -251,7 +251,7 @@ float64x1_t test_vfma_laneq_f64(float64x1_t a, float64x1_t b, float64x2_t v) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to double
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <2 x double>
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <2 x double> [[TMP8]], i32 0
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call double @llvm.experimental.constrained.fma.f64(double [[TMP7]], double [[EXTRACT]], double [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call double @llvm.fma.f64(double [[TMP7]], double [[EXTRACT]], double [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[TMP10:%.*]] = bitcast double [[TMP9]] to <1 x double>
 // CONSTRAINED-NEXT:    ret <1 x double> [[TMP10]]
 //
diff --git a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
index 4c19d75df96e2..658f00fefd501 100644
--- a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +fullfp16 \
 // RUN: -disable-O0-optnone \
 // RUN: -emit-llvm -o - %s | opt -S -passes=mem2reg \
@@ -11,288 +12,588 @@
 
 #include <arm_fp16.h>
 
-// COMMON-LABEL: test_vceqzh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp oeq half %a, 0xH0000
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f16(half %a, half 0xH0000, metadata !"oeq", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vceqzh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp oeq half [[A]], 0xH0000
+// UNCONSTRAINED-NEXT:    [[VCEQZ:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCEQZ]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vceqzh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half 0xH0000, metadata !"oeq") #[[ATTR3:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCEQZ:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCEQZ]]
+//
 uint16_t test_vceqzh_f16(float16_t a) {
   return vceqzh_f16(a);
 }
 
-// COMMON-LABEL: test_vcgezh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp oge half %a, 0xH0000
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half 0xH0000, metadata !"oge", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcgezh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp oge half [[A]], 0xH0000
+// UNCONSTRAINED-NEXT:    [[VCGEZ:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCGEZ]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcgezh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half 0xH0000, metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCGEZ:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCGEZ]]
+//
 uint16_t test_vcgezh_f16(float16_t a) {
   return vcgezh_f16(a);
 }
 
-// COMMON-LABEL: test_vcgtzh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp ogt half %a, 0xH0000
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half 0xH0000, metadata !"ogt", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcgtzh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp ogt half [[A]], 0xH0000
+// UNCONSTRAINED-NEXT:    [[VCGTZ:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCGTZ]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcgtzh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half 0xH0000, metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCGTZ:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCGTZ]]
+//
 uint16_t test_vcgtzh_f16(float16_t a) {
   return vcgtzh_f16(a);
 }
 
-// COMMON-LABEL: test_vclezh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp ole half %a, 0xH0000
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half 0xH0000, metadata !"ole", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vclezh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp ole half [[A]], 0xH0000
+// UNCONSTRAINED-NEXT:    [[VCLEZ:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCLEZ]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vclezh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half 0xH0000, metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCLEZ:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCLEZ]]
+//
 uint16_t test_vclezh_f16(float16_t a) {
   return vclezh_f16(a);
 }
 
-// COMMON-LABEL: test_vcltzh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp olt half %a, 0xH0000
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half 0xH0000, metadata !"olt", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcltzh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp olt half [[A]], 0xH0000
+// UNCONSTRAINED-NEXT:    [[VCLTZ:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCLTZ]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcltzh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half 0xH0000, metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCLTZ:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCLTZ]]
+//
 uint16_t test_vcltzh_f16(float16_t a) {
   return vcltzh_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_s16
-// UNCONSTRAINED:  [[VCVT:%.*]] = sitofp i16 %a to half
-// CONSTRAINED:    [[VCVT:%.*]] = call half @llvm.experimental.constrained.sitofp.f16.i16(i16 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s16(
+// UNCONSTRAINED-SAME: i16 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = sitofp i16 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s16(
+// CONSTRAINED-SAME: i16 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.sitofp.f16.i16(i16 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_s16 (int16_t a) {
   return vcvth_f16_s16(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_s32
-// UNCONSTRAINED:  [[VCVT:%.*]] = sitofp i32 %a to half
-// CONSTRAINED:    [[VCVT:%.*]] = call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s32(
+// UNCONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = sitofp i32 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s32(
+// CONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_s32 (int32_t a) {
   return vcvth_f16_s32(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_s64
-// UNCONSTRAINED:  [[VCVT:%.*]] = sitofp i64 %a to half
-// CONSTRAINED:    [[VCVT:%.*]] = call half @llvm.experimental.constrained.sitofp.f16.i64(i64 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s64(
+// UNCONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = sitofp i64 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_s64(
+// CONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.sitofp.f16.i64(i64 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_s64 (int64_t a) {
   return vcvth_f16_s64(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_u16
-// UNCONSTRAINED:  [[VCVT:%.*]] = uitofp i16 %a to half
-// CONSTRAINED:  [[VCVT:%.*]] = call half @llvm.experimental.constrained.uitofp.f16.i16(i16 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u16(
+// UNCONSTRAINED-SAME: i16 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = uitofp i16 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u16(
+// CONSTRAINED-SAME: i16 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.uitofp.f16.i16(i16 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_u16 (uint16_t a) {
   return vcvth_f16_u16(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_u32
-// UNCONSTRAINED:  [[VCVT:%.*]] = uitofp i32 %a to half
-// CONSTRAINED:    [[VCVT:%.*]] = call half @llvm.experimental.constrained.uitofp.f16.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:  ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u32(
+// UNCONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = uitofp i32 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u32(
+// CONSTRAINED-SAME: i32 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.uitofp.f16.i32(i32 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_u32 (uint32_t a) {
   return vcvth_f16_u32(a);
 }
 
-// COMMON-LABEL: test_vcvth_f16_u64
-// UNCONSTRAINED:  [[VCVT:%.*]] = uitofp i64 %a to half
-// CONSTRAINED:    [[VCVT:%.*]] = call half @llvm.experimental.constrained.uitofp.f16.i64(i64 %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u64(
+// UNCONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = uitofp i64 [[A]] to half
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vcvth_f16_u64(
+// CONSTRAINED-SAME: i64 noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.uitofp.f16.i64(i64 [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vcvth_f16_u64 (uint64_t a) {
   return vcvth_f16_u64(a);
 }
 
-// COMMON-LABEL: test_vcvth_s16_f16
-// COMMONIR:       [[VCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half %a)
-// COMMONIR:       ret i16 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcvth_s16_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[FCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i16 [[FCVT]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcvth_s16_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[FCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzs.i16.f16(half [[A]]) #[[ATTR4:[0-9]+]]
+// CONSTRAINED-NEXT:    ret i16 [[FCVT]]
+//
 int16_t test_vcvth_s16_f16 (float16_t a) {
   return vcvth_s16_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_s32_f16
-// COMMONIR:       [[VCVT:%.*]] = call i32 @llvm.aarch64.neon.fcvtzs.i32.f16(half %a)
-// COMMONIR:       ret i32 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i32 @test_vcvth_s32_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VCVTH_S32_F16:%.*]] = call i32 @llvm.aarch64.neon.fcvtzs.i32.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i32 [[VCVTH_S32_F16]]
+//
+// CONSTRAINED-LABEL: define dso_local i32 @test_vcvth_s32_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VCVTH_S32_F16:%.*]] = call i32 @llvm.aarch64.neon.fcvtzs.i32.f16(half [[A]]) #[[ATTR4]]
+// CONSTRAINED-NEXT:    ret i32 [[VCVTH_S32_F16]]
+//
 int32_t test_vcvth_s32_f16 (float16_t a) {
   return vcvth_s32_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_s64_f16
-// COMMONIR:       [[VCVT:%.*]] = call i64 @llvm.aarch64.neon.fcvtzs.i64.f16(half %a)
-// COMMONIR:       ret i64 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i64 @test_vcvth_s64_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VCVTH_S64_F16:%.*]] = call i64 @llvm.aarch64.neon.fcvtzs.i64.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i64 [[VCVTH_S64_F16]]
+//
+// CONSTRAINED-LABEL: define dso_local i64 @test_vcvth_s64_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VCVTH_S64_F16:%.*]] = call i64 @llvm.aarch64.neon.fcvtzs.i64.f16(half [[A]]) #[[ATTR4]]
+// CONSTRAINED-NEXT:    ret i64 [[VCVTH_S64_F16]]
+//
 int64_t test_vcvth_s64_f16 (float16_t a) {
   return vcvth_s64_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_u16_f16
-// COMMONIR:       [[VCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzu.i16.f16(half %a)
-// COMMONIR:       ret i16 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcvth_u16_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[FCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzu.i16.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i16 [[FCVT]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcvth_u16_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[FCVT:%.*]] = call i16 @llvm.aarch64.neon.fcvtzu.i16.f16(half [[A]]) #[[ATTR4]]
+// CONSTRAINED-NEXT:    ret i16 [[FCVT]]
+//
 uint16_t test_vcvth_u16_f16 (float16_t a) {
   return vcvth_u16_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_u32_f16
-// COMMONIR:       [[VCVT:%.*]] = call i32 @llvm.aarch64.neon.fcvtzu.i32.f16(half %a)
-// COMMONIR:       ret i32 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i32 @test_vcvth_u32_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VCVTH_U32_F16:%.*]] = call i32 @llvm.aarch64.neon.fcvtzu.i32.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i32 [[VCVTH_U32_F16]]
+//
+// CONSTRAINED-LABEL: define dso_local i32 @test_vcvth_u32_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VCVTH_U32_F16:%.*]] = call i32 @llvm.aarch64.neon.fcvtzu.i32.f16(half [[A]]) #[[ATTR4]]
+// CONSTRAINED-NEXT:    ret i32 [[VCVTH_U32_F16]]
+//
 uint32_t test_vcvth_u32_f16 (float16_t a) {
   return vcvth_u32_f16(a);
 }
 
-// COMMON-LABEL: test_vcvth_u64_f16
-// COMMONIR:       [[VCVT:%.*]] = call i64 @llvm.aarch64.neon.fcvtzu.i64.f16(half %a)
-// COMMONIR:       ret i64 [[VCVT]]
+// UNCONSTRAINED-LABEL: define dso_local i64 @test_vcvth_u64_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VCVTH_U64_F16:%.*]] = call i64 @llvm.aarch64.neon.fcvtzu.i64.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret i64 [[VCVTH_U64_F16]]
+//
+// CONSTRAINED-LABEL: define dso_local i64 @test_vcvth_u64_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VCVTH_U64_F16:%.*]] = call i64 @llvm.aarch64.neon.fcvtzu.i64.f16(half [[A]]) #[[ATTR4]]
+// CONSTRAINED-NEXT:    ret i64 [[VCVTH_U64_F16]]
+//
 uint64_t test_vcvth_u64_f16 (float16_t a) {
   return vcvth_u64_f16(a);
 }
 
-// COMMON-LABEL: test_vrndh_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.trunc.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.trunc.f16(half %a, metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDZ:%.*]] = call half @llvm.trunc.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDZ]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDZ:%.*]] = call half @llvm.trunc.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDZ]]
+//
 float16_t test_vrndh_f16(float16_t a) {
   return vrndh_f16(a);
 }
 
-// COMMON-LABEL: test_vrndah_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.round.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.round.f16(half %a, metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndah_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDA:%.*]] = call half @llvm.round.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDA]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndah_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDA:%.*]] = call half @llvm.round.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDA]]
+//
 float16_t test_vrndah_f16(float16_t a) {
   return vrndah_f16(a);
 }
 
-// COMMON-LABEL: test_vrndih_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.nearbyint.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.nearbyint.f16(half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndih_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDI:%.*]] = call half @llvm.nearbyint.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDI]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndih_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDI:%.*]] = call half @llvm.nearbyint.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDI]]
+//
 float16_t test_vrndih_f16(float16_t a) {
   return vrndih_f16(a);
 }
 
-// COMMON-LABEL: test_vrndmh_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.floor.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.floor.f16(half %a, metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndmh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDM:%.*]] = call half @llvm.floor.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDM]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndmh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDM:%.*]] = call half @llvm.floor.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDM]]
+//
 float16_t test_vrndmh_f16(float16_t a) {
   return vrndmh_f16(a);
 }
 
-// COMMON-LABEL: test_vrndph_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.ceil.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.ceil.f16(half %a, metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndph_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDP:%.*]] = call half @llvm.ceil.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDP]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndph_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDP:%.*]] = call half @llvm.ceil.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDP]]
+//
 float16_t test_vrndph_f16(float16_t a) {
   return vrndph_f16(a);
 }
 
-// COMMON-LABEL: test_vrndxh_f16
-// UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.rint.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.rint.f16(half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[RND]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vrndxh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VRNDX:%.*]] = call half @llvm.rint.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VRNDX]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vrndxh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VRNDX:%.*]] = call half @llvm.rint.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VRNDX]]
+//
 float16_t test_vrndxh_f16(float16_t a) {
   return vrndxh_f16(a);
 }
 
-// COMMON-LABEL: test_vsqrth_f16
-// UNCONSTRAINED:  [[SQR:%.*]] = call half @llvm.sqrt.f16(half %a)
-// CONSTRAINED:    [[SQR:%.*]] = call half @llvm.experimental.constrained.sqrt.f16(half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[SQR]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vsqrth_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VSQRT:%.*]] = call half @llvm.sqrt.f16(half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[VSQRT]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vsqrth_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VSQRT:%.*]] = call half @llvm.sqrt.f16(half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VSQRT]]
+//
 float16_t test_vsqrth_f16(float16_t a) {
   return vsqrth_f16(a);
 }
 
-// COMMON-LABEL: test_vaddh_f16
-// UNCONSTRAINED:  [[ADD:%.*]] = fadd half %a, %b
-// CONSTRAINED:    [[ADD:%.*]] = call half @llvm.experimental.constrained.fadd.f16(half %a, half %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[ADD]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vaddh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VADDH:%.*]] = fadd half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    ret half [[VADDH]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vaddh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VADDH:%.*]] = call half @llvm.fadd.f16(half [[A]], half [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VADDH]]
+//
 float16_t test_vaddh_f16(float16_t a, float16_t b) {
   return vaddh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vceqh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp oeq half %a, %b
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f16(half %a, half %b, metadata !"oeq", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vceqh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp oeq half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vceqh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half [[B]], metadata !"oeq") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
 uint16_t test_vceqh_f16(float16_t a, float16_t b) {
   return vceqh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vcgeh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp oge half %a, %b
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half %b, metadata !"oge", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcgeh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp oge half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcgeh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half [[B]], metadata !"oge") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
 uint16_t test_vcgeh_f16(float16_t a, float16_t b) {
   return vcgeh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vcgth_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp ogt half %a, %b
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half %b, metadata !"ogt", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcgth_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp ogt half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcgth_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half [[B]], metadata !"ogt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
 uint16_t test_vcgth_f16(float16_t a, float16_t b) {
   return vcgth_f16(a, b);
 }
 
-// COMMON-LABEL: test_vcleh_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp ole half %a, %b
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half %b, metadata !"ole", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vcleh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp ole half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vcleh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half [[B]], metadata !"ole") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
 uint16_t test_vcleh_f16(float16_t a, float16_t b) {
   return vcleh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vclth_f16
-// UNCONSTRAINED:  [[TMP1:%.*]] = fcmp olt half %a, %b
-// CONSTRAINED:    [[TMP1:%.*]] = call i1 @llvm.experimental.constrained.fcmps.f16(half %a, half %b, metadata !"olt", metadata !"fpexcept.strict")
-// COMMONIR:       [[TMP2:%.*]] = sext i1 [[TMP1]] to i16
-// COMMONIR:       ret i16 [[TMP2]]
+// UNCONSTRAINED-LABEL: define dso_local i16 @test_vclth_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fcmp olt half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// UNCONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
+// CONSTRAINED-LABEL: define dso_local i16 @test_vclth_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call i1 @llvm.fcmp.f16(half [[A]], half [[B]], metadata !"olt") #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[VCMPD:%.*]] = sext i1 [[TMP0]] to i16
+// CONSTRAINED-NEXT:    ret i16 [[VCMPD]]
+//
 uint16_t test_vclth_f16(float16_t a, float16_t b) {
   return vclth_f16(a, b);
 }
 
-// COMMON-LABEL: test_vdivh_f16
-// UNCONSTRAINED:  [[DIV:%.*]] = fdiv half %a, %b
-// CONSTRAINED:    [[DIV:%.*]] = call half @llvm.experimental.constrained.fdiv.f16(half %a, half %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[DIV]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vdivh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VDIVH:%.*]] = fdiv half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    ret half [[VDIVH]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vdivh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VDIVH:%.*]] = call half @llvm.fdiv.f16(half [[A]], half [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VDIVH]]
+//
 float16_t test_vdivh_f16(float16_t a, float16_t b) {
   return vdivh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vmulh_f16
-// UNCONSTRAINED:  [[MUL:%.*]] = fmul half %a, %b
-// CONSTRAINED:  [[MUL:%.*]] = call half @llvm.experimental.constrained.fmul.f16(half %a, half %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[MUL]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vmulh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VMULH:%.*]] = fmul half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    ret half [[VMULH]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vmulh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VMULH:%.*]] = call half @llvm.fmul.f16(half [[A]], half [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VMULH]]
+//
 float16_t test_vmulh_f16(float16_t a, float16_t b) {
   return vmulh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vsubh_f16
-// UNCONSTRAINED:  [[SUB:%.*]] = fsub half %a, %b
-// CONSTRAINED:    [[SUB:%.*]] = call half @llvm.experimental.constrained.fsub.f16(half %a, half %b, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[SUB]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vsubh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VSUBH:%.*]] = fsub half [[A]], [[B]]
+// UNCONSTRAINED-NEXT:    ret half [[VSUBH]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vsubh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VSUBH:%.*]] = call half @llvm.fsub.f16(half [[A]], half [[B]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[VSUBH]]
+//
 float16_t test_vsubh_f16(float16_t a, float16_t b) {
   return vsubh_f16(a, b);
 }
 
-// COMMON-LABEL: test_vfmah_f16
-// UNCONSTRAINED:  [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half %c, half %a)
-// CONSTRAINED:    [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half %b, half %c, half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[FMA]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vfmah_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], half noundef [[C:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[B]], half [[C]], half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vfmah_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], half noundef [[C:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[B]], half [[C]], half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vfmah_f16(float16_t a, float16_t b, float16_t c) {
   return vfmah_f16(a, b, c);
 }
 
-// COMMON-LABEL: test_vfmsh_f16
-// COMMONIR:  [[SUB:%.*]] = fneg half %b
-// UNCONSTRAINED:  [[ADD:%.*]] = call half @llvm.fma.f16(half [[SUB]], half %c, half %a)
-// CONSTRAINED:    [[ADD:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[SUB]], half %c, half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// COMMONIR:       ret half [[ADD]]
+// UNCONSTRAINED-LABEL: define dso_local half @test_vfmsh_f16(
+// UNCONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], half noundef [[C:%.*]]) #[[ATTR0]] {
+// UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// UNCONSTRAINED-NEXT:    [[VSUBH:%.*]] = fneg half [[B]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[VSUBH]], half [[C]], half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
+//
+// CONSTRAINED-LABEL: define dso_local half @test_vfmsh_f16(
+// CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], half noundef [[C:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[VSUBH:%.*]] = fneg half [[B]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[VSUBH]], half [[C]], half [[A]]) #[[ATTR3]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
+//
 float16_t test_vfmsh_f16(float16_t a, float16_t b, float16_t c) {
   return vfmsh_f16(a, b, c);
 }
 
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// COMMON: {{.*}}
+// COMMONIR: {{.*}}
diff --git a/clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c
index 02ddbf2950829..d810ee6b0528b 100644
--- a/clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/v8.2a-neon-intrinsics-constrained.c
@@ -36,7 +36,7 @@
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <4 x half> [[A]] to <4 x i16>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <4 x i16> [[TMP0]] to <8 x i8>
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
-// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <4 x half> @llvm.experimental.constrained.sqrt.v4f16(<4 x half> [[TMP2]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2:[0-9]+]]
+// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <4 x half> @llvm.sqrt.v4f16(<4 x half> [[TMP2]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[VSQRT_I]]
 //
 float16x4_t test_vsqrt_f16(float16x4_t a) {
@@ -58,7 +58,7 @@ float16x4_t test_vsqrt_f16(float16x4_t a) {
 // CONSTRAINED-NEXT:    [[TMP0:%.*]] = bitcast <8 x half> [[A]] to <8 x i16>
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = bitcast <8 x i16> [[TMP0]] to <16 x i8>
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
-// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <8 x half> @llvm.experimental.constrained.sqrt.v8f16(<8 x half> [[TMP2]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VSQRT_I:%.*]] = call <8 x half> @llvm.sqrt.v8f16(<8 x half> [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[VSQRT_I]]
 //
 float16x8_t test_vsqrtq_f16(float16x8_t a) {
@@ -92,7 +92,7 @@ float16x8_t test_vsqrtq_f16(float16x8_t a) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfma_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
@@ -126,7 +126,7 @@ float16x4_t test_vfma_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmaq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
@@ -162,7 +162,7 @@ float16x8_t test_vfmaq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfms_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
@@ -198,7 +198,7 @@ float16x4_t test_vfms_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmsq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
@@ -234,7 +234,7 @@ float16x8_t test_vfmsq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <4 x half> [[TMP6]], <4 x half> [[TMP6]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[FMLA]], <4 x half> [[LANE]], <4 x half> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[FMLA]], <4 x half> [[LANE]], <4 x half> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[FMLA2]]
 //
 float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
@@ -270,7 +270,7 @@ float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <4 x half> [[TMP6]], <4 x half> [[TMP6]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[FMLA]], <8 x half> [[LANE]], <8 x half> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[FMLA]], <8 x half> [[LANE]], <8 x half> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[FMLA2]]
 //
 float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
@@ -306,7 +306,7 @@ float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <8 x half> [[TMP8]], <8 x half> [[TMP8]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP7]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP7]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfma_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
@@ -342,7 +342,7 @@ float16x4_t test_vfma_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <8 x half> [[TMP8]], <8 x half> [[TMP8]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP7]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP7]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmaq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
@@ -384,7 +384,7 @@ float16x8_t test_vfmaq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfma_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
@@ -434,7 +434,7 @@ float16x4_t test_vfma_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmaq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
@@ -452,7 +452,7 @@ float16x8_t test_vfmaq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
 // CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], <4 x half> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x half> [[C]], i32 3
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[B]], half [[EXTRACT]], half [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[B]], half [[EXTRACT]], half [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 float16_t test_vfmah_lane_f16(float16_t a, float16_t b, float16x4_t c) {
@@ -470,7 +470,7 @@ float16_t test_vfmah_lane_f16(float16_t a, float16_t b, float16x4_t c) {
 // CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], <8 x half> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <8 x half> [[C]], i32 7
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[B]], half [[EXTRACT]], half [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[B]], half [[EXTRACT]], half [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 float16_t test_vfmah_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
@@ -508,7 +508,7 @@ float16_t test_vfmah_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <4 x half> [[TMP6]], <4 x half> [[TMP6]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[FMLA]], <4 x half> [[LANE]], <4 x half> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[FMLA]], <4 x half> [[LANE]], <4 x half> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[FMLA2]]
 //
 float16x4_t test_vfms_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
@@ -546,7 +546,7 @@ float16x4_t test_vfms_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <4 x half> [[TMP6]], <4 x half> [[TMP6]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
 // CONSTRAINED-NEXT:    [[FMLA:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[FMLA1:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
-// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[FMLA]], <8 x half> [[LANE]], <8 x half> [[FMLA1]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[FMLA2:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[FMLA]], <8 x half> [[LANE]], <8 x half> [[FMLA1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[FMLA2]]
 //
 float16x8_t test_vfmsq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
@@ -584,7 +584,7 @@ float16x8_t test_vfmsq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <8 x half> [[TMP8]], <8 x half> [[TMP8]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP7]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP7]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfms_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
@@ -622,7 +622,7 @@ float16x4_t test_vfms_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
 // CONSTRAINED-NEXT:    [[LANE:%.*]] = shufflevector <8 x half> [[TMP8]], <8 x half> [[TMP8]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP7]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP7]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmsq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
@@ -666,7 +666,7 @@ float16x8_t test_vfmsq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i8> [[TMP3]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], <4 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <4 x half> [[TMP9]]
 //
 float16x4_t test_vfms_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
@@ -718,7 +718,7 @@ float16x4_t test_vfms_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
 // CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <16 x i8> [[TMP3]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
 // CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
-// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], <8 x half> [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
 // CONSTRAINED-NEXT:    ret <8 x half> [[TMP9]]
 //
 float16x8_t test_vfmsq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
@@ -730,20 +730,20 @@ float16x8_t test_vfmsq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
 // UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // UNCONSTRAINED-NEXT:    [[CONV:%.*]] = fpext half [[B]] to float
 // UNCONSTRAINED-NEXT:    [[FNEG:%.*]] = fneg float [[CONV]]
-// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fptrunc float [[FNEG]] to half
+// UNCONSTRAINED-NEXT:    [[CONV1:%.*]] = fptrunc float [[FNEG]] to half
 // UNCONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x half> [[C]], i32 3
-// UNCONSTRAINED-NEXT:    [[TMP1:%.*]] = call half @llvm.fma.f16(half [[TMP0]], half [[EXTRACT]], half [[A]])
-// UNCONSTRAINED-NEXT:    ret half [[TMP1]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[CONV1]], half [[EXTRACT]], half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 // CONSTRAINED-LABEL: define dso_local half @test_vfmsh_lane_f16(
 // CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], <4 x half> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CONV:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[B]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[B]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[FNEG:%.*]] = fneg float [[CONV]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[FNEG]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[CONV1:%.*]] = call half @llvm.fptrunc.f16.f32(float [[FNEG]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <4 x half> [[C]], i32 3
-// CONSTRAINED-NEXT:    [[TMP1:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[TMP0]], half [[EXTRACT]], half [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
-// CONSTRAINED-NEXT:    ret half [[TMP1]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[CONV1]], half [[EXTRACT]], half [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 float16_t test_vfmsh_lane_f16(float16_t a, float16_t b, float16x4_t c) {
   return vfmsh_lane_f16(a, b, c, 3);
@@ -754,20 +754,20 @@ float16_t test_vfmsh_lane_f16(float16_t a, float16_t b, float16x4_t c) {
 // UNCONSTRAINED-NEXT:  [[ENTRY:.*:]]
 // UNCONSTRAINED-NEXT:    [[CONV:%.*]] = fpext half [[B]] to float
 // UNCONSTRAINED-NEXT:    [[FNEG:%.*]] = fneg float [[CONV]]
-// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = fptrunc float [[FNEG]] to half
+// UNCONSTRAINED-NEXT:    [[CONV1:%.*]] = fptrunc float [[FNEG]] to half
 // UNCONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <8 x half> [[C]], i32 7
-// UNCONSTRAINED-NEXT:    [[TMP1:%.*]] = call half @llvm.fma.f16(half [[TMP0]], half [[EXTRACT]], half [[A]])
-// UNCONSTRAINED-NEXT:    ret half [[TMP1]]
+// UNCONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[CONV1]], half [[EXTRACT]], half [[A]])
+// UNCONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 // CONSTRAINED-LABEL: define dso_local half @test_vfmsh_laneq_f16(
 // CONSTRAINED-SAME: half noundef [[A:%.*]], half noundef [[B:%.*]], <8 x half> noundef [[C:%.*]]) #[[ATTR0]] {
 // CONSTRAINED-NEXT:  [[ENTRY:.*:]]
-// CONSTRAINED-NEXT:    [[CONV:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half [[B]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[B]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[FNEG:%.*]] = fneg float [[CONV]]
-// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[FNEG]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[CONV1:%.*]] = call half @llvm.fptrunc.f16.f32(float [[FNEG]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    [[EXTRACT:%.*]] = extractelement <8 x half> [[C]], i32 7
-// CONSTRAINED-NEXT:    [[TMP1:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[TMP0]], half [[EXTRACT]], half [[A]], metadata !"round.tonearest", metadata !"fpexcept.maytrap") #[[ATTR2]]
-// CONSTRAINED-NEXT:    ret half [[TMP1]]
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = call half @llvm.fma.f16(half [[CONV1]], half [[EXTRACT]], half [[A]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CONSTRAINED-NEXT:    ret half [[TMP0]]
 //
 float16_t test_vfmsh_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
   return vfmsh_laneq_f16(a, b, c, 7);
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
index b46fa9f2cf157..6a4d46e584608 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
@@ -25,92 +25,92 @@ void test_float(void) {
   vf = __builtin_vsx_xvsqrtsp(vf);
   // CHECK-LABEL: try-xvsqrtsp
   // CHECK-UNCONSTRAINED: @llvm.sqrt.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.sqrt.v4f32(<4 x float> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.sqrt.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvsqrtsp
 
   vd = __builtin_vsx_xvsqrtdp(vd);
   // CHECK-LABEL: try-xvsqrtdp
   // CHECK-UNCONSTRAINED: @llvm.sqrt.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.sqrt.v2f64(<2 x double> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.sqrt.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvsqrtdp
 
   vf = __builtin_vsx_xvrspim(vf);
   // CHECK-LABEL: try-xvrspim
   // CHECK-UNCONSTRAINED: @llvm.floor.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.floor.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrspim
 
   vd = __builtin_vsx_xvrdpim(vd);
   // CHECK-LABEL: try-xvrdpim
   // CHECK-UNCONSTRAINED: @llvm.floor.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.floor.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrdpim
 
   vf = __builtin_vsx_xvrspi(vf);
   // CHECK-LABEL: try-xvrspi
   // CHECK-UNCONSTRAINED: @llvm.round.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.round.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.round.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrspi
 
   vd = __builtin_vsx_xvrdpi(vd);
   // CHECK-LABEL: try-xvrdpi
   // CHECK-UNCONSTRAINED: @llvm.round.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.round.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.round.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrdpi
 
   vf = __builtin_vsx_xvrspic(vf);
   // CHECK-LABEL: try-xvrspic
   // CHECK-UNCONSTRAINED: @llvm.rint.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.rint.v4f32(<4 x float> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.rint.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrspic
 
   vd = __builtin_vsx_xvrdpic(vd);
   // CHECK-LABEL: try-xvrdpic
   // CHECK-UNCONSTRAINED: @llvm.rint.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.rint.v2f64(<2 x double> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.rint.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrdpic
 
   vf = __builtin_vsx_xvrspip(vf);
   // CHECK-LABEL: try-xvrspip
   // CHECK-UNCONSTRAINED: @llvm.ceil.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.ceil.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrspip
 
   vd = __builtin_vsx_xvrdpip(vd);
   // CHECK-LABEL: try-xvrdpip
   // CHECK-UNCONSTRAINED: @llvm.ceil.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.ceil.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrdpip
 
   vf = __builtin_vsx_xvrspiz(vf);
   // CHECK-LABEL: try-xvrspiz
   // CHECK-UNCONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrspiz
 
   vd = __builtin_vsx_xvrdpiz(vd);
   // CHECK-LABEL: try-xvrdpiz
   // CHECK-UNCONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvrdpiz
 
   vf = __builtin_vsx_xvmaddasp(vf, vf, vf);
   // CHECK-LABEL: try-xvmaddasp
   // CHECK-UNCONSTRAINED: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvmaddasp
 
   vd = __builtin_vsx_xvmaddadp(vd, vd, vd);
   // CHECK-LABEL: try-xvmaddadp
   // CHECK-UNCONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvmaddadp
 
   vf = __builtin_vsx_xvnmaddasp(vf, vf, vf);
   // CHECK-LABEL: try-xvnmaddasp
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
   // CHECK-UNCONSTRAINED: fneg <4 x float> [[RESULT]]
-  // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-CONSTRAINED: fneg <4 x float> [[RESULT]]
   // NOT-FIXME-CHECK: xvnmaddasp
   // FIXME-CHECK: xvmaddasp
@@ -120,7 +120,7 @@ void test_float(void) {
   // CHECK-LABEL: try-xvnmaddadp
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
   // CHECK-UNCONSTRAINED: fneg <2 x double> [[RESULT]]
-  // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-CONSTRAINED: fneg <2 x double> [[RESULT]]
   // CHECK-ASM: xvnmaddadp
 
@@ -129,7 +129,7 @@ void test_float(void) {
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
   // CHECK-UNCONSTRAINED: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]])
   // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: xvmsubasp
 
   vd = __builtin_vsx_xvmsubadp(vd, vd, vd);
@@ -137,22 +137,18 @@ void test_float(void) {
   // CHECK-UNCONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
   // CHECK-UNCONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]])
   // CHECK-CONSTRAINED: [[RESULT:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM:  xvmsubadp
 
   vf = __builtin_vsx_xvnmsubasp(vf, vf, vf);
   // CHECK-LABEL: try-xvnmsubasp
   // CHECK-UNCONSTRAINED: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK-CONSTRAINED: fneg <4 x float> [[RESULT1]]
+  // CHECK-CONSTRAINED: call <4 x float> @llvm.ppc.fnmsub.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
   // CHECK-ASM: xvnmsubasp
 
   vd = __builtin_vsx_xvnmsubadp(vd, vd, vd);
   // CHECK-LABEL: try-xvnmsubadp
   // CHECK-UNCONSTRAINED: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: [[RESULT0:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK-CONSTRAINED: [[RESULT1:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[RESULT0]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK-CONSTRAINED: fneg <2 x double> [[RESULT1]]
+  // CHECK-CONSTRAINED: call <2 x double> @llvm.ppc.fnmsub.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
   // CHECK-ASM: xvnmsubadp
 }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
index ff24ef9a091b7..e352184619708 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // REQUIRES: systemz-registered-target
 // RUN: %clang_cc1 -target-cpu z13 -triple s390x-ibm-linux -flax-vector-conversions=none \
 // RUN: -ffp-exception-behavior=strict -Wall -Wno-unused -Werror -emit-llvm %s -o - | FileCheck %s
@@ -10,48 +11,111 @@ volatile vec_double vd;
 
 int cc;
 
+// CHECK-LABEL: define dso_local void @test_float(
+// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*:]]
+// CHECK-NEXT:    [[TMP0:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP2:%.*]] = call { <2 x i64>, i32 } @llvm.s390.vfcedbs(<2 x double> [[TMP0]], <2 x double> [[TMP1]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP2]], 1
+// CHECK-NEXT:    store i32 [[TMP3]], ptr @cc, align 4
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP2]], 0
+// CHECK-NEXT:    store volatile <2 x i64> [[TMP4]], ptr @vsl, align 8
+// CHECK-NEXT:    [[TMP5:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP6:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP7:%.*]] = call { <2 x i64>, i32 } @llvm.s390.vfchdbs(<2 x double> [[TMP5]], <2 x double> [[TMP6]]) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP8:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP7]], 1
+// CHECK-NEXT:    store i32 [[TMP8]], ptr @cc, align 4
+// CHECK-NEXT:    [[TMP9:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP7]], 0
+// CHECK-NEXT:    store volatile <2 x i64> [[TMP9]], ptr @vsl, align 8
+// CHECK-NEXT:    [[TMP10:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP11:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP12:%.*]] = call { <2 x i64>, i32 } @llvm.s390.vfchedbs(<2 x double> [[TMP10]], <2 x double> [[TMP11]]) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP13:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP12]], 1
+// CHECK-NEXT:    store i32 [[TMP13]], ptr @cc, align 4
+// CHECK-NEXT:    [[TMP14:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP12]], 0
+// CHECK-NEXT:    store volatile <2 x i64> [[TMP14]], ptr @vsl, align 8
+// CHECK-NEXT:    [[TMP15:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP16:%.*]] = call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> [[TMP15]], i32 0) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP17:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP16]], 1
+// CHECK-NEXT:    store i32 [[TMP17]], ptr @cc, align 4
+// CHECK-NEXT:    [[TMP18:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP16]], 0
+// CHECK-NEXT:    store volatile <2 x i64> [[TMP18]], ptr @vsl, align 8
+// CHECK-NEXT:    [[TMP19:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP20:%.*]] = call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> [[TMP19]], i32 4095) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP21:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP20]], 1
+// CHECK-NEXT:    store i32 [[TMP21]], ptr @cc, align 4
+// CHECK-NEXT:    [[TMP22:%.*]] = extractvalue { <2 x i64>, i32 } [[TMP20]], 0
+// CHECK-NEXT:    store volatile <2 x i64> [[TMP22]], ptr @vsl, align 8
+// CHECK-NEXT:    [[TMP23:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP24:%.*]] = call <2 x double> @llvm.sqrt.v2f64(<2 x double> [[TMP23]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP24]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP25:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP26:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP27:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP28:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[TMP25]], <2 x double> [[TMP26]], <2 x double> [[TMP27]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP28]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP29:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP30:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP31:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[NEG:%.*]] = fneg <2 x double> [[TMP31]]
+// CHECK-NEXT:    [[TMP32:%.*]] = call <2 x double> @llvm.fma.v2f64(<2 x double> [[TMP29]], <2 x double> [[TMP30]], <2 x double> [[NEG]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP32]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP33:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP34:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP33]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP34]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP35:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP36:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[TMP35]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    [[NEG1:%.*]] = fneg <2 x double> [[TMP36]]
+// CHECK-NEXT:    store volatile <2 x double> [[NEG1]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP37:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP38:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[TMP37]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP38]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP39:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP40:%.*]] = call <2 x double> @llvm.nearbyint.v2f64(<2 x double> [[TMP39]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP40]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP41:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP42:%.*]] = call <2 x double> @llvm.round.v2f64(<2 x double> [[TMP41]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP42]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP43:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP44:%.*]] = call <2 x double> @llvm.roundeven.v2f64(<2 x double> [[TMP43]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP44]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP45:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP46:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[TMP45]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP46]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP47:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP48:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[TMP47]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP48]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP49:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP50:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[TMP49]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP50]], ptr @vd, align 8
+// CHECK-NEXT:    [[TMP51:%.*]] = load volatile <2 x double>, ptr @vd, align 8
+// CHECK-NEXT:    [[TMP52:%.*]] = call <2 x double> @llvm.s390.vfidb(<2 x double> [[TMP51]], i32 4, i32 3) #[[ATTR3]]
+// CHECK-NEXT:    store volatile <2 x double> [[TMP52]], ptr @vd, align 8
+// CHECK-NEXT:    ret void
+//
 void test_float(void) {
   vsl = __builtin_s390_vfcedbs(vd, vd, &cc);
-  // CHECK: call { <2 x i64>, i32 } @llvm.s390.vfcedbs(<2 x double> %{{.*}}, <2 x double> %{{.*}})
   vsl = __builtin_s390_vfchdbs(vd, vd, &cc);
-  // CHECK: call { <2 x i64>, i32 } @llvm.s390.vfchdbs(<2 x double> %{{.*}}, <2 x double> %{{.*}})
   vsl = __builtin_s390_vfchedbs(vd, vd, &cc);
-  // CHECK: call { <2 x i64>, i32 } @llvm.s390.vfchedbs(<2 x double> %{{.*}}, <2 x double> %{{.*}})
 
   vsl = __builtin_s390_vftcidb(vd, 0, &cc);
-  // CHECK: call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> %{{.*}}, i32 0)
   vsl = __builtin_s390_vftcidb(vd, 4095, &cc);
-  // CHECK: call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> %{{.*}}, i32 4095)
 
   vd = __builtin_s390_vfsqdb(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sqrt.v2f64(<2 x double> %{{.*}})
 
   vd = __builtin_s390_vfmadb(vd, vd, vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
   vd = __builtin_s390_vfmsdb(vd, vd, vd);
-  // CHECK: [[NEG:%[^ ]+]] = fneg <2 x double> {{.*}}
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]], {{.*}})
 
   vd = __builtin_s390_vflpdb(vd);
-  // CHECK: call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vflndb(vd);
-  // CHECK: [[ABS:%[^ ]+]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}})
-  // CHECK: fneg <2 x double> [[ABS]]
 
   vd = __builtin_s390_vfidb(vd, 0, 0);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 0);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 1);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 4);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.roundeven.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 5);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 6);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 7);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 3);
-  // CHECK: call <2 x double> @llvm.s390.vfidb(<2 x double> %{{.*}}, i32 4, i32 3)
 }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
index 12c675041af76..da826773dd9db 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
@@ -10,41 +10,41 @@ volatile vec_float vf;
 
 void test_float(void) {
   vd = __builtin_s390_vfmaxdb(vd, vd, 4);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.maxnum.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}})
+  // CHECK: call <2 x double> @llvm.maxnum.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 
   vd = __builtin_s390_vfmindb(vd, vd, 4);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.minnum.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}})
+  // CHECK: call <2 x double> @llvm.minnum.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vd = __builtin_s390_vfmindb(vd, vd, 0);
 
   vd = __builtin_s390_vfnmadb(vd, vd, vd);
-  // CHECK: [[RES:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <2 x double> [[RES]]
 
   vd = __builtin_s390_vfnmsdb(vd, vd, vd);
   // CHECK: [[NEG:%[^ ]+]] = fneg <2 x double> {{.*}}
-  // CHECK:  [[RES:%[^ ]+]] = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]], metadata !{{.*}})
+  // CHECK:  [[RES:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <2 x double> [[RES]]
 
   vf = __builtin_s390_vfmaxsb(vf, vf, 4);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.maxnum.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.maxnum.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 
   vf = __builtin_s390_vfminsb(vf, vf, 4);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.minnum.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.minnum.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 
   vf = __builtin_s390_vfsqsb(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.sqrt.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 
   vf = __builtin_s390_vfmasb(vf, vf, vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfmssb(vf, vf, vf);
   // CHECK: [[NEG:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]], metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfnmasb(vf, vf, vf);
-  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <4 x float> [[RES]]
   vf = __builtin_s390_vfnmssb(vf, vf, vf);
   // CHECK: [[NEG:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]], metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <4 x float> [[RES]]
 
   vf = __builtin_s390_vflpsb(vf);
@@ -54,19 +54,19 @@ void test_float(void) {
   // CHECK: fneg <4 x float> [[ABS]]
 
   vf = __builtin_s390_vfisb(vf, 0, 0);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.rint.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.rint.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 0);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.nearbyint.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.nearbyint.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 1);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.round.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.round.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 4);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.roundeven.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.roundeven.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 5);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 6);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.ceil.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 7);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.floor.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   vf = __builtin_s390_vfisb(vf, 4, 3);
   // CHECK: call <4 x float> @llvm.s390.vfisb(<4 x float> %{{.*}}, i32 4, i32 3)
 }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
index 4993df20df143..975f0b340bff5 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
@@ -102,31 +102,27 @@ void test_compare(void) {
   // CHECK-ASM-LABEL: test_compare
 
   vbl = vec_cmpeq(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq", metadata !{{.*}})
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfcedb
 
   vbl = vec_cmpge(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge", metadata !{{.*}})
-  // CHECK-ASM: kdbr
-  // CHECK-ASM: kdbr
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchedb
   // CHECK-ASM: vst
 
   vbl = vec_cmpgt(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt", metadata !{{.*}})
-  // CHECK-ASM: kdbr
-  // CHECK-ASM: kdbr
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchdb
   // CHECK-ASM: vst
 
   vbl = vec_cmple(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !{{.*}})
-  // CHECK-ASM: kdbr
-  // CHECK-ASM: kdbr
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchedb
   // CHECK-ASM: vst
 
   vbl = vec_cmplt(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !{{.*}})
-  // CHECK-ASM: kdbr
-  // CHECK-ASM: kdbr
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchdb
   // CHECK-ASM: vst
 
   idx = vec_all_lt(vd, vd);
@@ -206,115 +202,115 @@ void test_float(void) {
   // CHECK-ASM: vflpdb
 
   vd = vec_nabs(vd);
-  // CHECK: [[ABS:%[^ ]+]] = tail call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}})
+  // CHECK: [[ABS:%[^ ]+]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-NEXT: fneg <2 x double> [[ABS]]
   // CHECK-ASM: vflndb
 
   vd = vec_madd(vd, vd, vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadb
   vd = vec_msub(vd, vd, vd);
   // CHECK: [[NEG:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]], metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsdb
   vd = vec_sqrt(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sqrt.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sqrt.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfsqdb
 
   vd = vec_ld2f(cptrf);
   // CHECK: [[VAL:%[^ ]+]] = load <2 x float>, ptr %{{.*}}
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float> [[VAL]], metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fpext.v2f64.v2f32(<2 x float> [[VAL]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vec_st2f(vd, ptrf);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x float> @llvm.fptrunc.v2f32.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: store <2 x float> [[VAL]], ptr %{{.*}}
   // (emulated)
 
   vd = vec_ctd(vsl, 0);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vd = vec_ctd(vul, 0);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vd = vec_ctd(vsl, 1);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 5.000000e-01), metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x double> @llvm.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 5.000000e-01)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vd = vec_ctd(vul, 1);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 5.000000e-01), metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x double> @llvm.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 5.000000e-01)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vd = vec_ctd(vsl, 31);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 0x3E00000000000000), metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x double> @llvm.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 0x3E00000000000000)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vd = vec_ctd(vul, 31);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 0x3E00000000000000), metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x double> @llvm.fmul.v2f64(<2 x double> [[VAL]], <2 x double> splat (double 0x3E00000000000000)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
 
   vsl = vec_ctsl(vd, 0);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vul = vec_ctul(vd, 0);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vsl = vec_ctsl(vd, 1);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> {{.*}}, <2 x double> splat (double 2.000000e+00), metadata !{{.*}})
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> [[VAL]], metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> {{.*}}, <2 x double> splat (double 2.000000e+00)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> [[VAL]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vul = vec_ctul(vd, 1);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 2.000000e+00), metadata !{{.*}})
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> [[VAL]], metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 2.000000e+00)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> [[VAL]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vsl = vec_ctsl(vd, 31);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 0x41E0000000000000), metadata !{{.*}})
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> [[VAL]], metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 0x41E0000000000000)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> [[VAL]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
   vul = vec_ctul(vd, 31);
-  // CHECK: [[VAL:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 0x41E0000000000000), metadata !{{.*}})
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> [[VAL]], metadata !{{.*}})
+  // CHECK: [[VAL:%[^ ]+]] = call <2 x double> @llvm.fmul.v2f64(<2 x double> %{{.*}}, <2 x double> splat (double 0x41E0000000000000)) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> [[VAL]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // (emulated)
 
   vd = vec_double(vsl);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdgb
   vd = vec_double(vul);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdlgb
 
   vsl = vec_signed(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcgdb
   vul = vec_unsigned(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vclgdb
 
   vd = vec_roundp(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.ceil.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 6
   vd = vec_ceil(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.ceil.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 6
   vd = vec_roundm(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.floor.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
   vd = vec_floor(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.floor.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
   vd = vec_roundz(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_trunc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_roundc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.nearbyint.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 0
   vd = vec_rint(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.rint.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 0, 0
   vd = vec_round(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.roundeven.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.roundeven.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 4
 }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
index 25b3e0b68cd02..f11575fe8e8e1 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
@@ -168,39 +168,39 @@ void test_compare(void) {
   // CHECK-ASM-LABEL: test_compare
 
   vbi = vec_cmpeq(vf, vf);
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oeq", metadata !{{.*}})
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfcesb
   vbl = vec_cmpeq(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oeq", metadata !{{.*}})
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfcedb
 
   vbi = vec_cmpge(vf, vf);
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"oge", metadata !{{.*}})
-  // CHECK-ASM: vfkhesb
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchesb
   vbl = vec_cmpge(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"oge", metadata !{{.*}})
-  // CHECK-ASM: vfkhedb
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchedb
 
   vbi = vec_cmpgt(vf, vf);
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ogt", metadata !{{.*}})
-  // CHECK-ASM: vfkhsb
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchsb
   vbl = vec_cmpgt(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ogt", metadata !{{.*}})
-  // CHECK-ASM: vfkhdb
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchdb
 
   vbi = vec_cmple(vf, vf);
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"ole", metadata !{{.*}})
-  // CHECK-ASM: vfkhesb
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchesb
   vbl = vec_cmple(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"ole", metadata !{{.*}})
-  // CHECK-ASM: vfkhedb
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchedb
 
   vbi = vec_cmplt(vf, vf);
-  // CHECK: call <4 x i1> @llvm.experimental.constrained.fcmps.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !"olt", metadata !{{.*}})
-  // CHECK-ASM: vfkhsb
+  // CHECK: call <4 x i1> @llvm.fcmp.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchsb
   vbl = vec_cmplt(vd, vd);
-  // CHECK: call <2 x i1> @llvm.experimental.constrained.fcmps.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !"olt", metadata !{{.*}})
-  // CHECK-ASM: vfkhdb
+  // CHECK: call <2 x i1> @llvm.fcmp.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
+  // CHECK-ASM: vfchdb
 
   idx = vec_all_eq(vf, vf);
   // CHECK: call { <4 x i32>, i32 } @llvm.s390.vfcesbs(<4 x float> %{{.*}}, <4 x float> %{{.*}})
@@ -382,11 +382,11 @@ void test_float(void) {
   // CHECK-ASM: vflpdb
 
   vf = vec_nabs(vf);
-  // CHECK: [[ABS:%[^ ]+]] = tail call <4 x float> @llvm.fabs.v4f32(<4 x float> %{{.*}})
+  // CHECK: [[ABS:%[^ ]+]] = call <4 x float> @llvm.fabs.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-NEXT: fneg <4 x float> [[ABS]]
   // CHECK-ASM: vflnsb
   vd = vec_nabs(vd);
-  // CHECK: [[ABS:%[^ ]+]] = tail call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}})
+  // CHECK: [[ABS:%[^ ]+]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-NEXT: fneg <2 x double> [[ABS]]
   // CHECK-ASM: vflndb
 
@@ -405,127 +405,127 @@ void test_float(void) {
   // CHECK-ASM: vfmindb
 
   vf = vec_madd(vf, vf, vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmasb
   vd = vec_madd(vd, vd, vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadb
 
   vf = vec_msub(vf, vf, vf);
   // CHECK: [[NEG:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]], metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmssb
   vd = vec_msub(vd, vd, vd);
   // CHECK: [[NEG:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]], metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsdb
 
   vf = vec_nmadd(vf, vf, vf);
-  // CHECK: [[RES:%[^ ]+]] = tail call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <4 x float> [[RES]]
   // CHECK-ASM: vfnmasb
   vd = vec_nmadd(vd, vd, vd);
-  // CHECK: [[RES:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <2 x double> [[RES]]
   // CHECK-ASM: vfnmadb
 
   vf = vec_nmsub(vf, vf, vf);
   // CHECK: [[NEG:%[^ ]+]] = fneg <4 x float> %{{.*}}
-  // CHECK: [[RES:%[^ ]+]] = tail call <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]], metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <4 x float> [[RES]]
   // CHECK-ASM: vfnmssb
   vd = vec_nmsub(vd, vd, vd);
   // CHECK: [[NEG:%[^ ]+]] = fneg <2 x double> %{{.*}}
-  // CHECK: [[RES:%[^ ]+]] = tail call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]], metadata !{{.*}})
+  // CHECK: [[RES:%[^ ]+]] = call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> [[NEG]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK: fneg <2 x double> [[RES]]
   // CHECK-ASM: vfnmsdb
 
   vf = vec_sqrt(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.sqrt.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfsqsb
   vd = vec_sqrt(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sqrt.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sqrt.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfsqdb
 
   vd = vec_doublee(vf);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.fpext.v2f64.v2f32(<2 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vldeb
   vf = vec_floate(vd);
-  // CHECK: call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x float> @llvm.fptrunc.v2f32.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vledb
 
   vd = vec_double(vsl);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdgb
   vd = vec_double(vul);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdlgb
 
   vsl = vec_signed(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcgdb
   vul = vec_unsigned(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vclgdb
 
   vf = vec_roundp(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.ceil.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 6
   vf = vec_ceil(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.ceil.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 6
   vd = vec_roundp(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.ceil.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 6
   vd = vec_ceil(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.ceil.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 6
 
   vf = vec_roundm(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.floor.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 7
   vf = vec_floor(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.floor.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 7
   vd = vec_roundm(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.floor.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
   vd = vec_floor(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.floor.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
 
   vf = vec_roundz(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
   vf = vec_trunc(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_roundz(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_trunc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
 
   vf = vec_roundc(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.nearbyint.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.nearbyint.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 0
   vd = vec_roundc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.nearbyint.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 0
 
   vf = vec_rint(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.rint.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.rint.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 0, 0
   vd = vec_rint(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.rint.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 0, 0
 
   vf = vec_round(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.roundeven.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.roundeven.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 4
   vd = vec_round(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.roundeven.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.roundeven.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 4
 
   vbi = vec_fp_test_data_class(vf, 0, &cc);
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector3-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector3-constrained.c
index 17af7b8a7fccf..347818818d362 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector3-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector3-constrained.c
@@ -80,30 +80,30 @@ void test_float(void) {
   // CHECK-ASM-LABEL: test_float
 
   vd = vec_double(vsl);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.sitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdgb
   vd = vec_double(vul);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.uitofp.v2f64.v2i64(<2 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcdlgb
   vf = vec_float(vsi);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.sitofp.v4f32.v4i32(<4 x i32> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.sitofp.v4f32.v4i32(<4 x i32> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcefb
   vf = vec_float(vui);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.uitofp.v4f32.v4i32(<4 x i32> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.uitofp.v4f32.v4i32(<4 x i32> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcelfb
 
   vsl = vec_signed(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptosi.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcgdb
   vsi = vec_signed(vf);
-  // CHECK: call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x i32> @llvm.fptosi.v4i32.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcfeb
   vul = vec_unsigned(vd);
-  // CHECK: call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x i64> @llvm.fptoui.v2i64.v2f64(<2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vclgdb
   vui = vec_unsigned(vf);
   // xHECK: fptoui <4 x float> %{{.*}} to <4 x i32>
-  // CHECK: call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x i32> @llvm.fptoui.v4i32.v4f32(<4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vclfeb
 }
 
diff --git a/clang/test/CodeGen/X86/avx-builtins-constrained.c b/clang/test/CodeGen/X86/avx-builtins-constrained.c
index 357b6e1c66339..218f88b9d7628 100644
--- a/clang/test/CodeGen/X86/avx-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx-builtins-constrained.c
@@ -21,7 +21,7 @@
 __m256 test_mm256_sqrt_ps(__m256 x) {
   // COMMON-LABEL: test_mm256_sqrt_ps
   // UNCONSTRAINED: call {{.*}}<8 x float> @llvm.sqrt.v8f32(<8 x float> {{.*}})
-  // CONSTRAINED: call {{.*}}<8 x float> @llvm.experimental.constrained.sqrt.v8f32(<8 x float> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x float> @llvm.sqrt.v8f32(<8 x float> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtps %ymm{{.*}}, 
   return _mm256_sqrt_ps(x);
 }
@@ -29,7 +29,7 @@ __m256 test_mm256_sqrt_ps(__m256 x) {
 __m256d test_mm256_sqrt_pd(__m256d x) {
   // COMMON-LABEL: test_mm256_sqrt_pd
   // UNCONSTRAINED: call {{.*}}<4 x double> @llvm.sqrt.v4f64(<4 x double> {{.*}})
-  // CONSTRAINED: call {{.*}}<4 x double> @llvm.experimental.constrained.sqrt.v4f64(<4 x double> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x double> @llvm.sqrt.v4f64(<4 x double> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtpd %ymm{{.*}}, 
   return _mm256_sqrt_pd(x);
 }
@@ -48,7 +48,7 @@ __m256d test_mm256_round_pd_fround_no_exc(__m256d x) {
 
 __m256d test_mm256_round_pd_trunc(__m256d x) {
   // CONSTRAINED-LABEL: test_mm256_round_pd_trunc
-  // CONSTRAINED: %{{.*}} = call <4 x double> @llvm.experimental.constrained.trunc.v4f64(<4 x double> %{{.*}}, metadata !"fpexcept.ignore")
+  // CONSTRAINED: %{{.*}} = call <4 x double> @llvm.trunc.v4f64(<4 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   return _mm256_round_pd(x, 0b1011);
 }
 
@@ -66,6 +66,6 @@ __m256 test_mm256_round_ps_fround_no_exc(__m256 x) {
 
 __m256 test_mm256_round_ps_trunc(__m256 x) {
   // CONSTRAINED-LABEL: test_mm256_round_ps_trunc
-  // CONSTRAINED: %{{.*}} = call <8 x float> @llvm.experimental.constrained.trunc.v8f32(<8 x float> %{{.*}}, metadata !"fpexcept.ignore")
+  // CONSTRAINED: %{{.*}} = call <8 x float> @llvm.trunc.v8f32(<8 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   return _mm256_round_ps(x, 0b1011);
 }
diff --git a/clang/test/CodeGen/X86/avx512dq-builtins-constrained.c b/clang/test/CodeGen/X86/avx512dq-builtins-constrained.c
index fc7c3361c9b76..aea8798299afc 100644
--- a/clang/test/CodeGen/X86/avx512dq-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx512dq-builtins-constrained.c
@@ -19,7 +19,7 @@
 __m512d test_mm512_cvtepi64_pd(__m512i __A) {
   // COMMON-LABEL: test_mm512_cvtepi64_pd
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcvtqq2pd
   return _mm512_cvtepi64_pd(__A);
 }
@@ -27,7 +27,7 @@ __m512d test_mm512_cvtepi64_pd(__m512i __A) {
 __m512d test_mm512_mask_cvtepi64_pd(__m512d __W, __mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_mask_cvtepi64_pd
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
   // CHECK-ASM: vcvtqq2pd
   return _mm512_mask_cvtepi64_pd(__W, __U, __A);
@@ -36,7 +36,7 @@ __m512d test_mm512_mask_cvtepi64_pd(__m512d __W, __mmask8 __U, __m512i __A) {
 __m512d test_mm512_maskz_cvtepi64_pd(__mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_maskz_cvtepi64_pd
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.sitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
   // CHECK-ASM: vcvtqq2pd
   return _mm512_maskz_cvtepi64_pd(__U, __A);
@@ -68,7 +68,7 @@ __m512d test_mm512_maskz_cvt_roundepi64_pd(__mmask8 __U, __m512i __A) {
 __m256 test_mm512_cvtepi64_ps(__m512i __A) {
   // COMMON-LABEL: test_mm512_cvtepi64_ps
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcvtqq2ps
   return _mm512_cvtepi64_ps(__A);
 }
@@ -76,7 +76,7 @@ __m256 test_mm512_cvtepi64_ps(__m512i __A) {
 __m256 test_mm512_mask_cvtepi64_ps(__m256 __W, __mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_mask_cvtepi64_ps
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   // CHECK-ASM: vcvtqq2ps
   return _mm512_mask_cvtepi64_ps(__W, __U, __A);
@@ -85,7 +85,7 @@ __m256 test_mm512_mask_cvtepi64_ps(__m256 __W, __mmask8 __U, __m512i __A) {
 __m256 test_mm512_maskz_cvtepi64_ps(__mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_maskz_cvtepi64_ps
   // UNCONSTRAINED: sitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.sitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   // CHECK-ASM: vcvtqq2ps
   return _mm512_maskz_cvtepi64_ps(__U, __A);
@@ -117,7 +117,7 @@ __m256 test_mm512_maskz_cvt_roundepi64_ps(__mmask8 __U, __m512i __A) {
 __m512d test_mm512_cvtepu64_pd(__m512i __A) {
   // COMMON-LABEL: test_mm512_cvtepu64_pd
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcvtuqq2pd
   return _mm512_cvtepu64_pd(__A);
 }
@@ -125,7 +125,7 @@ __m512d test_mm512_cvtepu64_pd(__m512i __A) {
 __m512d test_mm512_mask_cvtepu64_pd(__m512d __W, __mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_mask_cvtepu64_pd
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
   // CHECK-ASM: vcvtuqq2pd
   return _mm512_mask_cvtepu64_pd(__W, __U, __A);
@@ -134,7 +134,7 @@ __m512d test_mm512_mask_cvtepu64_pd(__m512d __W, __mmask8 __U, __m512i __A) {
 __m512d test_mm512_maskz_cvtepu64_pd(__mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_maskz_cvtepu64_pd
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x double>
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x double> @llvm.uitofp.v8f64.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
   // CHECK-ASM: vcvtuqq2pd
   return _mm512_maskz_cvtepu64_pd(__U, __A);
@@ -166,7 +166,7 @@ __m512d test_mm512_maskz_cvt_roundepu64_pd(__mmask8 __U, __m512i __A) {
 __m256 test_mm512_cvtepu64_ps(__m512i __A) {
   // COMMON-LABEL: test_mm512_cvtepu64_ps
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vcvtuqq2ps
   return _mm512_cvtepu64_ps(__A);
 }
@@ -174,7 +174,7 @@ __m256 test_mm512_cvtepu64_ps(__m512i __A) {
 __m256 test_mm512_mask_cvtepu64_ps(__m256 __W, __mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_mask_cvtepu64_ps
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   // CHECK-ASM: vcvtuqq2ps
   return _mm512_mask_cvtepu64_ps(__W, __U, __A);
@@ -183,7 +183,7 @@ __m256 test_mm512_mask_cvtepu64_ps(__m256 __W, __mmask8 __U, __m512i __A) {
 __m256 test_mm512_maskz_cvtepu64_ps(__mmask8 __U, __m512i __A) {
   // COMMON-LABEL: test_mm512_maskz_cvtepu64_ps
   // UNCONSTRAINED: uitofp <8 x i64> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+  // CONSTRAINED: call <8 x float> @llvm.uitofp.v8f32.v8i64(<8 x i64> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   // CHECK-ASM: vcvtuqq2ps
   return _mm512_maskz_cvtepu64_ps(__U, __A);
diff --git a/clang/test/CodeGen/X86/avx512f-builtins-constrained.c b/clang/test/CodeGen/X86/avx512f-builtins-constrained.c
index 4044021a3f9e0..9c430b68eebe2 100644
--- a/clang/test/CodeGen/X86/avx512f-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx512f-builtins-constrained.c
@@ -19,7 +19,7 @@ __m512d test_mm512_sqrt_pd(__m512d a)
 {
   // COMMON-LABEL: test_mm512_sqrt_pd
   // UNCONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}})
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sqrt.v8f64(<8 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtpd
   return _mm512_sqrt_pd(a);
 }
@@ -28,7 +28,7 @@ __m512d test_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
   // COMMON-LABEL: test_mm512_mask_sqrt_pd
   // UNCONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}})
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sqrt.v8f64(<8 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtpd
   // COMMONIR: bitcast i8 %{{.*}} to <8 x i1>
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
@@ -39,7 +39,7 @@ __m512d test_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A)
 {
   // COMMON-LABEL: test_mm512_maskz_sqrt_pd
   // UNCONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}})
-  // CONSTRAINED: call <8 x double> @llvm.experimental.constrained.sqrt.v8f64(<8 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <8 x double> @llvm.sqrt.v8f64(<8 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtpd
   // COMMONIR: bitcast i8 %{{.*}} to <8 x i1>
   // COMMONIR: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> {{.*}}
@@ -50,7 +50,7 @@ __m512 test_mm512_sqrt_ps(__m512 a)
 {
   // COMMON-LABEL: test_mm512_sqrt_ps
   // UNCONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}})
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.sqrt.v16f32(<16 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtps
   return _mm512_sqrt_ps(a);
 }
@@ -59,7 +59,7 @@ __m512 test_mm512_mask_sqrt_ps(__m512 __W, __mmask16 __U, __m512 __A)
 {
   // COMMON-LABEL: test_mm512_mask_sqrt_ps
   // UNCONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}})
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.sqrt.v16f32(<16 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtps
   // COMMONIR: bitcast i16 %{{.*}} to <16 x i1>
   // COMMONIR: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
@@ -70,7 +70,7 @@ __m512 test_mm512_maskz_sqrt_ps( __mmask16 __U, __m512 __A)
 {
   // COMMON-LABEL: test_mm512_maskz_sqrt_ps
   // UNCONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}})
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.sqrt.v16f32(<16 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call <16 x float> @llvm.sqrt.v16f32(<16 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtps
   // COMMONIR: bitcast i16 %{{.*}} to <16 x i1>
   // COMMONIR: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> {{.*}}
@@ -123,7 +123,7 @@ __m128d test_mm_mask_sqrt_sd(__m128d __W, __mmask8 __U, __m128d __A, __m128d __B
   // COMMON-LABEL: test_mm_mask_sqrt_sd
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // UNCONSTRAINED-NEXT: call double @llvm.sqrt.f64(double %{{.*}})
-  // CONSTRAINED-NEXT: call double @llvm.experimental.constrained.sqrt.f64(double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED-NEXT: call double @llvm.sqrt.f64(double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtsd
   // COMMONIR-NEXT: extractelement <2 x double> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -137,7 +137,7 @@ __m128d test_mm_maskz_sqrt_sd(__mmask8 __U, __m128d __A, __m128d __B){
   // COMMON-LABEL: test_mm_maskz_sqrt_sd
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // UNCONSTRAINED-NEXT: call double @llvm.sqrt.f64(double %{{.*}})
-  // CONSTRAINED-NEXT: call double @llvm.experimental.constrained.sqrt.f64(double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED-NEXT: call double @llvm.sqrt.f64(double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtsd
   // COMMONIR-NEXT: extractelement <2 x double> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -151,7 +151,7 @@ __m128 test_mm_mask_sqrt_ss(__m128 __W, __mmask8 __U, __m128 __A, __m128 __B){
   // COMMON-LABEL: test_mm_mask_sqrt_ss
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // UNCONSTRAINED-NEXT: call float @llvm.sqrt.f32(float %{{.*}})
-  // CONSTRAINED-NEXT: call float @llvm.experimental.constrained.sqrt.f32(float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED-NEXT: call float @llvm.sqrt.f32(float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtss
   // COMMONIR-NEXT: extractelement <4 x float> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -165,7 +165,7 @@ __m128 test_mm_maskz_sqrt_ss(__mmask8 __U, __m128 __A, __m128 __B){
   // COMMON-LABEL: test_mm_maskz_sqrt_ss
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // UNCONSTRAINED-NEXT: call float @llvm.sqrt.f32(float %{{.*}})
-  // CONSTRAINED-NEXT: call float @llvm.experimental.constrained.sqrt.f32(float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED-NEXT: call float @llvm.sqrt.f32(float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtss
   // COMMONIR-NEXT: extractelement <4 x float> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -181,7 +181,7 @@ __m512 test_mm512_cvtph_ps (__m256i __A)
   // COMMONIR: bitcast <4 x i64> %{{.*}} to <16 x i16>
   // COMMONIR: bitcast <16 x i16> %{{.*}} to <16 x half>
   // UNCONSTRAINED: fpext <16 x half> %{{.*}} to <16 x float>
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.fpext.v16f32.v16f16(<16 x half> %{{.*}}, metadata !"fpexcept.strict")
+  // CONSTRAINED: call <16 x float> @llvm.fpext.v16f32.v16f16(<16 x half> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   return _mm512_cvtph_ps (__A);
 }
 
@@ -191,7 +191,7 @@ __m512 test_mm512_mask_cvtph_ps (__m512 __W, __mmask16 __U, __m256i __A)
   // COMMONIR: bitcast <4 x i64> %{{.*}} to <16 x i16>
   // COMMONIR: bitcast <16 x i16> %{{.*}} to <16 x half>
   // UNCONSTRAINED: fpext <16 x half> %{{.*}} to <16 x float>
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.fpext.v16f32.v16f16(<16 x half> %{{.*}}, metadata !"fpexcept.strict")
+  // CONSTRAINED: call <16 x float> @llvm.fpext.v16f32.v16f16(<16 x half> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_mask_cvtph_ps (__W,__U,__A);
 }
@@ -202,7 +202,7 @@ __m512 test_mm512_maskz_cvtph_ps (__mmask16 __U, __m256i __A)
   // COMMONIR: bitcast <4 x i64> %{{.*}} to <16 x i16>
   // COMMONIR: bitcast <16 x i16> %{{.*}} to <16 x half>
   // UNCONSTRAINED: fpext <16 x half> %{{.*}} to <16 x float>
-  // CONSTRAINED: call <16 x float> @llvm.experimental.constrained.fpext.v16f32.v16f16(<16 x half> %{{.*}}, metadata !"fpexcept.strict")
+  // CONSTRAINED: call <16 x float> @llvm.fpext.v16f32.v16f16(<16 x half> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // COMMONIR: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_maskz_cvtph_ps (__U,__A);
 }
diff --git a/clang/test/CodeGen/X86/avx512fp16-builtins-constrained.c b/clang/test/CodeGen/X86/avx512fp16-builtins-constrained.c
index 95403aecae9bd..e7ff3957fdc9e 100644
--- a/clang/test/CodeGen/X86/avx512fp16-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx512fp16-builtins-constrained.c
@@ -21,7 +21,7 @@
 __m128h test_mm_sqrt_sh(__m128h x, __m128h y) {
   // COMMON-LABEL: test_mm_sqrt_sh
   // UNCONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half {{.*}})
-  // CONSTRAINED: call {{.*}}half @llvm.experimental.constrained.sqrt.f16(half {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtsh %xmm{{.*}},
   return _mm_sqrt_sh(x, y);
 }
@@ -30,7 +30,7 @@ __m128h test_mm_mask_sqrt_sh(__m128h __W, __mmask8 __U, __m128h __A, __m128h __B
   // COMMON-LABEL: test_mm_mask_sqrt_sh
   // COMMONIR: extractelement <8 x half> %{{.*}}, i64 0
   // UNCONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half %{{.*}})
-  // CONSTRAINED: call {{.*}}half @llvm.experimental.constrained.sqrt.f16(half %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtsh %xmm{{.*}},
   // COMMONIR-NEXT: extractelement <8 x half> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -44,7 +44,7 @@ __m128h test_mm_maskz_sqrt_sh(__mmask8 __U, __m128h __A, __m128h __B){
   // COMMON-LABEL: test_mm_maskz_sqrt_sh
   // COMMONIR: extractelement <2 x half> %{{.*}}, i64 0
   // UNCONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half %{{.*}})
-  // CONSTRAINED: call {{.*}}half @llvm.experimental.constrained.sqrt.f16(half %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}half @llvm.sqrt.f16(half %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtsh %xmm{{.*}},
   // COMMONIR-NEXT: extractelement <2 x half> %{{.*}}, i64 0
   // COMMONIR-NEXT: bitcast i8 %{{.*}} to <8 x i1>
@@ -57,7 +57,7 @@ __m128h test_mm_maskz_sqrt_sh(__mmask8 __U, __m128h __A, __m128h __B){
 __m512h test_mm512_sqrt_ph(__m512h x) {
   // COMMON-LABEL: test_mm512_sqrt_ph
   // UNCONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> {{.*}})
-  // CONSTRAINED: call {{.*}}<32 x half> @llvm.experimental.constrained.sqrt.v32f16(<32 x half> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtph %zmm{{.*}},
   return _mm512_sqrt_ph(x);
 }
@@ -66,7 +66,7 @@ __m512h test_mm512_mask_sqrt_ph (__m512h __W, __mmask32 __U, __m512h __A)
 {
   // COMMON-LABEL: test_mm512_mask_sqrt_ph
   // UNCONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> %{{.*}})
-  // CONSTRAINED: call {{.*}}<32 x half> @llvm.experimental.constrained.sqrt.v32f16(<32 x half> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtph %zmm{{.*}},
   // COMMONIR: bitcast i32 %{{.*}} to <32 x i1>
   // COMMONIR: select <32 x i1> %{{.*}}, <32 x half> %{{.*}}, <32 x half> %{{.*}}
@@ -77,7 +77,7 @@ __m512h test_mm512_maskz_sqrt_ph (__mmask32 __U, __m512h __A)
 {
   // COMMON-LABEL: test_mm512_maskz_sqrt_ph
   // UNCONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> %{{.*}})
-  // CONSTRAINED: call {{.*}}<32 x half> @llvm.experimental.constrained.sqrt.v32f16(<32 x half> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<32 x half> @llvm.sqrt.v32f16(<32 x half> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtph %zmm{{.*}},
   // COMMONIR: bitcast i32 %{{.*}} to <32 x i1>
   // COMMONIR: select <32 x i1> %{{.*}}, <32 x half> %{{.*}}, <32 x half> {{.*}}
diff --git a/clang/test/CodeGen/X86/avx512vl-builtins-constrained.c b/clang/test/CodeGen/X86/avx512vl-builtins-constrained.c
index 162bec534b3dd..61d7a9c946785 100644
--- a/clang/test/CodeGen/X86/avx512vl-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx512vl-builtins-constrained.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // REQUIRES: x86-registered-target
 // RUN: %clang_cc1 -flax-vector-conversions=none -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +avx512f -target-feature +avx512vl -emit-llvm -o - -Wall -Werror | FileCheck --check-prefix=COMMON --check-prefix=COMMONIR --check-prefix=UNCONSTRAINED %s
 // RUN: %clang_cc1 -flax-vector-conversions=none -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +avx512f -target-feature +avx512vl -emit-llvm -o - -Wall -Werror | FileCheck --check-prefix=COMMON --check-prefix=COMMONIR --check-prefix=UNCONSTRAINED %s
@@ -6,92 +7,300 @@
 
 #include <immintrin.h>
 
+// CONSTRAINED-LABEL: define dso_local <4 x float> @test_mm_mask_cvtph_ps(
+// CONSTRAINED-SAME: <4 x float> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <2 x i64> noundef [[__A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR_I:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR_I:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    store <4 x float> [[__W]], ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x i64>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store <4 x float> [[TMP0]], ptr [[__W_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    store i8 [[TMP1]], ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[TMP2]], ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = load <4 x float>, ptr [[__W_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP6:%.*]] = load i8, ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP4]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast <4 x i16> [[TMP7]] to <4 x half>
+// CONSTRAINED-NEXT:    [[CVTPH2PS_I:%.*]] = call <4 x float> @llvm.fpext.v4f32.v4f16(<4 x half> [[TMP8]]) #[[ATTR4:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = bitcast i8 [[TMP6]] to <8 x i1>
+// CONSTRAINED-NEXT:    [[EXTRACT_I:%.*]] = shufflevector <8 x i1> [[TMP9]], <8 x i1> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// CONSTRAINED-NEXT:    [[TMP10:%.*]] = select <4 x i1> [[EXTRACT_I]], <4 x float> [[CVTPH2PS_I]], <4 x float> [[TMP5]]
+// CONSTRAINED-NEXT:    ret <4 x float> [[TMP10]]
+//
 __m128 test_mm_mask_cvtph_ps(__m128 __W, __mmask8 __U, __m128i __A) {
-  // COMMON-LABEL: @test_mm_mask_cvtph_ps
-  // COMMONIR: bitcast <2 x i64> %{{.*}} to <8 x i16>
-  // COMMONIR: shufflevector <8 x i16> %{{.*}}, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-  // COMMONIR: bitcast <4 x i16> %{{.*}} to <4 x half>
-  // UNCONSTRAINED: fpext <4 x half> %{{.*}} to <4 x float>
-  // CONSTRAINED: call <4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %{{.*}}, metadata !"fpexcept.strict")
-  // COMMONIR: select <4 x i1> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}
   return _mm_mask_cvtph_ps(__W, __U, __A);
 }
 
+// CONSTRAINED-LABEL: define dso_local <4 x float> @test_mm_maskz_cvtph_ps(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <2 x i64> noundef [[__A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR_I:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[TMP0]], ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[TMP1]], ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x i64>, ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = bitcast <2 x i64> [[TMP2]] to <8 x i16>
+// CONSTRAINED-NEXT:    store <4 x float> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = load <4 x float>, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = load i8, ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    [[TMP6:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <4 x i16> [[TMP6]] to <4 x half>
+// CONSTRAINED-NEXT:    [[CVTPH2PS_I:%.*]] = call <4 x float> @llvm.fpext.v4f32.v4f16(<4 x half> [[TMP7]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast i8 [[TMP5]] to <8 x i1>
+// CONSTRAINED-NEXT:    [[EXTRACT_I:%.*]] = shufflevector <8 x i1> [[TMP8]], <8 x i1> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = select <4 x i1> [[EXTRACT_I]], <4 x float> [[CVTPH2PS_I]], <4 x float> [[TMP4]]
+// CONSTRAINED-NEXT:    ret <4 x float> [[TMP9]]
+//
 __m128 test_mm_maskz_cvtph_ps(__mmask8 __U, __m128i __A) {
-  // COMMON-LABEL: @test_mm_maskz_cvtph_ps
-  // COMMONIR: bitcast <2 x i64> %{{.*}} to <8 x i16>
-  // COMMONIR: shufflevector <8 x i16> %{{.*}}, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-  // COMMONIR: bitcast <4 x i16> %{{.*}} to <4 x half>
-  // UNCONSTRAINED: fpext <4 x half> %{{.*}} to <4 x float>
-  // CONSTRAINED: call <4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %{{.*}}, metadata !"fpexcept.strict")
-  // COMMONIR: select <4 x i1> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}
   return _mm_maskz_cvtph_ps(__U, __A);
 }
 
+// CONSTRAINED-LABEL: define dso_local <8 x float> @test_mm256_mask_cvtph_ps(
+// CONSTRAINED-SAME: <8 x float> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <2 x i64> noundef [[__A:%.*]]) #[[ATTR1:[0-9]+]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR_I:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    [[__U_ADDR_I:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    store <8 x float> [[__W]], ptr [[__W_ADDR]], align 32
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[__W_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x i64>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store <8 x float> [[TMP0]], ptr [[__W_ADDR_I]], align 32
+// CONSTRAINED-NEXT:    store i8 [[TMP1]], ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[TMP2]], ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load <2 x i64>, ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = bitcast <2 x i64> [[TMP3]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = load <8 x float>, ptr [[__W_ADDR_I]], align 32
+// CONSTRAINED-NEXT:    [[TMP6:%.*]] = load i8, ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast <8 x i16> [[TMP4]] to <8 x half>
+// CONSTRAINED-NEXT:    [[CVTPH2PS_I:%.*]] = call <8 x float> @llvm.fpext.v8f32.v8f16(<8 x half> [[TMP7]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[TMP8:%.*]] = bitcast i8 [[TMP6]] to <8 x i1>
+// CONSTRAINED-NEXT:    [[TMP9:%.*]] = select <8 x i1> [[TMP8]], <8 x float> [[CVTPH2PS_I]], <8 x float> [[TMP5]]
+// CONSTRAINED-NEXT:    ret <8 x float> [[TMP9]]
+//
 __m256 test_mm256_mask_cvtph_ps(__m256 __W, __mmask8 __U, __m128i __A) {
-  // COMMON-LABEL: @test_mm256_mask_cvtph_ps
-  // COMMONIR: bitcast <2 x i64> %{{.*}} to <8 x i16>
-  // COMMONIR: bitcast <8 x i16> %{{.*}} to <8 x half>
-  // UNCONSTRAINED: fpext <8 x half> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.fpext.v8f32.v8f16(<8 x half> %{{.*}}, metadata !"fpexcept.strict") 
-  // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   return _mm256_mask_cvtph_ps(__W, __U, __A);
 }
 
+// CONSTRAINED-LABEL: define dso_local <8 x float> @test_mm256_maskz_cvtph_ps(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <2 x i64> noundef [[__A:%.*]]) #[[ATTR1]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    [[__U_ADDR_I:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[TMP0]], ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    store <2 x i64> [[TMP1]], ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x i64>, ptr [[__A_ADDR_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = bitcast <2 x i64> [[TMP2]] to <8 x i16>
+// CONSTRAINED-NEXT:    store <8 x float> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 32
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = load <8 x float>, ptr [[DOTCOMPOUNDLITERAL_I]], align 32
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = load i8, ptr [[__U_ADDR_I]], align 1
+// CONSTRAINED-NEXT:    [[TMP6:%.*]] = bitcast <8 x i16> [[TMP3]] to <8 x half>
+// CONSTRAINED-NEXT:    [[CVTPH2PS_I:%.*]] = call <8 x float> @llvm.fpext.v8f32.v8f16(<8 x half> [[TMP6]]) #[[ATTR4]] [ "fp.control"(metadata !"rte") ]
+// CONSTRAINED-NEXT:    [[TMP7:%.*]] = bitcast i8 [[TMP5]] to <8 x i1>
+// CONSTRAINED-NEXT:    [[TMP8:%.*]] = select <8 x i1> [[TMP7]], <8 x float> [[CVTPH2PS_I]], <8 x float> [[TMP4]]
+// CONSTRAINED-NEXT:    ret <8 x float> [[TMP8]]
+//
 __m256 test_mm256_maskz_cvtph_ps(__mmask8 __U, __m128i __A) {
-  // COMMON-LABEL: @test_mm256_maskz_cvtph_ps
-  // COMMONIR: bitcast <2 x i64> %{{.*}} to <8 x i16>
-  // COMMONIR: bitcast <8 x i16> %{{.*}} to <8 x half>
-  // UNCONSTRAINED: fpext <8 x half> %{{.*}} to <8 x float>
-  // CONSTRAINED: call <8 x float> @llvm.experimental.constrained.fpext.v8f32.v8f16(<8 x half> %{{.*}}, metadata !"fpexcept.strict") 
-  // COMMONIR: select <8 x i1> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}
   return _mm256_maskz_cvtph_ps(__U, __A);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm_mask_cvtps_ph(
+// CONSTRAINED-SAME: <2 x i64> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <4 x float> noundef [[__A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    store <2 x i64> [[__W]], ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <4 x float> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128(<4 x float> [[TMP0]], i32 11, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5:[0-9]+]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm_mask_cvtps_ph(__m128i __W, __mmask8 __U, __m128 __A) {
-  // COMMON-LABEL: @test_mm_mask_cvtps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.128
   return _mm_mask_cvtps_ph(__W, __U, __A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm_maskz_cvtps_ph(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <4 x float> noundef [[__A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <4 x float> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store <2 x i64> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128(<4 x float> [[TMP0]], i32 11, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm_maskz_cvtps_ph(__mmask8 __U, __m128 __A) {
-  // COMMON-LABEL: @test_mm_maskz_cvtps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.128
   return _mm_maskz_cvtps_ph(__U, __A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm256_mask_cvtps_ph(
+// CONSTRAINED-SAME: <2 x i64> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <8 x float> noundef [[__A:%.*]]) #[[ATTR1]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    store <2 x i64> [[__W]], ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <8 x float> [[__A]], ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256(<8 x float> [[TMP0]], i32 11, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm256_mask_cvtps_ph(__m128i __W, __mmask8 __U, __m256 __A) {
-  // COMMON-LABEL: @test_mm256_mask_cvtps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.256
   return _mm256_mask_cvtps_ph(__W, __U, __A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm256_maskz_cvtps_ph(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <8 x float> noundef [[__A:%.*]]) #[[ATTR1]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <8 x float> [[__A]], ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    store <2 x i64> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256(<8 x float> [[TMP0]], i32 11, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm256_maskz_cvtps_ph(__mmask8 __U, __m256 __A) {
-  // COMMON-LABEL: @test_mm256_maskz_cvtps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.256
   return _mm256_maskz_cvtps_ph(__U, __A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm_mask_cvt_roundps_ph(
+// CONSTRAINED-SAME: <2 x i64> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <4 x float> noundef [[__A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    store <2 x i64> [[__W]], ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <4 x float> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128(<4 x float> [[TMP0]], i32 3, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm_mask_cvt_roundps_ph(__m128i __W, __mmask8 __U, __m128 __A) {
-  // COMMON-LABEL: @test_mm_mask_cvt_roundps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.128
   return _mm_mask_cvt_roundps_ph(__W, __U, __A, _MM_FROUND_TO_ZERO);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm_maskz_cvt_roundps_ph(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <4 x float> noundef [[__A:%.*]]) #[[ATTR0]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <4 x float>, align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <4 x float> [[__A]], ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[__A_ADDR]], align 16
+// CONSTRAINED-NEXT:    store <2 x i64> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128(<4 x float> [[TMP0]], i32 3, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm_maskz_cvt_roundps_ph(__mmask8 __U, __m128 __A) {
-  // COMMON-LABEL: @test_mm_maskz_cvt_roundps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.128
   return _mm_maskz_cvt_roundps_ph(__U, __A, _MM_FROUND_TO_ZERO);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm256_mask_cvt_roundps_ph(
+// CONSTRAINED-SAME: <2 x i64> noundef [[__W:%.*]], i8 noundef zeroext [[__U:%.*]], <8 x float> noundef [[__A:%.*]]) #[[ATTR1]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[__W_ADDR:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    store <2 x i64> [[__W]], ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <8 x float> [[__A]], ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[__W_ADDR]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256(<8 x float> [[TMP0]], i32 3, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm256_mask_cvt_roundps_ph(__m128i __W, __mmask8 __U, __m256 __A) {
-  // COMMON-LABEL: @test_mm256_mask_cvt_roundps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.256
   return _mm256_mask_cvt_roundps_ph(__W, __U, __A, _MM_FROUND_TO_ZERO);
 }
 
+// CONSTRAINED-LABEL: define dso_local <2 x i64> @test_mm256_maskz_cvt_roundps_ph(
+// CONSTRAINED-SAME: i8 noundef zeroext [[__U:%.*]], <8 x float> noundef [[__A:%.*]]) #[[ATTR1]] {
+// CONSTRAINED-NEXT:  [[ENTRY:.*:]]
+// CONSTRAINED-NEXT:    [[DOTCOMPOUNDLITERAL_I:%.*]] = alloca <2 x i64>, align 16
+// CONSTRAINED-NEXT:    [[__U_ADDR:%.*]] = alloca i8, align 1
+// CONSTRAINED-NEXT:    [[__A_ADDR:%.*]] = alloca <8 x float>, align 32
+// CONSTRAINED-NEXT:    store i8 [[__U]], ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    store <8 x float> [[__A]], ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[__A_ADDR]], align 32
+// CONSTRAINED-NEXT:    store <2 x i64> zeroinitializer, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <2 x i64>, ptr [[DOTCOMPOUNDLITERAL_I]], align 16
+// CONSTRAINED-NEXT:    [[TMP2:%.*]] = bitcast <2 x i64> [[TMP1]] to <8 x i16>
+// CONSTRAINED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[__U_ADDR]], align 1
+// CONSTRAINED-NEXT:    [[TMP4:%.*]] = call <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256(<8 x float> [[TMP0]], i32 3, <8 x i16> [[TMP2]], i8 [[TMP3]]) #[[ATTR5]]
+// CONSTRAINED-NEXT:    [[TMP5:%.*]] = bitcast <8 x i16> [[TMP4]] to <2 x i64>
+// CONSTRAINED-NEXT:    ret <2 x i64> [[TMP5]]
+//
 __m128i test_mm256_maskz_cvt_roundps_ph(__mmask8 __U, __m256 __A) {
-  // COMMON-LABEL: @test_mm256_maskz_cvt_roundps_ph
-  // COMMONIR: @llvm.x86.avx512.mask.vcvtps2ph.256
   return _mm256_maskz_cvt_roundps_ph(__U, __A, _MM_FROUND_TO_ZERO);
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// COMMON: {{.*}}
+// COMMONIR: {{.*}}
+// UNCONSTRAINED: {{.*}}
diff --git a/clang/test/CodeGen/X86/avx512vlfp16-builtins-constrained.c b/clang/test/CodeGen/X86/avx512vlfp16-builtins-constrained.c
index 0fdc899cb8640..d8f29f3d85f50 100644
--- a/clang/test/CodeGen/X86/avx512vlfp16-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/avx512vlfp16-builtins-constrained.c
@@ -21,7 +21,7 @@
 __m128h test_mm_sqrt_ph(__m128h x) {
   // COMMON-LABEL: test_mm_sqrt_ph
   // UNCONSTRAINED: call {{.*}}<8 x half> @llvm.sqrt.v8f16(<8 x half> {{.*}})
-  // CONSTRAINED: call {{.*}}<8 x half> @llvm.experimental.constrained.sqrt.v8f16(<8 x half> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x half> @llvm.sqrt.v8f16(<8 x half> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtph %xmm{{.*}},
   return _mm_sqrt_ph(x);
 }
@@ -30,7 +30,7 @@ __m128h test_mm_sqrt_ph(__m128h x) {
 __m256h test_mm256_sqrt_ph(__m256h x) {
   // COMMON-LABEL: test_mm256_sqrt_ph
   // UNCONSTRAINED: call {{.*}}<16 x half> @llvm.sqrt.v16f16(<16 x half> {{.*}})
-  // CONSTRAINED: call {{.*}}<16 x half> @llvm.experimental.constrained.sqrt.v16f16(<16 x half> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<16 x half> @llvm.sqrt.v16f16(<16 x half> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vsqrtph %ymm{{.*}},
   return _mm256_sqrt_ph(x);
 }
diff --git a/clang/test/CodeGen/X86/f16c-builtins-constrained.c b/clang/test/CodeGen/X86/f16c-builtins-constrained.c
index 50afea8e5fc1d..0b3f72a0d105d 100644
--- a/clang/test/CodeGen/X86/f16c-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/f16c-builtins-constrained.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -x c -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +f16c -emit-llvm -ffp-exception-behavior=strict -o - -Wall -Werror | FileCheck %s
 // RUN: %clang_cc1 -x c -ffreestanding %s -triple=i386-apple-darwin -target-feature +f16c -emit-llvm -ffp-exception-behavior=strict -o - -Wall -Werror | FileCheck %s
 // RUN: %clang_cc1 -x c++ -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +f16c -emit-llvm -ffp-exception-behavior=strict -o - -Wall -Werror | FileCheck %s
@@ -6,48 +7,34 @@
 
 #include <immintrin.h>
 
+//
 float test_cvtsh_ss(unsigned short a) {
-  // CHECK-LABEL: test_cvtsh_ss
-  // CHECK: [[CONV:%.*]] = call {{.*}}float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: ret float [[CONV]]
   return _cvtsh_ss(a);
 }
 
+//
 unsigned short test_cvtss_sh(float a) {
-  // CHECK-LABEL: test_cvtss_sh
-  // CHECK: insertelement <4 x float> poison, float %{{.*}}, i32 0
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 0, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: insertelement <4 x float> %{{.*}}, float %{{.*}}, i32 1
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 0, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: insertelement <4 x float> %{{.*}}, float %{{.*}}, i32 2
-  // CHECK: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 0, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: insertelement <4 x float> %{{.*}}, float %{{.*}}, i32 3
-  // CHECK: call <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %{{.*}}, i32 0)
-  // CHECK: extractelement <8 x i16> %{{.*}}, i32 0
   return _cvtss_sh(a, 0);
 }
 
+//
 __m128 test_mm_cvtph_ps(__m128i a) {
-  // CHECK-LABEL: test_mm_cvtph_ps
-  // CHECK: shufflevector <8 x i16> %{{.*}}, <8 x i16> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-  // CHECK: call {{.*}}<4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %{{.*}}, metadata !"fpexcept.strict")
   return _mm_cvtph_ps(a);
 }
 
+//
 __m256 test_mm256_cvtph_ps(__m128i a) {
-  // CHECK-LABEL: test_mm256_cvtph_ps
-  // CHECK: call {{.*}}<8 x float> @llvm.experimental.constrained.fpext.v8f32.v8f16(<8 x half> %{{.*}}, metadata !"fpexcept.strict")
   return _mm256_cvtph_ps(a);
 }
 
+//
 __m128i test_mm_cvtps_ph(__m128 a) {
-  // CHECK-LABEL: test_mm_cvtps_ph
-  // CHECK: call <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %{{.*}}, i32 0)
   return _mm_cvtps_ph(a, 0);
 }
 
+//
 __m128i test_mm256_cvtps_ph(__m256 a) {
-  // CHECK-LABEL: test_mm256_cvtps_ph
-  // CHECK: call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %{{.*}}, i32 0)
   return _mm256_cvtps_ph(a, 0);
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/X86/fma-builtins-constrained.c b/clang/test/CodeGen/X86/fma-builtins-constrained.c
index 019dde2e02514..e9fb678f70de2 100644
--- a/clang/test/CodeGen/X86/fma-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/fma-builtins-constrained.c
@@ -20,7 +20,7 @@
 __m128 test_mm_fmadd_ps(__m128 a, __m128 b, __m128 c) {
   // COMMON-LABEL: test_mm_fmadd_ps
   // UNCONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213ps
   return _mm_fmadd_ps(a, b, c);
 }
@@ -28,7 +28,7 @@ __m128 test_mm_fmadd_ps(__m128 a, __m128 b, __m128 c) {
 __m128d test_mm_fmadd_pd(__m128d a, __m128d b, __m128d c) {
   // COMMON-LABEL: test_mm_fmadd_pd
   // UNCONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213pd
   return _mm_fmadd_pd(a, b, c);
 }
@@ -39,7 +39,7 @@ __m128 test_mm_fmadd_ss(__m128 a, __m128 b, __m128 c) {
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // UNCONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}})
-  // CONSTRAINED: call float @llvm.experimental.constrained.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213ss
   // COMMONIR: insertelement <4 x float> %{{.*}}, float %{{.*}}, i64 0
   return _mm_fmadd_ss(a, b, c);
@@ -51,7 +51,7 @@ __m128d test_mm_fmadd_sd(__m128d a, __m128d b, __m128d c) {
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // UNCONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}})
-  // CONSTRAINED: call double @llvm.experimental.constrained.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213sd
   // COMMONIR: insertelement <2 x double> %{{.*}}, double %{{.*}}, i64 0
   return _mm_fmadd_sd(a, b, c);
@@ -61,7 +61,7 @@ __m128 test_mm_fmsub_ps(__m128 a, __m128 b, __m128 c) {
   // COMMON-LABEL: test_mm_fmsub_ps
   // COMMONIR: [[NEG:%.+]] = fneg <4 x float> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213ps
   return _mm_fmsub_ps(a, b, c);
 }
@@ -70,7 +70,7 @@ __m128d test_mm_fmsub_pd(__m128d a, __m128d b, __m128d c) {
   // COMMON-LABEL: test_mm_fmsub_pd
   // COMMONIR: [[NEG:%.+]] = fneg <2 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213pd
   return _mm_fmsub_pd(a, b, c);
 }
@@ -82,7 +82,7 @@ __m128 test_mm_fmsub_ss(__m128 a, __m128 b, __m128 c) {
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // COMMONIR: [[NEG:%.+]] = fneg float %{{.+}}
   // UNCONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}})
-  // CONSTRAINED: call float @llvm.experimental.constrained.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213ss
   // COMMONIR: insertelement <4 x float> %{{.*}}, float %{{.*}}, i64 0
   return _mm_fmsub_ss(a, b, c);
@@ -95,7 +95,7 @@ __m128d test_mm_fmsub_sd(__m128d a, __m128d b, __m128d c) {
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // COMMONIR: [[NEG:%.+]] = fneg double %{{.+}}
   // UNCONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}})
-  // CONSTRAINED: call double @llvm.experimental.constrained.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213sd
   // COMMONIR: insertelement <2 x double> %{{.*}}, double %{{.*}}, i64 0
   return _mm_fmsub_sd(a, b, c);
@@ -105,7 +105,7 @@ __m128 test_mm_fnmadd_ps(__m128 a, __m128 b, __m128 c) {
   // COMMON-LABEL: test_mm_fnmadd_ps
   // COMMONIR: [[NEG:%.+]] = fneg <4 x float> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213ps
   return _mm_fnmadd_ps(a, b, c);
 }
@@ -114,7 +114,7 @@ __m128d test_mm_fnmadd_pd(__m128d a, __m128d b, __m128d c) {
   // COMMON-LABEL: test_mm_fnmadd_pd
   // COMMONIR: [[NEG:%.+]] = fneg <2 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213pd
   return _mm_fnmadd_pd(a, b, c);
 }
@@ -126,7 +126,7 @@ __m128 test_mm_fnmadd_ss(__m128 a, __m128 b, __m128 c) {
   // COMMONIR: [[NEG:%.+]] = fneg float %{{.+}}
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // UNCONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}})
-  // CONSTRAINED: call float @llvm.experimental.constrained.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213ss
   // COMMONIR: insertelement <4 x float> %{{.*}}, float %{{.*}}, i64 0
   return _mm_fnmadd_ss(a, b, c);
@@ -139,7 +139,7 @@ __m128d test_mm_fnmadd_sd(__m128d a, __m128d b, __m128d c) {
   // COMMONIR: [[NEG:%.+]] = fneg double %{{.+}}
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // UNCONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}})
-  // CONSTRAINED: call double @llvm.experimental.constrained.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213sd
   // COMMONIR: insertelement <2 x double> %{{.*}}, double %{{.*}}, i64 0
   return _mm_fnmadd_sd(a, b, c);
@@ -150,7 +150,7 @@ __m128 test_mm_fnmsub_ps(__m128 a, __m128 b, __m128 c) {
   // COMMONIR: [[NEG:%.+]] = fneg <4 x float> %{{.+}}
   // COMMONIR: [[NEG2:%.+]] = fneg <4 x float> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x float> @llvm.fma.v4f32(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213ps
   return _mm_fnmsub_ps(a, b, c);
 }
@@ -160,7 +160,7 @@ __m128d test_mm_fnmsub_pd(__m128d a, __m128d b, __m128d c) {
   // COMMONIR: [[NEG:%.+]] = fneg <2 x double> %{{.+}}
   // COMMONIR: [[NEG2:%.+]] = fneg <2 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213pd
   return _mm_fnmsub_pd(a, b, c);
 }
@@ -173,7 +173,7 @@ __m128 test_mm_fnmsub_ss(__m128 a, __m128 b, __m128 c) {
   // COMMONIR: extractelement <4 x float> %{{.*}}, i64 0
   // COMMONIR: [[NEG2:%.+]] = fneg float %{{.+}}
   // UNCONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}})
-  // CONSTRAINED: call float @llvm.experimental.constrained.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call float @llvm.fma.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213ss
   // COMMONIR: insertelement <4 x float> %{{.*}}, float %{{.*}}, i64 0
   return _mm_fnmsub_ss(a, b, c);
@@ -187,7 +187,7 @@ __m128d test_mm_fnmsub_sd(__m128d a, __m128d b, __m128d c) {
   // COMMONIR: extractelement <2 x double> %{{.*}}, i64 0
   // COMMONIR: [[NEG2:%.+]] = fneg double %{{.+}}
   // UNCONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}})
-  // CONSTRAINED: call double @llvm.experimental.constrained.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call double @llvm.fma.f64(double %{{.*}}, double %{{.*}}, double %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213sd
   // COMMONIR: insertelement <2 x double> %{{.*}}, double %{{.*}}, i64 0
   return _mm_fnmsub_sd(a, b, c);
@@ -228,7 +228,7 @@ __m128d test_mm_fmsubadd_pd(__m128d a, __m128d b, __m128d c) {
 __m256 test_mm256_fmadd_ps(__m256 a, __m256 b, __m256 c) {
   // COMMON-LABEL: test_mm256_fmadd_ps
   // UNCONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<8 x float> @llvm.experimental.constrained.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213ps
   return _mm256_fmadd_ps(a, b, c);
 }
@@ -236,7 +236,7 @@ __m256 test_mm256_fmadd_ps(__m256 a, __m256 b, __m256 c) {
 __m256d test_mm256_fmadd_pd(__m256d a, __m256d b, __m256d c) {
   // COMMON-LABEL: test_mm256_fmadd_pd
   // UNCONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x double> @llvm.experimental.constrained.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmadd213pd
   return _mm256_fmadd_pd(a, b, c);
 }
@@ -245,7 +245,7 @@ __m256 test_mm256_fmsub_ps(__m256 a, __m256 b, __m256 c) {
   // COMMON-LABEL: test_mm256_fmsub_ps
   // COMMONIR: [[NEG:%.+]] = fneg <8 x float> %{{.*}}
   // UNCONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<8 x float> @llvm.experimental.constrained.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213ps
   return _mm256_fmsub_ps(a, b, c);
 }
@@ -254,7 +254,7 @@ __m256d test_mm256_fmsub_pd(__m256d a, __m256d b, __m256d c) {
   // COMMON-LABEL: test_mm256_fmsub_pd
   // COMMONIR: [[NEG:%.+]] = fneg <4 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x double> @llvm.experimental.constrained.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfmsub213pd
   return _mm256_fmsub_pd(a, b, c);
 }
@@ -263,7 +263,7 @@ __m256 test_mm256_fnmadd_ps(__m256 a, __m256 b, __m256 c) {
   // COMMON-LABEL: test_mm256_fnmadd_ps
   // COMMONIR: [[NEG:%.+]] = fneg <8 x float> %{{.*}}
   // UNCONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<8 x float> @llvm.experimental.constrained.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213ps
   return _mm256_fnmadd_ps(a, b, c);
 }
@@ -272,7 +272,7 @@ __m256d test_mm256_fnmadd_pd(__m256d a, __m256d b, __m256d c) {
   // COMMON-LABEL: test_mm256_fnmadd_pd
   // COMMONIR: [[NEG:%.+]] = fneg <4 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x double> @llvm.experimental.constrained.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmadd213pd
   return _mm256_fnmadd_pd(a, b, c);
 }
@@ -282,7 +282,7 @@ __m256 test_mm256_fnmsub_ps(__m256 a, __m256 b, __m256 c) {
   // COMMONIR: [[NEG:%.+]] = fneg <8 x float> %{{.*}}
   // COMMONIR: [[NEG2:%.+]] = fneg <8 x float> %{{.*}}
   // UNCONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}})
-  // CONSTRAINED: call {{.*}}<8 x float> @llvm.experimental.constrained.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<8 x float> @llvm.fma.v8f32(<8 x float> %{{.*}}, <8 x float> %{{.*}}, <8 x float> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213ps
   return _mm256_fnmsub_ps(a, b, c);
 }
@@ -292,7 +292,7 @@ __m256d test_mm256_fnmsub_pd(__m256d a, __m256d b, __m256d c) {
   // COMMONIR: [[NEG:%.+]] = fneg <4 x double> %{{.+}}
   // COMMONIR: [[NEG2:%.+]] = fneg <4 x double> %{{.+}}
   // UNCONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}})
-  // CONSTRAINED: call {{.*}}<4 x double> @llvm.experimental.constrained.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x double> @llvm.fma.v4f64(<4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: vfnmsub213pd
   return _mm256_fnmsub_pd(a, b, c);
 }
diff --git a/clang/test/CodeGen/X86/sse-builtins-constrained.c b/clang/test/CodeGen/X86/sse-builtins-constrained.c
index f3b8d20944bd4..906db511b13cf 100644
--- a/clang/test/CodeGen/X86/sse-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/sse-builtins-constrained.c
@@ -21,7 +21,7 @@
 __m128 test_mm_sqrt_ps(__m128 x) {
   // COMMON-LABEL: test_mm_sqrt_ps
   // UNCONSTRAINED: call {{.*}}<4 x float> @llvm.sqrt.v4f32(<4 x float> {{.*}})
-  // CONSTRAINED: call {{.*}}<4 x float> @llvm.experimental.constrained.sqrt.v4f32(<4 x float> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<4 x float> @llvm.sqrt.v4f32(<4 x float> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: sqrtps
   return _mm_sqrt_ps(x);
 }
@@ -30,7 +30,7 @@ __m128 test_sqrt_ss(__m128 x) {
   // COMMON-LABEL: test_sqrt_ss
   // COMMONIR: extractelement <4 x float> {{.*}}, i32 0
   // UNCONSTRAINED: call float @llvm.sqrt.f32(float {{.*}})
-  // CONSTRAINED: call float @llvm.experimental.constrained.sqrt.f32(float {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call float @llvm.sqrt.f32(float {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: sqrtss
   // COMMONIR: insertelement <4 x float> {{.*}}, float {{.*}}, i32 0
   return _mm_sqrt_ss(x);
diff --git a/clang/test/CodeGen/X86/sse2-builtins-constrained.c b/clang/test/CodeGen/X86/sse2-builtins-constrained.c
index a4a0829720501..b867dfe247298 100644
--- a/clang/test/CodeGen/X86/sse2-builtins-constrained.c
+++ b/clang/test/CodeGen/X86/sse2-builtins-constrained.c
@@ -21,7 +21,7 @@
 __m128d test_mm_sqrt_pd(__m128d x) {
   // COMMON-LABEL: test_mm_sqrt_pd
   // UNCONSTRAINED: call {{.*}}<2 x double> @llvm.sqrt.v2f64(<2 x double> {{.*}})
-  // CONSTRAINED: call {{.*}}<2 x double> @llvm.experimental.constrained.sqrt.v2f64(<2 x double> {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call {{.*}}<2 x double> @llvm.sqrt.v2f64(<2 x double> {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: sqrtpd
   return _mm_sqrt_pd(x);
 }
@@ -30,7 +30,7 @@ __m128d test_sqrt_sd(__m128d x, __m128d y) {
   // COMMON-LABEL: test_sqrt_sd
   // COMMONIR: extractelement <2 x double> {{.*}}, i32 0
   // UNCONSTRAINED: call double @llvm.sqrt.f64(double {{.*}})
-  // CONSTRAINED: call double @llvm.experimental.constrained.sqrt.f64(double {{.*}}, metadata !{{.*}})
+  // CONSTRAINED: call double @llvm.sqrt.f64(double {{.*}}) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
   // CHECK-ASM: sqrtsd
   // COMMONIR: insertelement <2 x double> {{.*}}, double {{.*}}, i32 0
   return _mm_sqrt_sd(x, y);
diff --git a/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c b/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c
index 836b41b9c4e55..4529616726a54 100644
--- a/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c
+++ b/clang/test/CodeGen/arm-neon-directed-rounding-constrained.c
@@ -38,7 +38,7 @@
 
 // COMMON-LABEL: test_vrndi_f32
 // UNCONSTRAINED: [[VRNDI1_I:%.*]] = call <2 x float> @llvm.nearbyint.v2f32(<2 x float> [[VRNDI_I:%.*]])
-// CONSTRAINED:   [[VRNDI1_I:%.*]] = call <2 x float> @llvm.experimental.constrained.nearbyint.v2f32(<2 x float> [[VRNDI_I:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CONSTRAINED:   [[VRNDI1_I:%.*]] = call <2 x float> @llvm.nearbyint.v2f32(<2 x float> [[VRNDI_I:%.*]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 // CHECK-ASM32:   vrintr.f32 s{{[0-9]+}}, s{{[0-9]+}}
 // CHECK-ASM32:   vrintr.f32 s{{[0-9]+}}, s{{[0-9]+}}
 // CHECK-ASM64:   frinti v{{[0-9]+}}.2s, v{{[0-9]+}}.2s
@@ -49,7 +49,7 @@ float32x2_t test_vrndi_f32(float32x2_t a) {
 
 // COMMON-LABEL: test_vrndiq_f32
 // UNCONSTRAINED: [[VRNDI1_I:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[VRNDI_I:%.*]])
-// CONSTRAINED:   [[VRNDI1_I:%.*]] = call <4 x float> @llvm.experimental.constrained.nearbyint.v4f32(<4 x float> [[VRNDI_I:%.*]], metadata !"round.tonearest", metadata !"fpexcept.strict")
+// CONSTRAINED:   [[VRNDI1_I:%.*]] = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> [[VRNDI_I:%.*]]) #{{[0-9]+}} [ "fp.control"(metadata !"rte") ]
 // CHECK-ASM32:   vrintr.f32 s{{[0-9]+}}, s{{[0-9]+}}
 // CHECK-ASM32:   vrintr.f32 s{{[0-9]+}}, s{{[0-9]+}}
 // CHECK-ASM32:   vrintr.f32 s{{[0-9]+}}, s{{[0-9]+}}
diff --git a/clang/test/CodeGen/arm64-vrnd-constrained.c b/clang/test/CodeGen/arm64-vrnd-constrained.c
index 8e61f1ea6a3d0..5d245615587dc 100644
--- a/clang/test/CodeGen/arm64-vrnd-constrained.c
+++ b/clang/test/CodeGen/arm64-vrnd-constrained.c
@@ -39,7 +39,7 @@
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP0]], ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[VRNDZ_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDZ1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> [[VRNDZ_I]], metadata !"fpexcept.strict") #[[ATTR2:[0-9]+]]
+// CONSTRAINED-NEXT:    [[VRNDZ1_I:%.*]] = call <2 x double> @llvm.trunc.v2f64(<2 x double> [[VRNDZ_I]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    store <2 x double> [[VRNDZ1_I]], ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP2]], ptr [[__RET_I]], align 16
@@ -79,7 +79,7 @@ float64x2_t rnd5(float64x2_t a) { return vrndq_f64(a); }
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP0]], ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[VRNDM_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDM1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> [[VRNDM_I]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VRNDM1_I:%.*]] = call <2 x double> @llvm.floor.v2f64(<2 x double> [[VRNDM_I]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    store <2 x double> [[VRNDM1_I]], ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP2]], ptr [[__RET_I]], align 16
@@ -119,7 +119,7 @@ float64x2_t rnd13(float64x2_t a) { return vrndmq_f64(a); }
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP0]], ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[VRNDP_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> [[VRNDP_I]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VRNDP1_I:%.*]] = call <2 x double> @llvm.ceil.v2f64(<2 x double> [[VRNDP_I]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    store <2 x double> [[VRNDP1_I]], ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP2]], ptr [[__RET_I]], align 16
@@ -159,7 +159,7 @@ float64x2_t rnd18(float64x2_t a) { return vrndpq_f64(a); }
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP0]], ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[VRNDA_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double> [[VRNDA_I]], metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VRNDA1_I:%.*]] = call <2 x double> @llvm.round.v2f64(<2 x double> [[VRNDA_I]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    store <2 x double> [[VRNDA1_I]], ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP2]], ptr [[__RET_I]], align 16
@@ -199,7 +199,7 @@ float64x2_t rnd22(float64x2_t a) { return vrndaq_f64(a); }
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP0]], ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP1:%.*]] = load <16 x i8>, ptr [[__P0_ADDR_I]], align 16
 // CONSTRAINED-NEXT:    [[VRNDX_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <2 x double>
-// CONSTRAINED-NEXT:    [[VRNDX1_I:%.*]] = call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double> [[VRNDX_I]], metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR2]]
+// CONSTRAINED-NEXT:    [[VRNDX1_I:%.*]] = call <2 x double> @llvm.rint.v2f64(<2 x double> [[VRNDX_I]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
 // CONSTRAINED-NEXT:    store <2 x double> [[VRNDX1_I]], ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    [[TMP2:%.*]] = load <2 x double>, ptr [[REF_TMP_I]], align 16
 // CONSTRAINED-NEXT:    store <2 x double> [[TMP2]], ptr [[__RET_I]], align 16
diff --git a/clang/test/CodeGen/ffp-contract-option.c b/clang/test/CodeGen/ffp-contract-option.c
index 2a6443032a4e6..73d252f4c7dee 100644
--- a/clang/test/CodeGen/ffp-contract-option.c
+++ b/clang/test/CodeGen/ffp-contract-option.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // REQUIRES: x86-registered-target
 // UNSUPPORTED: target={{.*}}-zos{{.*}}
 // RUN: %clang_cc1 -triple=x86_64 %s -emit-llvm -o - \
@@ -57,65 +58,161 @@
 // RUN: %clang -S -emit-llvm -ffast-math -fno-fast-math \
 // RUN: %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-FPC-ON
 
+// CHECK-DEFAULT-LABEL: define dso_local float @mymuladd(
+// CHECK-DEFAULT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEFAULT-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEFAULT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEFAULT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-DEFAULT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEFAULT-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-DEFAULT-NEXT:    ret float [[ADD]]
+//
+// CHECK-ON-LABEL: define dso_local float @mymuladd(
+// CHECK-ON-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-ON-NEXT:  [[ENTRY:.*:]]
+// CHECK-ON-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-ON-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-ON-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-ON-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-ON-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-ON-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-ON-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-ON-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-ON-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-ON-NEXT:    [[TMP3:%.*]] = call float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-ON-NEXT:    ret float [[TMP3]]
+//
+// CHECK-CONTRACTOFF-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-CONTRACTOFF-SAME: float noundef nofpclass(nan inf) [[X:%.*]], float noundef nofpclass(nan inf) [[Y:%.*]], float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-CONTRACTOFF-NEXT:  [[ENTRY:.*:]]
+// CHECK-CONTRACTOFF-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONTRACTOFF-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONTRACTOFF-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-CONTRACTOFF-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    [[MUL:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[TMP0]], [[TMP1]]
+// CHECK-CONTRACTOFF-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-CONTRACTOFF-NEXT:    [[ADD:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[MUL]], [[TMP2]]
+// CHECK-CONTRACTOFF-NEXT:    ret float [[ADD]]
+//
+// CHECK-ONFAST-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-ONFAST-SAME: float noundef nofpclass(nan inf) [[X:%.*]], float noundef nofpclass(nan inf) [[Y:%.*]], float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-ONFAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-ONFAST-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-ONFAST-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-ONFAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-ONFAST-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-ONFAST-NEXT:    [[TMP3:%.*]] = call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32(float [[TMP0]], float [[TMP1]], float [[TMP2]])
+// CHECK-ONFAST-NEXT:    ret float [[TMP3]]
+//
+// CHECK-FASTFAST-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-FASTFAST-SAME: float noundef nofpclass(nan inf) [[X:%.*]], float noundef nofpclass(nan inf) [[Y:%.*]], float noundef nofpclass(nan inf) [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FASTFAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-FASTFAST-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-FASTFAST-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-FASTFAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-FASTFAST-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    [[MUL:%.*]] = fmul fast float [[TMP0]], [[TMP1]]
+// CHECK-FASTFAST-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-FASTFAST-NEXT:    [[ADD:%.*]] = fadd fast float [[MUL]], [[TMP2]]
+// CHECK-FASTFAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-NOFAST-LABEL: define dso_local float @mymuladd(
+// CHECK-NOFAST-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NOFAST-NEXT:  [[ENTRY:.*:]]
+// CHECK-NOFAST-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOFAST-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOFAST-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-NOFAST-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    [[MUL:%.*]] = fmul float [[TMP0]], [[TMP1]]
+// CHECK-NOFAST-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-NOFAST-NEXT:    [[ADD:%.*]] = fadd float [[MUL]], [[TMP2]]
+// CHECK-NOFAST-NEXT:    ret float [[ADD]]
+//
+// CHECK-FPC-ON-LABEL: define dso_local float @mymuladd(
+// CHECK-FPC-ON-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FPC-ON-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-FPC-ON-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-FPC-ON-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FPC-ON-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-FPC-ON-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-FPC-ON-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-FPC-ON-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-FPC-ON-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-FPC-ON-NEXT:    [[TMP9:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FPC-ON-NEXT:    [[TMP10:%.*]] = call float @llvm.fmuladd.f32(float [[TMP7]], float [[TMP8]], float [[TMP9]])
+// CHECK-FPC-ON-NEXT:    ret float [[TMP10]]
+//
+// CHECK-FPC-OFF-LABEL: define dso_local float @mymuladd(
+// CHECK-FPC-OFF-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FPC-OFF-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FPC-OFF-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-FPC-OFF-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-FPC-OFF-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP9:%.*]] = fmul float [[TMP7]], [[TMP8]]
+// CHECK-FPC-OFF-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FPC-OFF-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+// CHECK-FPC-OFF-NEXT:    ret float [[TMP11]]
+//
+// CHECK-FPSC-OFF-LABEL: define dso_local float @mymuladd(
+// CHECK-FPSC-OFF-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FPSC-OFF-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FPSC-OFF-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-FPSC-OFF-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-FPSC-OFF-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP9:%.*]] = fmul float [[TMP7]], [[TMP8]]
+// CHECK-FPSC-OFF-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FPSC-OFF-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+// CHECK-FPSC-OFF-NEXT:    ret float [[TMP11]]
+//
 float mymuladd(float x, float y, float z) {
-  // CHECK: define{{.*}} float @mymuladd
   return x * y + z;
   // expected-warning{{overriding '-ffp-contract=fast' option with '-ffp-contract=on'}}
 
-  // CHECK-DEFAULT: load float, ptr
-  // CHECK-DEFAULT: fmul float
-  // CHECK-DEFAULT: load float, ptr
-  // CHECK-DEFAULT: fadd float
-
-  // CHECK-ON: load float, ptr
-  // CHECK-ON: load float, ptr
-  // CHECK-ON: load float, ptr
-  // CHECK-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
-
-  // CHECK-CONTRACTFAST: load float, ptr
-  // CHECK-CONTRACTFAST: load float, ptr
-  // CHECK-CONTRACTFAST: fmul contract float
-  // CHECK-CONTRACTFAST: load float, ptr
-  // CHECK-CONTRACTFAST: fadd contract float
-
-  // CHECK-CONTRACTOFF: load float, ptr
-  // CHECK-CONTRACTOFF: load float, ptr
-  // CHECK-CONTRACTOFF: fmul reassoc nnan ninf nsz arcp afn float
-  // CHECK-CONTRACTOFF: load float, ptr
-  // CHECK-CONTRACTOFF: fadd reassoc nnan ninf nsz arcp afn float {{.*}}, {{.*}}
-
-  // CHECK-ONFAST: load float, ptr
-  // CHECK-ONFAST: load float, ptr
-  // CHECK-ONFAST: load float, ptr
-  // CHECK-ONFAST: call reassoc nnan ninf nsz arcp afn float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
-
-  // CHECK-FASTFAST: load float, ptr
-  // CHECK-FASTFAST: load float, ptr
-  // CHECK-FASTFAST: fmul fast float
-  // CHECK-FASTFAST: load float, ptr
-  // CHECK-FASTFAST: fadd fast float {{.*}}, {{.*}}
-
-  // CHECK-NOFAST: load float, ptr
-  // CHECK-NOFAST: load float, ptr
-  // CHECK-NOFAST: fmul float {{.*}}, {{.*}}
-  // CHECK-NOFAST: load float, ptr
-  // CHECK-NOFAST: fadd float {{.*}}, {{.*}}
-
-  // CHECK-FPC-ON: load float, ptr
-  // CHECK-FPC-ON: load float, ptr
-  // CHECK-FPC-ON: load float, ptr
-  // CHECK-FPC-ON: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
-
-  // CHECK-FPC-OFF: load float, ptr
-  // CHECK-FPC-OFF: load float, ptr
-  // CHECK-FPC-OFF: fmul float
-  // CHECK-FPC-OFF: load float, ptr
-  // CHECK-FPC-OFF: fadd float {{.*}}, {{.*}}
+
+
+
+
+
+
+
+
 
   // CHECK-FFPC-OFF: load float, ptr
   // CHECK-FFPC-OFF: load float, ptr
-  // CHECK-FPSC-OFF: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}})
-  // CHECK-FPSC-OFF: load float, ptr
-  // CHECK-FPSC-OFF: [[RES:%.*]] = call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}})
 
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
+// CHECK-CONTRACTFAST: {{.*}}
diff --git a/clang/test/CodeGen/ffp-model.c b/clang/test/CodeGen/ffp-model.c
index 5516ccb218b03..da027d122428e 100644
--- a/clang/test/CodeGen/ffp-model.c
+++ b/clang/test/CodeGen/ffp-model.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // REQUIRES: x86-registered-target
 // UNSUPPORTED: target={{.*}}-zos{{.*}}
 // RUN: %clang -S -emit-llvm -fenable-matrix -ffp-model=fast %s -o - \
@@ -19,152 +20,550 @@
 // RUN: %clang -S -emit-llvm -fenable-matrix -ffp-model=precise -ffast-math \
 // RUN: %s -o - | FileCheck %s --check-prefixes CHECK,CHECK-FAST1
 
+// CHECK-FAST-LABEL: define dso_local float @mymuladd(
+// CHECK-FAST-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FAST-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-FAST-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-FAST-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    [[TMP9:%.*]] = fmul reassoc nsz arcp contract afn float [[TMP7]], [[TMP8]]
+// CHECK-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    [[TMP11:%.*]] = fadd reassoc nsz arcp contract afn float [[TMP9]], [[TMP10]]
+// CHECK-FAST-NEXT:    ret float [[TMP11]]
+//
+// CHECK-AGGRESSIVE-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-AGGRESSIVE-SAME: float noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], float noundef nofpclass(nan inf) [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-AGGRESSIVE-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP9:%.*]] = fmul fast float [[TMP7]], [[TMP8]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP11:%.*]] = fadd fast float [[TMP9]], [[TMP10]]
+// CHECK-AGGRESSIVE-NEXT:    ret float [[TMP11]]
+//
+// CHECK-PRECISE-LABEL: define dso_local float @mymuladd(
+// CHECK-PRECISE-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-PRECISE-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-PRECISE-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP9:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP10:%.*]] = call float @llvm.fmuladd.f32(float [[TMP7]], float [[TMP8]], float [[TMP9]])
+// CHECK-PRECISE-NEXT:    ret float [[TMP10]]
+//
+// CHECK-STRICT-LABEL: define dso_local float @mymuladd(
+// CHECK-STRICT-SAME: float noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], float noundef [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-STRICT-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-STRICT-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-STRICT-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    [[TMP9:%.*]] = fmul float [[TMP7]], [[TMP8]]
+// CHECK-STRICT-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    [[TMP11:%.*]] = fadd float [[TMP9]], [[TMP10]]
+// CHECK-STRICT-NEXT:    ret float [[TMP11]]
+//
+// CHECK-STRICT-FAST-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-STRICT-FAST-SAME: float noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], float noundef nofpclass(nan inf) [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-STRICT-FAST-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP9:%.*]] = fmul fast float [[TMP7]], [[TMP8]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP11:%.*]] = fadd fast float [[TMP9]], [[TMP10]]
+// CHECK-STRICT-FAST-NEXT:    ret float [[TMP11]]
+//
+// CHECK-FAST1-LABEL: define dso_local nofpclass(nan inf) float @mymuladd(
+// CHECK-FAST1-SAME: float noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], float noundef nofpclass(nan inf) [[TMP2:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-FAST1-NEXT:    [[TMP4:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    [[TMP5:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    store float [[TMP0]], ptr [[TMP4]], align 4
+// CHECK-FAST1-NEXT:    store float [[TMP1]], ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    store float [[TMP2]], ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    [[TMP7:%.*]] = load float, ptr [[TMP4]], align 4
+// CHECK-FAST1-NEXT:    [[TMP8:%.*]] = load float, ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    [[TMP9:%.*]] = fmul fast float [[TMP7]], [[TMP8]]
+// CHECK-FAST1-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    [[TMP11:%.*]] = fadd fast float [[TMP9]], [[TMP10]]
+// CHECK-FAST1-NEXT:    ret float [[TMP11]]
+//
 float mymuladd(float x, float y, float z) {
-  // CHECK: define{{.*}} float @mymuladd
   return x * y + z;
 
-  // CHECK-AGGRESSIVE: fmul fast float
-  // CHECK-AGGRESSIVE: load float, ptr
-  // CHECK-AGGRESSIVE: fadd fast float
-
-  // CHECK-FAST: fmul reassoc nsz arcp contract afn float
-  // CHECK-FAST: load float, ptr
-  // CHECK-FAST: fadd reassoc nsz arcp contract afn float
-
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: call float @llvm.fmuladd.f32(float {{.*}}, float {{.*}}, float {{.*}})
-
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: call float @llvm.experimental.constrained.fmul.f32(float {{.*}}, float {{.*}}, {{.*}})
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, {{.*}})
-
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: fmul fast float {{.*}}, {{.*}}
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: fadd fast float {{.*}}, {{.*}}
-
-  // CHECK-FAST1: load float, ptr
-  // CHECK-FAST1: load float, ptr
-  // CHECK-FAST1: fmul fast float {{.*}}, {{.*}}
-  // CHECK-FAST1: load float, ptr {{.*}}
-  // CHECK-FAST1: fadd fast float {{.*}}, {{.*}}
+
+
+
+
+
 }
 
 typedef float __attribute__((ext_vector_type(2))) v2f;
 
+// CHECK-FAST-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-FAST-SAME: double noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], double noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-FAST-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-FAST-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-FAST-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-FAST-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-FAST-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-FAST-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-FAST-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-FAST-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-FAST-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-FAST-NEXT:    [[TMP17:%.*]] = fmul reassoc nsz arcp contract afn <2 x float> [[TMP13]], [[TMP16]]
+// CHECK-FAST-NEXT:    [[TMP18:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-FAST-NEXT:    [[TMP19:%.*]] = fadd reassoc nsz arcp contract afn <2 x float> [[TMP17]], [[TMP18]]
+// CHECK-FAST-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP19]], ptr [[TMP20]], align 8
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-AGGRESSIVE-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-AGGRESSIVE-SAME: double noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], double noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-AGGRESSIVE-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-AGGRESSIVE-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-AGGRESSIVE-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-AGGRESSIVE-NEXT:    [[TMP17:%.*]] = fmul fast <2 x float> [[TMP13]], [[TMP16]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP18:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP19:%.*]] = fadd fast <2 x float> [[TMP17]], [[TMP18]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP19]], ptr [[TMP20]], align 8
+// CHECK-AGGRESSIVE-NEXT:    ret void
+//
+// CHECK-PRECISE-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-PRECISE-SAME: double noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], double noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-PRECISE-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-PRECISE-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-PRECISE-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-PRECISE-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-PRECISE-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-PRECISE-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-PRECISE-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-PRECISE-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-PRECISE-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-PRECISE-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-PRECISE-NEXT:    [[TMP17:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP18:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP13]], <2 x float> [[TMP16]], <2 x float> [[TMP17]])
+// CHECK-PRECISE-NEXT:    [[TMP19:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP18]], ptr [[TMP19]], align 8
+// CHECK-PRECISE-NEXT:    ret void
+//
+// CHECK-STRICT-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-STRICT-SAME: double noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], double noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-STRICT-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-STRICT-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-STRICT-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-STRICT-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-STRICT-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-STRICT-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-STRICT-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-STRICT-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-STRICT-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-STRICT-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP17:%.*]] = fmul <2 x float> [[TMP13]], [[TMP16]]
+// CHECK-STRICT-NEXT:    [[TMP18:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-STRICT-NEXT:    [[TMP19:%.*]] = fadd <2 x float> [[TMP17]], [[TMP18]]
+// CHECK-STRICT-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP19]], ptr [[TMP20]], align 8
+// CHECK-STRICT-NEXT:    ret void
+//
+// CHECK-STRICT-FAST-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-STRICT-FAST-SAME: double noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], double noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-STRICT-FAST-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-FAST-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-STRICT-FAST-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-STRICT-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-STRICT-FAST-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-STRICT-FAST-NEXT:    [[TMP17:%.*]] = fmul fast <2 x float> [[TMP13]], [[TMP16]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP18:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP19:%.*]] = fadd fast <2 x float> [[TMP17]], [[TMP18]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP19]], ptr [[TMP20]], align 8
+// CHECK-STRICT-FAST-NEXT:    ret void
+//
+// CHECK-FAST1-LABEL: define dso_local void @my_vec_muladd(
+// CHECK-FAST1-SAME: double noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], double noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR0]] {
+// CHECK-FAST1-NEXT:    [[TMP5:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST1-NEXT:    [[TMP6:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST1-NEXT:    [[TMP7:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST1-NEXT:    [[TMP8:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    [[TMP9:%.*]] = alloca <2 x float>, align 8
+// CHECK-FAST1-NEXT:    [[TMP10:%.*]] = alloca ptr, align 8
+// CHECK-FAST1-NEXT:    store double [[TMP0]], ptr [[TMP5]], align 8
+// CHECK-FAST1-NEXT:    [[TMP11:%.*]] = load <2 x float>, ptr [[TMP5]], align 8
+// CHECK-FAST1-NEXT:    store double [[TMP2]], ptr [[TMP6]], align 8
+// CHECK-FAST1-NEXT:    [[TMP12:%.*]] = load <2 x float>, ptr [[TMP6]], align 8
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP11]], ptr [[TMP7]], align 8
+// CHECK-FAST1-NEXT:    store float [[TMP1]], ptr [[TMP8]], align 4
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP12]], ptr [[TMP9]], align 8
+// CHECK-FAST1-NEXT:    store ptr [[TMP3]], ptr [[TMP10]], align 8
+// CHECK-FAST1-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 8
+// CHECK-FAST1-NEXT:    [[TMP14:%.*]] = load float, ptr [[TMP8]], align 4
+// CHECK-FAST1-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> poison, float [[TMP14]], i64 0
+// CHECK-FAST1-NEXT:    [[TMP16:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-FAST1-NEXT:    [[TMP17:%.*]] = fmul fast <2 x float> [[TMP13]], [[TMP16]]
+// CHECK-FAST1-NEXT:    [[TMP18:%.*]] = load <2 x float>, ptr [[TMP9]], align 8
+// CHECK-FAST1-NEXT:    [[TMP19:%.*]] = fadd fast <2 x float> [[TMP17]], [[TMP18]]
+// CHECK-FAST1-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[TMP10]], align 8
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP19]], ptr [[TMP20]], align 8
+// CHECK-FAST1-NEXT:    ret void
+//
 void my_vec_muladd(v2f x, float y, v2f z, v2f *res) {
-  // CHECK: define{{.*}}@my_vec_muladd
   *res = x * y + z;
 
-  // CHECK-AGGRESSIVE: fmul fast <2 x float>
-  // CHECK-AGGRESSIVE: load <2 x float>, ptr
-  // CHECK-AGGRESSIVE: fadd fast <2 x float>
-
-  // CHECK-FAST: fmul reassoc nsz arcp contract afn <2 x float>
-  // CHECK-FAST: load <2 x float>, ptr
-  // CHECK-FAST: fadd reassoc nsz arcp contract afn <2 x float>
-
-  // CHECK-PRECISE: load <2 x float>, ptr
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: load <2 x float>, ptr
-  // CHECK-PRECISE: call <2 x float> @llvm.fmuladd.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, <2 x float> {{.*}})
-
-  // CHECK-STRICT: load <2 x float>, ptr
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, {{.*}})
-  // CHECK-STRICT: load <2 x float>, ptr
-  // CHECK-STRICT: call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, {{.*}})
-
-  // CHECK-STRICT-FAST: load <2 x float>, ptr
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: fmul fast <2 x float> {{.*}}, {{.*}}
-  // CHECK-STRICT-FAST: load <2 x float>, ptr
-  // CHECK-STRICT-FAST: fadd fast <2 x float> {{.*}}, {{.*}}
-
-  // CHECK-FAST1: load <2 x float>, ptr
-  // CHECK-FAST1: load float, ptr
-  // CHECK-FAST1: fmul fast <2 x float> {{.*}}, {{.*}}
-  // CHECK-FAST1: load <2 x float>, ptr {{.*}}
-  // CHECK-FAST1: fadd fast <2 x float> {{.*}}, {{.*}}
+
+
+
+
+
 }
 
 typedef float __attribute__((matrix_type(2, 1))) m21f;
 
+// CHECK-FAST-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-FAST-SAME: <2 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <2 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-FAST-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-FAST-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-FAST-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-FAST-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-FAST-NEXT:    [[TMP13:%.*]] = fmul reassoc nsz arcp contract afn <2 x float> [[TMP9]], [[TMP12]]
+// CHECK-FAST-NEXT:    [[TMP14:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-FAST-NEXT:    [[TMP15:%.*]] = fadd reassoc nsz arcp contract afn <2 x float> [[TMP13]], [[TMP14]]
+// CHECK-FAST-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-FAST-NEXT:    store <2 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-AGGRESSIVE-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-AGGRESSIVE-SAME: <2 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <2 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-AGGRESSIVE-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-AGGRESSIVE-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-AGGRESSIVE-NEXT:    [[TMP13:%.*]] = fmul fast <2 x float> [[TMP9]], [[TMP12]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP14:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP15:%.*]] = fadd fast <2 x float> [[TMP13]], [[TMP14]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store <2 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-AGGRESSIVE-NEXT:    ret void
+//
+// CHECK-PRECISE-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-PRECISE-SAME: <2 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <2 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-PRECISE-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-PRECISE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-PRECISE-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-PRECISE-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-PRECISE-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-PRECISE-NEXT:    [[TMP13:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP14:%.*]] = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> [[TMP9]], <2 x float> [[TMP12]], <2 x float> [[TMP13]])
+// CHECK-PRECISE-NEXT:    [[TMP15:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-PRECISE-NEXT:    store <2 x float> [[TMP14]], ptr [[TMP15]], align 4
+// CHECK-PRECISE-NEXT:    ret void
+//
+// CHECK-STRICT-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-STRICT-SAME: <2 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <2 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-STRICT-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-STRICT-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-STRICT-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-STRICT-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-STRICT-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-STRICT-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP13:%.*]] = fmul <2 x float> [[TMP9]], [[TMP12]]
+// CHECK-STRICT-NEXT:    [[TMP14:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-STRICT-NEXT:    [[TMP15:%.*]] = fadd <2 x float> [[TMP13]], [[TMP14]]
+// CHECK-STRICT-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-STRICT-NEXT:    store <2 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-STRICT-NEXT:    ret void
+//
+// CHECK-STRICT-FAST-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-STRICT-FAST-SAME: <2 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <2 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-STRICT-FAST-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-STRICT-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-STRICT-FAST-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-STRICT-FAST-NEXT:    [[TMP13:%.*]] = fmul fast <2 x float> [[TMP9]], [[TMP12]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP14:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP15:%.*]] = fadd fast <2 x float> [[TMP13]], [[TMP14]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-STRICT-FAST-NEXT:    store <2 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-STRICT-FAST-NEXT:    ret void
+//
+// CHECK-FAST1-LABEL: define dso_local void @my_m21_muladd(
+// CHECK-FAST1-SAME: <2 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <2 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR1:[0-9]+]] {
+// CHECK-FAST1-NEXT:    [[TMP5:%.*]] = alloca [2 x float], align 4
+// CHECK-FAST1-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    [[TMP7:%.*]] = alloca [2 x float], align 4
+// CHECK-FAST1-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-FAST1-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-FAST1-NEXT:    [[TMP9:%.*]] = load <2 x float>, ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i64 0
+// CHECK-FAST1-NEXT:    [[TMP12:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> zeroinitializer
+// CHECK-FAST1-NEXT:    [[TMP13:%.*]] = fmul fast <2 x float> [[TMP9]], [[TMP12]]
+// CHECK-FAST1-NEXT:    [[TMP14:%.*]] = load <2 x float>, ptr [[TMP7]], align 4
+// CHECK-FAST1-NEXT:    [[TMP15:%.*]] = fadd fast <2 x float> [[TMP13]], [[TMP14]]
+// CHECK-FAST1-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-FAST1-NEXT:    store <2 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-FAST1-NEXT:    ret void
+//
 void my_m21_muladd(m21f x, float y, m21f z, m21f *res) {
-  // CHECK: define{{.*}}@my_m21_muladd
   *res = x * y + z;
 
-  // CHECK-AGGRESSIVE: fmul fast <2 x float>
-  // CHECK-AGGRESSIVE: load <2 x float>, ptr
-  // CHECK-AGGRESSIVE: fadd fast <2 x float>
-
-  // CHECK-FAST: fmul reassoc nsz arcp contract afn <2 x float>
-  // CHECK-FAST: load <2 x float>, ptr
-  // CHECK-FAST: fadd reassoc nsz arcp contract afn <2 x float>
-
-  // CHECK-PRECISE: load <2 x float>, ptr
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: load <2 x float>, ptr
-  // CHECK-PRECISE: call <2 x float> @llvm.fmuladd.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, <2 x float> {{.*}})
-
-  // CHECK-STRICT: load <2 x float>, ptr
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: call <2 x float> @llvm.experimental.constrained.fmul.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, {{.*}})
-  // CHECK-STRICT: load <2 x float>, ptr
-  // CHECK-STRICT: call <2 x float> @llvm.experimental.constrained.fadd.v2f32(<2 x float> {{.*}}, <2 x float> {{.*}}, {{.*}})
-
-  // CHECK-STRICT-FAST: load <2 x float>, ptr
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: fmul fast <2 x float> {{.*}}, {{.*}}
-  // CHECK-STRICT-FAST: load <2 x float>, ptr
-  // CHECK-STRICT-FAST: fadd fast <2 x float> {{.*}}, {{.*}}
-
-  // CHECK-FAST1: load <2 x float>, ptr
-  // CHECK-FAST1: load float, ptr
-  // CHECK-FAST1: fmul fast <2 x float> {{.*}}, {{.*}}
-  // CHECK-FAST1: load <2 x float>, ptr {{.*}}
-  // CHECK-FAST1: fadd fast <2 x float> {{.*}}, {{.*}}
+
+
+
+
+
 }
 
 typedef float __attribute__((matrix_type(2, 2))) m22f;
 
+// CHECK-FAST-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-FAST-SAME: <4 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <4 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-FAST-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-FAST-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-FAST-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-FAST-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-FAST-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-FAST-NEXT:    [[TMP13:%.*]] = fmul reassoc nsz arcp contract afn <4 x float> [[TMP9]], [[TMP12]]
+// CHECK-FAST-NEXT:    [[TMP14:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-FAST-NEXT:    [[TMP15:%.*]] = fadd reassoc nsz arcp contract afn <4 x float> [[TMP13]], [[TMP14]]
+// CHECK-FAST-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-FAST-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-FAST-NEXT:    ret void
+//
+// CHECK-AGGRESSIVE-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-AGGRESSIVE-SAME: <4 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <4 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-AGGRESSIVE-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-AGGRESSIVE-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-AGGRESSIVE-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-AGGRESSIVE-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-AGGRESSIVE-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-AGGRESSIVE-NEXT:    [[TMP13:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP12]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP14:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-AGGRESSIVE-NEXT:    [[TMP15:%.*]] = fadd fast <4 x float> [[TMP13]], [[TMP14]]
+// CHECK-AGGRESSIVE-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-AGGRESSIVE-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-AGGRESSIVE-NEXT:    ret void
+//
+// CHECK-PRECISE-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-PRECISE-SAME: <4 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <4 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR3:[0-9]+]] {
+// CHECK-PRECISE-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-PRECISE-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-PRECISE-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-PRECISE-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-PRECISE-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-PRECISE-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-PRECISE-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-PRECISE-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-PRECISE-NEXT:    [[TMP13:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-PRECISE-NEXT:    [[TMP14:%.*]] = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> [[TMP9]], <4 x float> [[TMP12]], <4 x float> [[TMP13]])
+// CHECK-PRECISE-NEXT:    [[TMP15:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-PRECISE-NEXT:    store <4 x float> [[TMP14]], ptr [[TMP15]], align 4
+// CHECK-PRECISE-NEXT:    ret void
+//
+// CHECK-STRICT-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-STRICT-SAME: <4 x float> noundef [[TMP0:%.*]], float noundef [[TMP1:%.*]], <4 x float> noundef [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-STRICT-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-STRICT-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-STRICT-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-STRICT-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-STRICT-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-STRICT-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-STRICT-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-STRICT-NEXT:    [[TMP13:%.*]] = fmul <4 x float> [[TMP9]], [[TMP12]]
+// CHECK-STRICT-NEXT:    [[TMP14:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-STRICT-NEXT:    [[TMP15:%.*]] = fadd <4 x float> [[TMP13]], [[TMP14]]
+// CHECK-STRICT-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-STRICT-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-STRICT-NEXT:    ret void
+//
+// CHECK-STRICT-FAST-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-STRICT-FAST-SAME: <4 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <4 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-STRICT-FAST-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-STRICT-FAST-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-STRICT-FAST-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-STRICT-FAST-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-STRICT-FAST-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-STRICT-FAST-NEXT:    [[TMP13:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP12]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP14:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-STRICT-FAST-NEXT:    [[TMP15:%.*]] = fadd fast <4 x float> [[TMP13]], [[TMP14]]
+// CHECK-STRICT-FAST-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-STRICT-FAST-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-STRICT-FAST-NEXT:    ret void
+//
+// CHECK-FAST1-LABEL: define dso_local void @my_m22_muladd(
+// CHECK-FAST1-SAME: <4 x float> noundef nofpclass(nan inf) [[TMP0:%.*]], float noundef nofpclass(nan inf) [[TMP1:%.*]], <4 x float> noundef nofpclass(nan inf) [[TMP2:%.*]], ptr noundef [[TMP3:%.*]]) #[[ATTR2:[0-9]+]] {
+// CHECK-FAST1-NEXT:    [[TMP5:%.*]] = alloca [4 x float], align 4
+// CHECK-FAST1-NEXT:    [[TMP6:%.*]] = alloca float, align 4
+// CHECK-FAST1-NEXT:    [[TMP7:%.*]] = alloca [4 x float], align 4
+// CHECK-FAST1-NEXT:    [[TMP8:%.*]] = alloca ptr, align 8
+// CHECK-FAST1-NEXT:    store <4 x float> [[TMP0]], ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    store float [[TMP1]], ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    store <4 x float> [[TMP2]], ptr [[TMP7]], align 4
+// CHECK-FAST1-NEXT:    store ptr [[TMP3]], ptr [[TMP8]], align 8
+// CHECK-FAST1-NEXT:    [[TMP9:%.*]] = load <4 x float>, ptr [[TMP5]], align 4
+// CHECK-FAST1-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+// CHECK-FAST1-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP10]], i64 0
+// CHECK-FAST1-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> poison, <4 x i32> zeroinitializer
+// CHECK-FAST1-NEXT:    [[TMP13:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP12]]
+// CHECK-FAST1-NEXT:    [[TMP14:%.*]] = load <4 x float>, ptr [[TMP7]], align 4
+// CHECK-FAST1-NEXT:    [[TMP15:%.*]] = fadd fast <4 x float> [[TMP13]], [[TMP14]]
+// CHECK-FAST1-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[TMP8]], align 8
+// CHECK-FAST1-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP16]], align 4
+// CHECK-FAST1-NEXT:    ret void
+//
 void my_m22_muladd(m22f x, float y, m22f z, m22f *res) {
-  // CHECK: define{{.*}}@my_m22_muladd
   *res = x * y + z;
 
-  // CHECK-AGGRESSIVE: fmul fast <4 x float>
-  // CHECK-AGGRESSIVE: load <4 x float>, ptr
-  // CHECK-AGGRESSIVE: fadd fast <4 x float>
-
-  // CHECK-FAST: fmul reassoc nsz arcp contract afn <4 x float>
-  // CHECK-FAST: load <4 x float>, ptr
-  // CHECK-FAST: fadd reassoc nsz arcp contract afn <4 x float>
-
-  // CHECK-PRECISE: load <4 x float>, ptr
-  // CHECK-PRECISE: load float, ptr
-  // CHECK-PRECISE: load <4 x float>, ptr
-  // CHECK-PRECISE: call <4 x float> @llvm.fmuladd.v4f32(<4 x float> {{.*}}, <4 x float> {{.*}}, <4 x float> {{.*}})
-
-  // CHECK-STRICT: load <4 x float>, ptr
-  // CHECK-STRICT: load float, ptr
-  // CHECK-STRICT: call <4 x float> @llvm.experimental.constrained.fmul.v4f32(<4 x float> {{.*}}, <4 x float> {{.*}}, {{.*}})
-  // CHECK-STRICT: load <4 x float>, ptr
-  // CHECK-STRICT: call <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float> {{.*}}, <4 x float> {{.*}}, {{.*}})
-
-  // CHECK-STRICT-FAST: load <4 x float>, ptr
-  // CHECK-STRICT-FAST: load float, ptr
-  // CHECK-STRICT-FAST: fmul fast <4 x float> {{.*}}, {{.*}}
-  // CHECK-STRICT-FAST: load <4 x float>, ptr
-  // CHECK-STRICT-FAST: fadd fast <4 x float> {{.*}}, {{.*}}
-
-  // CHECK-FAST1: load <4 x float>, ptr
-  // CHECK-FAST1: load float, ptr
-  // CHECK-FAST1: fmul fast <4 x float> {{.*}}, {{.*}}
-  // CHECK-FAST1: load <4 x float>, ptr {{.*}}
-  // CHECK-FAST1: fadd fast <4 x float> {{.*}}, {{.*}}
+
+
+
+
+
 }
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/fp16-ops-strictfp.c b/clang/test/CodeGen/fp16-ops-strictfp.c
index 830be6256456e..f9a061db987a6 100644
--- a/clang/test/CodeGen/fp16-ops-strictfp.c
+++ b/clang/test/CodeGen/fp16-ops-strictfp.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // REQUIRES: arm-registered-target
 // RUN: %clang_cc1 -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -emit-llvm -o - -triple arm-none-linux-gnueabi %s | FileCheck %s --check-prefix=NOTNATIVE --check-prefix=CHECK -vv -dump-input=fail
 // RUN: %clang_cc1 -ffp-exception-behavior=maytrap -emit-llvm -o - -triple aarch64 %s | FileCheck %s --check-prefix=NOTNATIVE --check-prefix=CHECK
@@ -22,717 +23,1323 @@ volatile float f0, f1, f2;
 volatile double d0;
 short s0;
 
+// NOTNATIVE-LABEL: define dso_local void @foo(
+// NOTNATIVE-SAME: ) #[[ATTR0:[0-9]+]] {
+// NOTNATIVE-NEXT:  [[ENTRY:.*:]]
+// NOTNATIVE-NEXT:    [[TMP0:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP0]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV1:%.*]] = call i32 @llvm.fptoui.i32.f32(float [[CONV]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV1]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP1:%.*]] = load volatile i32, ptr @test, align 4
+// NOTNATIVE-NEXT:    [[CONV2:%.*]] = call float @llvm.uitofp.f32.i32(i32 [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV3:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV3]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP2:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV4:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV4]], float 0.000000e+00, metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[LNOT:%.*]] = xor i1 [[TOBOOL]], true
+// NOTNATIVE-NEXT:    [[LNOT_EXT:%.*]] = zext i1 [[LNOT]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[LNOT_EXT]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP3:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV5:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP3]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[FNEG:%.*]] = fneg float [[CONV5]]
+// NOTNATIVE-NEXT:    [[CONV6:%.*]] = call half @llvm.fptrunc.f16.f32(float [[FNEG]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV6]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP4:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV7:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP4]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV8:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV7]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV8]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP5:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[INCDEC_CONV:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP5]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INC:%.*]] = call float @llvm.fadd.f32(float [[INCDEC_CONV]], float 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INCDEC_CONV9:%.*]] = call half @llvm.fptrunc.f16.f32(float [[INC]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[INCDEC_CONV9]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP6:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[INCDEC_CONV10:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP6]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INC11:%.*]] = call float @llvm.fadd.f32(float [[INCDEC_CONV10]], float 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INCDEC_CONV12:%.*]] = call half @llvm.fptrunc.f16.f32(float [[INC11]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[INCDEC_CONV12]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP7:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[INCDEC_CONV13:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP7]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DEC:%.*]] = call float @llvm.fadd.f32(float [[INCDEC_CONV13]], float -1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INCDEC_CONV14:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DEC]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[INCDEC_CONV14]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP8:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[INCDEC_CONV15:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP8]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DEC16:%.*]] = call float @llvm.fadd.f32(float [[INCDEC_CONV15]], float -1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[INCDEC_CONV17:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DEC16]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[INCDEC_CONV17]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP9:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV18:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP9]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP10:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV19:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP10]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL:%.*]] = call float @llvm.fmul.f32(float [[CONV18]], float [[CONV19]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV20:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV20]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP11:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV21:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP11]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV22:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV23:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV22]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL24:%.*]] = call float @llvm.fmul.f32(float [[CONV21]], float [[CONV23]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV25:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL24]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV25]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP12:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV26:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP12]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP13:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[MUL27:%.*]] = call float @llvm.fmul.f32(float [[CONV26]], float [[TMP13]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV28:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL27]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV28]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP14:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[TMP15:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV29:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP15]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL30:%.*]] = call float @llvm.fmul.f32(float [[TMP14]], float [[CONV29]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV31:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL30]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV31]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP16:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV32:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP16]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP17:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV33:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP17]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL34:%.*]] = call float @llvm.fmul.f32(float [[CONV32]], float [[CONV33]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV35:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL34]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV35]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP18:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV36:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP18]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP19:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV37:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP19]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV:%.*]] = call float @llvm.fdiv.f32(float [[CONV36]], float [[CONV37]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV38:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV38]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP20:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV39:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP20]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV40:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV41:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV40]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV42:%.*]] = call float @llvm.fdiv.f32(float [[CONV39]], float [[CONV41]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV43:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV42]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV43]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP21:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV44:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP21]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP22:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[DIV45:%.*]] = call float @llvm.fdiv.f32(float [[CONV44]], float [[TMP22]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV46:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV45]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV46]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP23:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[TMP24:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV47:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP24]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV48:%.*]] = call float @llvm.fdiv.f32(float [[TMP23]], float [[CONV47]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV49:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV48]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV49]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP25:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV50:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP25]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP26:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV51:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP26]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV52:%.*]] = call float @llvm.fdiv.f32(float [[CONV50]], float [[CONV51]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV53:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV52]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV53]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP27:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV54:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP27]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP28:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV55:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP28]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[CONV54]], float [[CONV55]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV56:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV56]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV57:%.*]] = call half @llvm.fptrunc.f16.f64(double -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV58:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV57]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP29:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV59:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP29]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD60:%.*]] = call float @llvm.fadd.f32(float [[CONV58]], float [[CONV59]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV61:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD60]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV61]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP30:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV62:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP30]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP31:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[ADD63:%.*]] = call float @llvm.fadd.f32(float [[CONV62]], float [[TMP31]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV64:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD63]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV64]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP32:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP33:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV65:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP33]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD66:%.*]] = call float @llvm.fadd.f32(float [[TMP32]], float [[CONV65]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV67:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD66]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV67]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP34:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV68:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP34]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP35:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV69:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP35]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD70:%.*]] = call float @llvm.fadd.f32(float [[CONV68]], float [[CONV69]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV71:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD70]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV71]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP36:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV72:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP36]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP37:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV73:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP37]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB:%.*]] = call float @llvm.fsub.f32(float [[CONV72]], float [[CONV73]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV74:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV74]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV75:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV76:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV75]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP38:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV77:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP38]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB78:%.*]] = call float @llvm.fsub.f32(float [[CONV76]], float [[CONV77]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV79:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB78]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV79]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP39:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV80:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP39]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP40:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[SUB81:%.*]] = call float @llvm.fsub.f32(float [[CONV80]], float [[TMP40]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV82:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB81]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV82]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP41:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP42:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV83:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP42]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB84:%.*]] = call float @llvm.fsub.f32(float [[TMP41]], float [[CONV83]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV85:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB84]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV85]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP43:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV86:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP43]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP44:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV87:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP44]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB88:%.*]] = call float @llvm.fsub.f32(float [[CONV86]], float [[CONV87]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV89:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB88]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV89]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP45:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV90:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP45]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP46:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV91:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP46]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV90]], float [[CONV91]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV92:%.*]] = zext i1 [[CMP]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV92]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP47:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV93:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP47]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV94:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV95:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV94]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP96:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV93]], float [[CONV95]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV97:%.*]] = zext i1 [[CMP96]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV97]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP48:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV98:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP48]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP49:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[CMP99:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV98]], float [[TMP49]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV100:%.*]] = zext i1 [[CMP99]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV100]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP50:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP51:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV101:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP51]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP102:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP50]], float [[CONV101]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV103:%.*]] = zext i1 [[CMP102]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV103]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP52:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV104:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP52]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP53:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV105:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP53]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP106:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV104]], float [[CONV105]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV107:%.*]] = zext i1 [[CMP106]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV107]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP54:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV108:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP54]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP55:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV109:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP55]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP110:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV108]], float [[CONV109]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV111:%.*]] = zext i1 [[CMP110]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV111]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP56:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV112:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP56]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP57:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV113:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP57]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP114:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV112]], float [[CONV113]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV115:%.*]] = zext i1 [[CMP114]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV115]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[CONV116:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV117:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV116]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP58:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV118:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP58]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP119:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV117]], float [[CONV118]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV120:%.*]] = zext i1 [[CMP119]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV120]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP59:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV121:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP59]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP60:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[CMP122:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV121]], float [[TMP60]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV123:%.*]] = zext i1 [[CMP122]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV123]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP61:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[TMP62:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV124:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP62]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP125:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP61]], float [[CONV124]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV126:%.*]] = zext i1 [[CMP125]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV126]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP63:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV127:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP63]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP64:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV128:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP64]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP129:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV127]], float [[CONV128]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV130:%.*]] = zext i1 [[CMP129]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV130]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP65:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV131:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP65]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP66:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV132:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP66]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP133:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV131]], float [[CONV132]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV134:%.*]] = zext i1 [[CMP133]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV134]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP67:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV135:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP67]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP68:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV136:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP68]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP137:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV135]], float [[CONV136]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV138:%.*]] = zext i1 [[CMP137]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV138]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP69:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV139:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP69]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV140:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV141:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV140]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP142:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV139]], float [[CONV141]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV143:%.*]] = zext i1 [[CMP142]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV143]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP70:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV144:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP70]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP71:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[CMP145:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV144]], float [[TMP71]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV146:%.*]] = zext i1 [[CMP145]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV146]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP72:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP73:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV147:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP73]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP148:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP72]], float [[CONV147]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV149:%.*]] = zext i1 [[CMP148]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV149]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP74:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV150:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP74]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP75:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV151:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP75]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP152:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV150]], float [[CONV151]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV153:%.*]] = zext i1 [[CMP152]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV153]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP76:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV154:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP76]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP77:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV155:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP77]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP156:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV154]], float [[CONV155]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV157:%.*]] = zext i1 [[CMP156]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV157]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP78:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV158:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP78]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP79:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV159:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP79]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP160:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV158]], float [[CONV159]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV161:%.*]] = zext i1 [[CMP160]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV161]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP80:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV162:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP80]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV163:%.*]] = call half @llvm.fptrunc.f16.f64(double -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV164:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV163]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP165:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV162]], float [[CONV164]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV166:%.*]] = zext i1 [[CMP165]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV166]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP81:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV167:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP81]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP82:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[CMP168:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV167]], float [[TMP82]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV169:%.*]] = zext i1 [[CMP168]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV169]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP83:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[TMP84:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV170:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP84]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP171:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP83]], float [[CONV170]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV172:%.*]] = zext i1 [[CMP171]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV172]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP85:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV173:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP85]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP86:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV174:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP86]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP175:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV173]], float [[CONV174]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV176:%.*]] = zext i1 [[CMP175]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV176]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP87:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV177:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP87]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP88:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV178:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP88]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP179:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV177]], float [[CONV178]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV180:%.*]] = zext i1 [[CMP179]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV180]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP89:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV181:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP89]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP90:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV182:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP90]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP183:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV181]], float [[CONV182]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV184:%.*]] = zext i1 [[CMP183]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV184]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP91:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV185:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP91]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV186:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV187:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV186]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP188:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV185]], float [[CONV187]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV189:%.*]] = zext i1 [[CMP188]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV189]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP92:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV190:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP92]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP93:%.*]] = load volatile float, ptr @f1, align 4
+// NOTNATIVE-NEXT:    [[CMP191:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV190]], float [[TMP93]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV192:%.*]] = zext i1 [[CMP191]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV192]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP94:%.*]] = load volatile float, ptr @f1, align 4
+// NOTNATIVE-NEXT:    [[TMP95:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV193:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP95]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP194:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP94]], float [[CONV193]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV195:%.*]] = zext i1 [[CMP194]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV195]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP96:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV196:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP96]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP97:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV197:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP97]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP198:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV196]], float [[CONV197]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV199:%.*]] = zext i1 [[CMP198]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV199]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP98:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV200:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP98]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP99:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV201:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP99]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP202:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV200]], float [[CONV201]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV203:%.*]] = zext i1 [[CMP202]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV203]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP100:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV204:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP100]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP101:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV205:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP101]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP206:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV204]], float [[CONV205]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV207:%.*]] = zext i1 [[CMP206]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV207]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP102:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV208:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP102]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV209:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV210:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV209]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP211:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV208]], float [[CONV210]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV212:%.*]] = zext i1 [[CMP211]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV212]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP103:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV213:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP103]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP104:%.*]] = load volatile float, ptr @f1, align 4
+// NOTNATIVE-NEXT:    [[CMP214:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV213]], float [[TMP104]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV215:%.*]] = zext i1 [[CMP214]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV215]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP105:%.*]] = load volatile float, ptr @f1, align 4
+// NOTNATIVE-NEXT:    [[TMP106:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV216:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP106]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP217:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP105]], float [[CONV216]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV218:%.*]] = zext i1 [[CMP217]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV218]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP107:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV219:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP107]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP108:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV220:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP108]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP221:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV219]], float [[CONV220]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV222:%.*]] = zext i1 [[CMP221]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV222]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP109:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV223:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP109]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP110:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV224:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP110]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CMP225:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV223]], float [[CONV224]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV226:%.*]] = zext i1 [[CMP225]] to i32
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV226]], ptr @test, align 4
+// NOTNATIVE-NEXT:    [[TMP111:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV227:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP111]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TOBOOL228:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV227]], float 0.000000e+00, metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    br i1 [[TOBOOL228]], label %[[COND_TRUE:.*]], label %[[COND_FALSE:.*]]
+// NOTNATIVE:       [[COND_TRUE]]:
+// NOTNATIVE-NEXT:    [[TMP112:%.*]] = load volatile half, ptr @h2, align 2
+// NOTNATIVE-NEXT:    [[CONV229:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP112]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    br label %[[COND_END:.*]]
+// NOTNATIVE:       [[COND_FALSE]]:
+// NOTNATIVE-NEXT:    [[TMP113:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV230:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP113]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    br label %[[COND_END]]
+// NOTNATIVE:       [[COND_END]]:
+// NOTNATIVE-NEXT:    [[COND:%.*]] = phi float [ [[CONV229]], %[[COND_TRUE]] ], [ [[CONV230]], %[[COND_FALSE]] ]
+// NOTNATIVE-NEXT:    [[CONV231:%.*]] = call half @llvm.fptrunc.f16.f32(float [[COND]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV231]], ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[TMP114:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    store volatile half [[TMP114]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV232:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV232]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP115:%.*]] = load volatile float, ptr @f0, align 4
+// NOTNATIVE-NEXT:    [[CONV233:%.*]] = call half @llvm.fptrunc.f16.f32(float [[TMP115]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV233]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP116:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV234:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP116]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV235:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV234]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV235]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP117:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV236:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP117]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV237:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[CONV236]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV237]], ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[TMP118:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV238:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP118]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP119:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV239:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP119]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD240:%.*]] = call float @llvm.fadd.f32(float [[CONV239]], float [[CONV238]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV241:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD240]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV241]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV242:%.*]] = call half @llvm.fptrunc.f16.f32(float 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV243:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV242]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP120:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV244:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP120]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD245:%.*]] = call float @llvm.fadd.f32(float [[CONV244]], float [[CONV243]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV246:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD245]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV246]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP121:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP122:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV247:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP122]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD248:%.*]] = call float @llvm.fadd.f32(float [[CONV247]], float [[TMP121]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV249:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD248]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV249]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP123:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV250:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP123]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP124:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV251:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP124]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD252:%.*]] = call float @llvm.fadd.f32(float [[CONV251]], float [[CONV250]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV253:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[ADD252]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV253]], ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[TMP125:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV254:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP125]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP126:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV255:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP126]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD256:%.*]] = call float @llvm.fadd.f32(float [[CONV255]], float [[CONV254]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV257:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD256]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV257]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP127:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV258:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP127]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP128:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV259:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP128]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB260:%.*]] = call float @llvm.fsub.f32(float [[CONV259]], float [[CONV258]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV261:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB260]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV261]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV262:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV263:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV262]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP129:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV264:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP129]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB265:%.*]] = call float @llvm.fsub.f32(float [[CONV264]], float [[CONV263]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV266:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB265]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV266]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP130:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP131:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV267:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP131]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB268:%.*]] = call float @llvm.fsub.f32(float [[CONV267]], float [[TMP130]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV269:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB268]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV269]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP132:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV270:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP132]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP133:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV271:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP133]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB272:%.*]] = call float @llvm.fsub.f32(float [[CONV271]], float [[CONV270]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV273:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[SUB272]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV273]], ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[TMP134:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV274:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP134]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP135:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV275:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP135]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[SUB276:%.*]] = call float @llvm.fsub.f32(float [[CONV275]], float [[CONV274]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV277:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB276]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV277]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP136:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV278:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP136]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP137:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV279:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP137]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL280:%.*]] = call float @llvm.fmul.f32(float [[CONV279]], float [[CONV278]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV281:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL280]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV281]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV282:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV283:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV282]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP138:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV284:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP138]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL285:%.*]] = call float @llvm.fmul.f32(float [[CONV284]], float [[CONV283]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV286:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL285]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV286]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP139:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP140:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV287:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP140]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL288:%.*]] = call float @llvm.fmul.f32(float [[CONV287]], float [[TMP139]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV289:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL288]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV289]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP141:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV290:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP141]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP142:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV291:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP142]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL292:%.*]] = call float @llvm.fmul.f32(float [[CONV291]], float [[CONV290]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV293:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[MUL292]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV293]], ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[TMP143:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV294:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP143]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP144:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV295:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP144]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[MUL296:%.*]] = call float @llvm.fmul.f32(float [[CONV295]], float [[CONV294]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV297:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL296]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV297]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP145:%.*]] = load volatile half, ptr @h1, align 2
+// NOTNATIVE-NEXT:    [[CONV298:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP145]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP146:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV299:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP146]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV300:%.*]] = call float @llvm.fdiv.f32(float [[CONV299]], float [[CONV298]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV301:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV300]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV301]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV302:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV303:%.*]] = call float @llvm.fpext.f32.f16(half [[CONV302]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP147:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV304:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP147]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV305:%.*]] = call float @llvm.fdiv.f32(float [[CONV304]], float [[CONV303]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV306:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV305]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV306]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP148:%.*]] = load volatile float, ptr @f2, align 4
+// NOTNATIVE-NEXT:    [[TMP149:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV307:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP149]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV308:%.*]] = call float @llvm.fdiv.f32(float [[CONV307]], float [[TMP148]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV309:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV308]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV309]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP150:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV310:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP150]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP151:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV311:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP151]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV312:%.*]] = call float @llvm.fdiv.f32(float [[CONV311]], float [[CONV310]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV313:%.*]] = call i32 @llvm.fptosi.i32.f32(float [[DIV312]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile i32 [[CONV313]], ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[TMP152:%.*]] = load volatile i32, ptr @i0, align 4
+// NOTNATIVE-NEXT:    [[CONV314:%.*]] = call float @llvm.sitofp.f32.i32(i32 [[TMP152]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP153:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV315:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP153]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[DIV316:%.*]] = call float @llvm.fdiv.f32(float [[CONV315]], float [[CONV314]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV317:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV316]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV317]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP154:%.*]] = load volatile double, ptr @d0, align 8
+// NOTNATIVE-NEXT:    [[CONV318:%.*]] = call half @llvm.fptrunc.f16.f64(double [[TMP154]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV318]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP155:%.*]] = load volatile double, ptr @d0, align 8
+// NOTNATIVE-NEXT:    [[CONV319:%.*]] = call float @llvm.fptrunc.f32.f64(double [[TMP155]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV320:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV319]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV320]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[TMP156:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV321:%.*]] = call double @llvm.fpext.f64.f16(half [[TMP156]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile double [[CONV321]], ptr @d0, align 8
+// NOTNATIVE-NEXT:    [[TMP157:%.*]] = load volatile half, ptr @h0, align 2
+// NOTNATIVE-NEXT:    [[CONV322:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP157]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV323:%.*]] = call double @llvm.fpext.f64.f32(float [[CONV322]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile double [[CONV323]], ptr @d0, align 8
+// NOTNATIVE-NEXT:    [[TMP158:%.*]] = load i16, ptr @s0, align 2
+// NOTNATIVE-NEXT:    [[CONV324:%.*]] = call float @llvm.sitofp.f32.i16(i16 [[TMP158]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV325:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV324]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store volatile half [[CONV325]], ptr @h0, align 2
+// NOTNATIVE-NEXT:    ret void
+//
+// NATIVE-HALF-LABEL: define dso_local void @foo(
+// NATIVE-HALF-SAME: ) #[[ATTR0:[0-9]+]] {
+// NATIVE-HALF-NEXT:  [[ENTRY:.*:]]
+// NATIVE-HALF-NEXT:    [[TMP0:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV:%.*]] = call i32 @llvm.fptoui.i32.f16(half [[TMP0]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP1:%.*]] = load volatile i32, ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[CONV1:%.*]] = call half @llvm.uitofp.f16.i32(i32 [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV1]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP2:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TOBOOL:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP2]], half 0xH0000, metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[LNOT:%.*]] = xor i1 [[TOBOOL]], true
+// NATIVE-HALF-NEXT:    [[LNOT_EXT:%.*]] = zext i1 [[LNOT]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[LNOT_EXT]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP3:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[FNEG:%.*]] = fneg half [[TMP3]]
+// NATIVE-HALF-NEXT:    store volatile half [[FNEG]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP4:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    store volatile half [[TMP4]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP5:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[INC:%.*]] = call half @llvm.fadd.f16(half [[TMP5]], half 0xH3C00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[INC]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP6:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[INC2:%.*]] = call half @llvm.fadd.f16(half [[TMP6]], half 0xH3C00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[INC2]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP7:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[DEC:%.*]] = call half @llvm.fadd.f16(half [[TMP7]], half 0xHBC00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DEC]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP8:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[DEC3:%.*]] = call half @llvm.fadd.f16(half [[TMP8]], half 0xHBC00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DEC3]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP9:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP10:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[MUL:%.*]] = call half @llvm.fmul.f16(half [[TMP9]], half [[TMP10]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP11:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV4:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[MUL5:%.*]] = call half @llvm.fmul.f16(half [[TMP11]], half [[CONV4]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL5]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP12:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV6:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP12]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP13:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[MUL7:%.*]] = call float @llvm.fmul.f32(float [[CONV6]], float [[TMP13]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV8:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL7]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV8]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP14:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[TMP15:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV9:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP15]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[MUL10:%.*]] = call float @llvm.fmul.f32(float [[TMP14]], float [[CONV9]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV11:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL10]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV11]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP16:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP17:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV12:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP17]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[MUL13:%.*]] = call half @llvm.fmul.f16(half [[TMP16]], half [[CONV12]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL13]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP18:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP19:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[DIV:%.*]] = call half @llvm.fdiv.f16(half [[TMP18]], half [[TMP19]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP20:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV14:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[DIV15:%.*]] = call half @llvm.fdiv.f16(half [[TMP20]], half [[CONV14]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV15]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP21:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV16:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP21]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP22:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[DIV17:%.*]] = call float @llvm.fdiv.f32(float [[CONV16]], float [[TMP22]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV18:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV17]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV18]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP23:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[TMP24:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV19:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP24]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[DIV20:%.*]] = call float @llvm.fdiv.f32(float [[TMP23]], float [[CONV19]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV21:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV20]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV21]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP25:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP26:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV22:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP26]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[DIV23:%.*]] = call half @llvm.fdiv.f16(half [[TMP25]], half [[CONV22]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV23]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP27:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[TMP28:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[ADD:%.*]] = call half @llvm.fadd.f16(half [[TMP27]], half [[TMP28]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV24:%.*]] = call half @llvm.fptrunc.f16.f64(double -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP29:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[ADD25:%.*]] = call half @llvm.fadd.f16(half [[CONV24]], half [[TMP29]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD25]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP30:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV26:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP30]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP31:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[ADD27:%.*]] = call float @llvm.fadd.f32(float [[CONV26]], float [[TMP31]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV28:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD27]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV28]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP32:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP33:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV29:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP33]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[ADD30:%.*]] = call float @llvm.fadd.f32(float [[TMP32]], float [[CONV29]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV31:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD30]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV31]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP34:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP35:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV32:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP35]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[ADD33:%.*]] = call half @llvm.fadd.f16(half [[TMP34]], half [[CONV32]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD33]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP36:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[TMP37:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[SUB:%.*]] = call half @llvm.fsub.f16(half [[TMP36]], half [[TMP37]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV34:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP38:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[SUB35:%.*]] = call half @llvm.fsub.f16(half [[CONV34]], half [[TMP38]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB35]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP39:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV36:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP39]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP40:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[SUB37:%.*]] = call float @llvm.fsub.f32(float [[CONV36]], float [[TMP40]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV38:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB37]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV38]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP41:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP42:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV39:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP42]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[SUB40:%.*]] = call float @llvm.fsub.f32(float [[TMP41]], float [[CONV39]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV41:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB40]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV41]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP43:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP44:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV42:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP44]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[SUB43:%.*]] = call half @llvm.fsub.f16(half [[TMP43]], half [[CONV42]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB43]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP45:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[TMP46:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP45]], half [[TMP46]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV44:%.*]] = zext i1 [[CMP]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV44]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP47:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV45:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP46:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP47]], half [[CONV45]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV47:%.*]] = zext i1 [[CMP46]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV47]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP48:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV48:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP48]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP49:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[CMP49:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV48]], float [[TMP49]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV50:%.*]] = zext i1 [[CMP49]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV50]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP50:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP51:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV51:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP51]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP52:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP50]], float [[CONV51]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV53:%.*]] = zext i1 [[CMP52]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV53]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP52:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV54:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP52]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP53:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP55:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV54]], half [[TMP53]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV56:%.*]] = zext i1 [[CMP55]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV56]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP54:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP55:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV57:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP55]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP58:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP54]], half [[CONV57]], metadata !"olt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV59:%.*]] = zext i1 [[CMP58]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV59]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP56:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP57:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CMP60:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP56]], half [[TMP57]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV61:%.*]] = zext i1 [[CMP60]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV61]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[CONV62:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP58:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CMP63:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV62]], half [[TMP58]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV64:%.*]] = zext i1 [[CMP63]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV64]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP59:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV65:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP59]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP60:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[CMP66:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV65]], float [[TMP60]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV67:%.*]] = zext i1 [[CMP66]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV67]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP61:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[TMP62:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV68:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP62]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP69:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP61]], float [[CONV68]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV70:%.*]] = zext i1 [[CMP69]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV70]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP63:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV71:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP63]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP64:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP72:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV71]], half [[TMP64]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV73:%.*]] = zext i1 [[CMP72]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV73]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP65:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP66:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV74:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP66]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP75:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP65]], half [[CONV74]], metadata !"ogt") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV76:%.*]] = zext i1 [[CMP75]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV76]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP67:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[TMP68:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP77:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP67]], half [[TMP68]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV78:%.*]] = zext i1 [[CMP77]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV78]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP69:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV79:%.*]] = call half @llvm.fptrunc.f16.f64(double 4.200000e+01) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP80:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP69]], half [[CONV79]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV81:%.*]] = zext i1 [[CMP80]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV81]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP70:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV82:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP70]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP71:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[CMP83:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV82]], float [[TMP71]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV84:%.*]] = zext i1 [[CMP83]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV84]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP72:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP73:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV85:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP73]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP86:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP72]], float [[CONV85]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV87:%.*]] = zext i1 [[CMP86]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV87]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP74:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV88:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP74]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP75:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP89:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV88]], half [[TMP75]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV90:%.*]] = zext i1 [[CMP89]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV90]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP76:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP77:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV91:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP77]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP92:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP76]], half [[CONV91]], metadata !"ole") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV93:%.*]] = zext i1 [[CMP92]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV93]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP78:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP79:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CMP94:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP78]], half [[TMP79]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV95:%.*]] = zext i1 [[CMP94]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV95]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP80:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV96:%.*]] = call half @llvm.fptrunc.f16.f64(double -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP97:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP80]], half [[CONV96]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV98:%.*]] = zext i1 [[CMP97]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV98]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP81:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV99:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP81]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP82:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[CMP100:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV99]], float [[TMP82]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV101:%.*]] = zext i1 [[CMP100]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV101]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP83:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[TMP84:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CONV102:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP84]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP103:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP83]], float [[CONV102]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV104:%.*]] = zext i1 [[CMP103]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV104]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP85:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV105:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP85]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP86:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP106:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV105]], half [[TMP86]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV107:%.*]] = zext i1 [[CMP106]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV107]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP87:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP88:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV108:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP88]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP109:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP87]], half [[CONV108]], metadata !"oge") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV110:%.*]] = zext i1 [[CMP109]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV110]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP89:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP90:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CMP111:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP89]], half [[TMP90]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV112:%.*]] = zext i1 [[CMP111]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV112]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP91:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV113:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP114:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP91]], half [[CONV113]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV115:%.*]] = zext i1 [[CMP114]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV115]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP92:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV116:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP92]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP93:%.*]] = load volatile float, ptr @f1, align 4
+// NATIVE-HALF-NEXT:    [[CMP117:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV116]], float [[TMP93]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV118:%.*]] = zext i1 [[CMP117]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV118]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP94:%.*]] = load volatile float, ptr @f1, align 4
+// NATIVE-HALF-NEXT:    [[TMP95:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV119:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP95]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP120:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP94]], float [[CONV119]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV121:%.*]] = zext i1 [[CMP120]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV121]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP96:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV122:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP96]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP97:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP123:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV122]], half [[TMP97]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV124:%.*]] = zext i1 [[CMP123]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV124]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP98:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP99:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV125:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP99]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP126:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP98]], half [[CONV125]], metadata !"oeq") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV127:%.*]] = zext i1 [[CMP126]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV127]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP100:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP101:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    [[CMP128:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP100]], half [[TMP101]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV129:%.*]] = zext i1 [[CMP128]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV129]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP102:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV130:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP131:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP102]], half [[CONV130]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV132:%.*]] = zext i1 [[CMP131]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV132]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP103:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV133:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP103]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP104:%.*]] = load volatile float, ptr @f1, align 4
+// NATIVE-HALF-NEXT:    [[CMP134:%.*]] = call i1 @llvm.fcmp.f32(float [[CONV133]], float [[TMP104]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV135:%.*]] = zext i1 [[CMP134]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV135]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP105:%.*]] = load volatile float, ptr @f1, align 4
+// NATIVE-HALF-NEXT:    [[TMP106:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[CONV136:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP106]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP137:%.*]] = call i1 @llvm.fcmp.f32(float [[TMP105]], float [[CONV136]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV138:%.*]] = zext i1 [[CMP137]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV138]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP107:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV139:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP107]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP108:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CMP140:%.*]] = call i1 @llvm.fcmp.f16(half [[CONV139]], half [[TMP108]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV141:%.*]] = zext i1 [[CMP140]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV141]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP109:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP110:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV142:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP110]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CMP143:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP109]], half [[CONV142]], metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV144:%.*]] = zext i1 [[CMP143]] to i32
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV144]], ptr @test, align 4
+// NATIVE-HALF-NEXT:    [[TMP111:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TOBOOL145:%.*]] = call i1 @llvm.fcmp.f16(half [[TMP111]], half 0xH0000, metadata !"une") #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    br i1 [[TOBOOL145]], label %[[COND_TRUE:.*]], label %[[COND_FALSE:.*]]
+// NATIVE-HALF:       [[COND_TRUE]]:
+// NATIVE-HALF-NEXT:    [[TMP112:%.*]] = load volatile half, ptr @h2, align 2
+// NATIVE-HALF-NEXT:    br label %[[COND_END:.*]]
+// NATIVE-HALF:       [[COND_FALSE]]:
+// NATIVE-HALF-NEXT:    [[TMP113:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    br label %[[COND_END]]
+// NATIVE-HALF:       [[COND_END]]:
+// NATIVE-HALF-NEXT:    [[COND:%.*]] = phi half [ [[TMP112]], %[[COND_TRUE]] ], [ [[TMP113]], %[[COND_FALSE]] ]
+// NATIVE-HALF-NEXT:    store volatile half [[COND]], ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP114:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    store volatile half [[TMP114]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV146:%.*]] = call half @llvm.fptrunc.f16.f32(float -2.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV146]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP115:%.*]] = load volatile float, ptr @f0, align 4
+// NATIVE-HALF-NEXT:    [[CONV147:%.*]] = call half @llvm.fptrunc.f16.f32(float [[TMP115]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV147]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP116:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV148:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP116]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV148]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP117:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV149:%.*]] = call i32 @llvm.fptosi.i32.f16(half [[TMP117]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV149]], ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[TMP118:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP119:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[ADD150:%.*]] = call half @llvm.fadd.f16(half [[TMP119]], half [[TMP118]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD150]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV151:%.*]] = call half @llvm.fptrunc.f16.f32(float 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP120:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[ADD152:%.*]] = call half @llvm.fadd.f16(half [[TMP120]], half [[CONV151]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD152]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP121:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP122:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV153:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP122]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[ADD154:%.*]] = call float @llvm.fadd.f32(float [[CONV153]], float [[TMP121]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV155:%.*]] = call half @llvm.fptrunc.f16.f32(float [[ADD154]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV155]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP123:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP124:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV156:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP124]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[ADD157:%.*]] = call half @llvm.fadd.f16(half [[CONV156]], half [[TMP123]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV158:%.*]] = call i32 @llvm.fptosi.i32.f16(half [[ADD157]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV158]], ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[TMP125:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV159:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP125]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP126:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[ADD160:%.*]] = call half @llvm.fadd.f16(half [[TMP126]], half [[CONV159]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[ADD160]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP127:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP128:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[SUB161:%.*]] = call half @llvm.fsub.f16(half [[TMP128]], half [[TMP127]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB161]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV162:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP129:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[SUB163:%.*]] = call half @llvm.fsub.f16(half [[TMP129]], half [[CONV162]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB163]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP130:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP131:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV164:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP131]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[SUB165:%.*]] = call float @llvm.fsub.f32(float [[CONV164]], float [[TMP130]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV166:%.*]] = call half @llvm.fptrunc.f16.f32(float [[SUB165]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV166]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP132:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP133:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV167:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP133]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[SUB168:%.*]] = call half @llvm.fsub.f16(half [[CONV167]], half [[TMP132]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV169:%.*]] = call i32 @llvm.fptosi.i32.f16(half [[SUB168]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV169]], ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[TMP134:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV170:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP134]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP135:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[SUB171:%.*]] = call half @llvm.fsub.f16(half [[TMP135]], half [[CONV170]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[SUB171]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP136:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP137:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[MUL172:%.*]] = call half @llvm.fmul.f16(half [[TMP137]], half [[TMP136]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL172]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV173:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP138:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[MUL174:%.*]] = call half @llvm.fmul.f16(half [[TMP138]], half [[CONV173]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL174]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP139:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP140:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV175:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP140]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[MUL176:%.*]] = call float @llvm.fmul.f32(float [[CONV175]], float [[TMP139]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV177:%.*]] = call half @llvm.fptrunc.f16.f32(float [[MUL176]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV177]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP141:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP142:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV178:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP142]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[MUL179:%.*]] = call half @llvm.fmul.f16(half [[CONV178]], half [[TMP141]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV180:%.*]] = call i32 @llvm.fptosi.i32.f16(half [[MUL179]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV180]], ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[TMP143:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV181:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP143]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP144:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[MUL182:%.*]] = call half @llvm.fmul.f16(half [[TMP144]], half [[CONV181]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[MUL182]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP145:%.*]] = load volatile half, ptr @h1, align 2
+// NATIVE-HALF-NEXT:    [[TMP146:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[DIV183:%.*]] = call half @llvm.fdiv.f16(half [[TMP146]], half [[TMP145]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV183]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV184:%.*]] = call half @llvm.fptrunc.f16.f64(double 1.000000e+00) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP147:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[DIV185:%.*]] = call half @llvm.fdiv.f16(half [[TMP147]], half [[CONV184]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV185]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP148:%.*]] = load volatile float, ptr @f2, align 4
+// NATIVE-HALF-NEXT:    [[TMP149:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV186:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP149]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[DIV187:%.*]] = call float @llvm.fdiv.f32(float [[CONV186]], float [[TMP148]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV188:%.*]] = call half @llvm.fptrunc.f16.f32(float [[DIV187]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV188]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP150:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP151:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV189:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP151]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[DIV190:%.*]] = call half @llvm.fdiv.f16(half [[CONV189]], half [[TMP150]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV191:%.*]] = call i32 @llvm.fptosi.i32.f16(half [[DIV190]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile i32 [[CONV191]], ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[TMP152:%.*]] = load volatile i32, ptr @i0, align 4
+// NATIVE-HALF-NEXT:    [[CONV192:%.*]] = call half @llvm.sitofp.f16.i32(i32 [[TMP152]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[TMP153:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[DIV193:%.*]] = call half @llvm.fdiv.f16(half [[TMP153]], half [[CONV192]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[DIV193]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP154:%.*]] = load volatile double, ptr @d0, align 8
+// NATIVE-HALF-NEXT:    [[CONV194:%.*]] = call half @llvm.fptrunc.f16.f64(double [[TMP154]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV194]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP155:%.*]] = load volatile double, ptr @d0, align 8
+// NATIVE-HALF-NEXT:    [[CONV195:%.*]] = call float @llvm.fptrunc.f32.f64(double [[TMP155]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV196:%.*]] = call half @llvm.fptrunc.f16.f32(float [[CONV195]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV196]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[TMP156:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV197:%.*]] = call double @llvm.fpext.f64.f16(half [[TMP156]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile double [[CONV197]], ptr @d0, align 8
+// NATIVE-HALF-NEXT:    [[TMP157:%.*]] = load volatile half, ptr @h0, align 2
+// NATIVE-HALF-NEXT:    [[CONV198:%.*]] = call float @llvm.fpext.f32.f16(half [[TMP157]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    [[CONV199:%.*]] = call double @llvm.fpext.f64.f32(float [[CONV198]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile double [[CONV199]], ptr @d0, align 8
+// NATIVE-HALF-NEXT:    [[TMP158:%.*]] = load i16, ptr @s0, align 2
+// NATIVE-HALF-NEXT:    [[CONV200:%.*]] = call half @llvm.sitofp.f16.i16(i16 [[TMP158]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store volatile half [[CONV200]], ptr @h0, align 2
+// NATIVE-HALF-NEXT:    ret void
+//
 void foo(void) {
-  // CHECK-LABEL: define{{.*}} void @foo()
 
   // Check unary ops
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptoui.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptoui.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.uitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = (test);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half 0xH0000, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float 0.000000e+00, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (!h1);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: fneg float
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: fneg half
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = -h1;
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: load volatile half
-  // NATIVE-HALF-NEXT: store volatile half
-  // NOTNATIVE: store {{.*}} half {{.*}}, ptr
   h1 = +h1;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half 0xH3C00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1++;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half 0xH3C00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   ++h1;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half 0xHBC00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   --h1;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half 0xHBC00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1--;
 
   // Check binary ops with various operands
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = h0 * h2;
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = h0 * (__fp16) -2.0f;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = h0 * f2;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = f0 * h2;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = h0 * i0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 / h2);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 / (__fp16) -2.0f);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 / f2);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (f0 / h2);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 / i0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h2 + h0);
 
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f64(double -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = ((__fp16)-2.0 + h0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h2 + f0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (f2 + h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 + i0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h2 - h0);
 
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = ((__fp16)-2.0f - h0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h2 - f0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (f2 - h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h0 - i0);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 < h0);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 < (__fp16)42.0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 < f0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f2 < h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 < h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"olt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 < i0);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 > h2);
 
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = ((__fp16)42.0 > h2);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 > f2);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f0 > h2);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 > h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ogt", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 > i0);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 <= h0);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 4.200000e+01, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 <= (__fp16)42.0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h2 <= f0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f2 <= h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 <= h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"ole", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 <= i0);
 
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 >= h2);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f64(double -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 >= (__fp16)-2.0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 >= f2);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f0 >= h2);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 >= h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmps.f16(half %{{.*}}, half %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmps.f32(float %{{.*}}, float %{{.*}}, metadata !"oge", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 >= i0);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 == h2);
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 == (__fp16)1.0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 == f1);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f1 == h1);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 == h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"oeq", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 == i0);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 != h2);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 != (__fp16)1.0);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h1 != f1);
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (f1 != h1);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (i0 != h0);
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float %{{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   test = (h0 != i0);
 
-  // NATIVE-HALF: call i1 @llvm.experimental.constrained.fcmp.f16(half %{{.*}}, half 0xH0000, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i1 @llvm.experimental.constrained.fcmp.f32(float %{{.*}}, float {{.*}}, metadata !"une", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h1 = (h1 ? h2 : h0);
 
   // Check assignments (inc. compound)
-  // CHECK: store {{.*}} half {{.*}}, ptr
   // xATIVE-HALF: store {{.*}} half 0xHC000 // FIXME: We should be folding here.
   h0 = h1;
 
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float -2.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = (__fp16)-2.0f;
 
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = f0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = i0;
 
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptosi.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   i0 = h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 += h1;
 
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f32(float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 += (__fp16)1.0f;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 += f2;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptosi.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   i0 += h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fadd.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fadd.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 += i0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 -= h1;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 -= (__fp16)1.0;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 -= f2;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptosi.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   i0 -= h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fsub.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fsub.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 -= i0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 *= h1;
 
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 *= (__fp16)1.0;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 *= f2;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptosi.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   i0 *= h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fmul.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fmul.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 *= i0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 /= h1;
 
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fptrunc.f16.f64(double 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 /= (__fp16)1.0;
 
-  // CHECK: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 /= f2;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call i32 @llvm.experimental.constrained.fptosi.i32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call i32 @llvm.experimental.constrained.fptosi.i32.f32(float %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} i32 {{.*}}, ptr
   i0 /= h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NATIVE-HALF: call half @llvm.experimental.constrained.fdiv.f16(half %{{.*}}, half %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i32(i32 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.fdiv.f32(float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 /= i0;
 
   // Check conversions to/from double
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = d0;
 
-  // CHECK: call float @llvm.experimental.constrained.fptrunc.f32.f64(double %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = (float)d0;
 
-  // CHECK: call double @llvm.experimental.constrained.fpext.f64.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} double {{.*}}, ptr
   d0 = h0;
 
-  // CHECK: [[MID:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half %{{.*}}, metadata !"fpexcept.strict")
-  // CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(float [[MID]], metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} double {{.*}}, ptr
   d0 = (float)h0;
 
-  // NATIVE-HALF: call half @llvm.experimental.constrained.sitofp.f16.i16(i16 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call float @llvm.experimental.constrained.sitofp.f32.i16(i16 %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // NOTNATIVE: call half @llvm.experimental.constrained.fptrunc.f16.f32(float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-  // CHECK: store {{.*}} half {{.*}}, ptr
   h0 = s0;
 }
 
-// CHECK-LABEL: define{{.*}} void @testTypeDef(
-// NATIVE-HALF: call <4 x half> @llvm.experimental.constrained.fadd.v4f16(<4 x half> %{{.*}}, <4 x half> %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// NOTNATIVE: %[[CONV:.*]] = call <4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %{{.*}}, metadata !"fpexcept.strict")
-// NOTNATIVE: %[[CONV1:.*]] = call <4 x float> @llvm.experimental.constrained.fpext.v4f32.v4f16(<4 x half> %{{.*}}, metadata !"fpexcept.strict")
-// NOTNATIVE: %[[ADD:.*]] = call <4 x float> @llvm.experimental.constrained.fadd.v4f32(<4 x float> %conv, <4 x float> %conv1, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// NOTNATIVE: call <4 x half> @llvm.experimental.constrained.fptrunc.v4f16.v4f32(<4 x float> %add, metadata !"round.tonearest", metadata !"fpexcept.strict")
 
+// NOTNATIVE-LABEL: define dso_local void @testTypeDef(
+// NOTNATIVE-SAME: ) #[[ATTR0]] {
+// NOTNATIVE-NEXT:  [[ENTRY:.*:]]
+// NOTNATIVE-NEXT:    [[T0:%.*]] = alloca <4 x half>, align 8
+// NOTNATIVE-NEXT:    [[T1:%.*]] = alloca <4 x half>, align 8
+// NOTNATIVE-NEXT:    [[TMP0:%.*]] = load <4 x half>, ptr [[T0]], align 8
+// NOTNATIVE-NEXT:    [[CONV:%.*]] = call <4 x float> @llvm.fpext.v4f32.v4f16(<4 x half> [[TMP0]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[TMP1:%.*]] = load <4 x half>, ptr [[T1]], align 8
+// NOTNATIVE-NEXT:    [[CONV1:%.*]] = call <4 x float> @llvm.fpext.v4f32.v4f16(<4 x half> [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[ADD:%.*]] = call <4 x float> @llvm.fadd.v4f32(<4 x float> [[CONV]], <4 x float> [[CONV1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    [[CONV2:%.*]] = call <4 x half> @llvm.fptrunc.v4f16.v4f32(<4 x float> [[ADD]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NOTNATIVE-NEXT:    store <4 x half> [[CONV2]], ptr [[T1]], align 8
+// NOTNATIVE-NEXT:    ret void
+//
+// NATIVE-HALF-LABEL: define dso_local void @testTypeDef(
+// NATIVE-HALF-SAME: ) #[[ATTR0]] {
+// NATIVE-HALF-NEXT:  [[ENTRY:.*:]]
+// NATIVE-HALF-NEXT:    [[T0:%.*]] = alloca <4 x half>, align 8
+// NATIVE-HALF-NEXT:    [[T1:%.*]] = alloca <4 x half>, align 8
+// NATIVE-HALF-NEXT:    [[TMP0:%.*]] = load <4 x half>, ptr [[T0]], align 8
+// NATIVE-HALF-NEXT:    [[TMP1:%.*]] = load <4 x half>, ptr [[T1]], align 8
+// NATIVE-HALF-NEXT:    [[ADD:%.*]] = call <4 x half> @llvm.fadd.v4f16(<4 x half> [[TMP0]], <4 x half> [[TMP1]]) #[[ATTR2]] [ "fp.control"(metadata !"rte") ]
+// NATIVE-HALF-NEXT:    store <4 x half> [[ADD]], ptr [[T1]], align 8
+// NATIVE-HALF-NEXT:    ret void
+//
 void testTypeDef() {
   __fp16 t0 __attribute__((vector_size(8)));
   float16_t t1 __attribute__((vector_size(8)));
   t1 = t0 + t1;
 }
 
+//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+// CHECK: {{.*}}
diff --git a/clang/test/CodeGen/pragma-fp-exc.cpp b/clang/test/CodeGen/pragma-fp-exc.cpp
index ff47173739dbb..585d63ec2eafa 100644
--- a/clang/test/CodeGen/pragma-fp-exc.cpp
+++ b/clang/test/CodeGen/pragma-fp-exc.cpp
@@ -1,8 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-DEF %s
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -ffp-exception-behavior=strict -emit-llvm -o - %s | FileCheck --check-prefix=CHECK-STRICT %s
 
 // REQUIRES: x86-registered-target
 
+// CHECK-DEF-LABEL: define dso_local noundef float @_Z7func_01fff(
+// CHECK-DEF-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-DEF-NEXT:  [[ENTRY:.*:]]
+// CHECK-DEF-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEF-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEF-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-DEF-NEXT:    [[RES:%.*]] = alloca float, align 4
+// CHECK-DEF-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-DEF-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-DEF-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-DEF-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-DEF-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-DEF-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
+// CHECK-DEF-NEXT:    store float [[ADD]], ptr [[RES]], align 4
+// CHECK-DEF-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-DEF-NEXT:    [[TMP3:%.*]] = load float, ptr [[RES]], align 4
+// CHECK-DEF-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float [[TMP3]], float [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CHECK-DEF-NEXT:    store float [[ADD1]], ptr [[RES]], align 4
+// CHECK-DEF-NEXT:    [[TMP4:%.*]] = load float, ptr [[RES]], align 4
+// CHECK-DEF-NEXT:    ret float [[TMP4]]
+//
+// CHECK-STRICT-LABEL: define dso_local noundef float @_Z7func_01fff(
+// CHECK-STRICT-SAME: float noundef [[X:%.*]], float noundef [[Y:%.*]], float noundef [[Z:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-STRICT-NEXT:  [[ENTRY:.*:]]
+// CHECK-STRICT-NEXT:    [[X_ADDR:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[Y_ADDR:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[Z_ADDR:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    [[RES:%.*]] = alloca float, align 4
+// CHECK-STRICT-NEXT:    store float [[X]], ptr [[X_ADDR]], align 4
+// CHECK-STRICT-NEXT:    store float [[Y]], ptr [[Y_ADDR]], align 4
+// CHECK-STRICT-NEXT:    store float [[Z]], ptr [[Z_ADDR]], align 4
+// CHECK-STRICT-NEXT:    [[TMP0:%.*]] = load float, ptr [[X_ADDR]], align 4
+// CHECK-STRICT-NEXT:    [[TMP1:%.*]] = load float, ptr [[Y_ADDR]], align 4
+// CHECK-STRICT-NEXT:    [[ADD:%.*]] = call float @llvm.fadd.f32(float [[TMP0]], float [[TMP1]]) #[[ATTR2:[0-9]+]] [ "fp.control"(metadata !"rte") ]
+// CHECK-STRICT-NEXT:    store float [[ADD]], ptr [[RES]], align 4
+// CHECK-STRICT-NEXT:    [[TMP2:%.*]] = load float, ptr [[Z_ADDR]], align 4
+// CHECK-STRICT-NEXT:    [[TMP3:%.*]] = load float, ptr [[RES]], align 4
+// CHECK-STRICT-NEXT:    [[ADD1:%.*]] = call float @llvm.fadd.f32(float [[TMP3]], float [[TMP2]]) #[[ATTR2]] [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"maytrap") ]
+// CHECK-STRICT-NEXT:    store float [[ADD1]], ptr [[RES]], align 4
+// CHECK-STRICT-NEXT:    [[TMP4:%.*]] = load float, ptr [[RES]], align 4
+// CHECK-STRICT-NEXT:    ret float [[TMP4]]
+//
 float func_01(float x, float y, float z) {
   float res = x + y;
   {
@@ -11,10 +54,4 @@ float func_01(float x, float y, float z) {
   }
   return res;
 }
-// CHECK-DEF-LABEL: @_Z7func_01fff
-// CHECK-DEF:       call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore")
-// CHECK-DEF:       call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.maytrap")
 
-// CHECK-STRICT-LABEL: @_Z7func_01fff
-// CHECK-STRICT:       call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK-STRICT:       call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.maytrap")