[clang] [llvm] [Hexagon] Add XQFloat code generation and post-RA QFP handling (PR #198902)
Fateme Hosseini via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 9 08:22:24 PDT 2026
https://github.com/fhossein-quic updated https://github.com/llvm/llvm-project/pull/198902
>From c354e66b0ece225f2991acc10b5a16fca4720469 Mon Sep 17 00:00:00 2001
From: Fateme Hosseini <fhossein at qti.qualcomm.com>
Date: Tue, 24 Mar 2026 14:00:51 -0700
Subject: [PATCH] [Hexagon] Add XQFloat code generation and post-RA QFP
handling
Introduce two new passes for the Hexagon HVX floating-point pipeline,
targeting v79+ where QFloat (qf16/qf32) is the native HVX FP format.
HexagonXQFloatGenerator lowers IEEE-754 HVX floating-point sequences
(sf/hf) to native QFloat (qf16/qf32) operations. QFloat instructions
are faster and more power-efficient than their IEEE counterparts, with
optional accuracy trade-offs. The pass exposes four modes:
* Strict IEEE-754 compliant
* IEEE-754 compliant (extended dynamic range and subnormal precision,
no IEEE-754 overflow handling)
* Lossy subnormals
* Legacy
HexagonPostRAHandleQFP runs after register allocation and corrects the
spill/refill paths. QFloat operands carry four extra precision bits
that are silently dropped if the value passes through a spill slot or
a non-HVX instruction. The pass uses the Register DataFlow Graph
(RDF) to walk use-def chains in non-SSA form, inserts qf->IEEE
conversions before spills and non-HVX uses, and rewrites saturating
instruction opcodes that can absorb IEEE operands directly.
Co-authored-by: Sumanth Gundapaneni <sgundapa at quicinc.com>
Co-authored-by: Santanu Das <santdas at qti.qualcomm.com>
---
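Note for reviewers: the driver-side mode selection added in Clang.cpp can be
summarized by the standalone sketch below. It is an illustration only, not code
from the patch. `QFloatFlag` and `qfloatBackendFlag` are hypothetical names; the
caller is assumed to have already picked the last of -mhvx-qfloat,
-mhvx-qfloat=<mode> and -mhvx-ieee-fp on the command line, and HVX is assumed to
be enabled. The fast-math flags added for lossy mode and the pre-v79 warning are
left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical stand-in for the driver option that won on the command line.
enum class QFloatFlag { Bare, WithMode, IeeeFp };

// Returns the backend flag that would be forwarded via -mllvm, or
// std::nullopt when the HVX version is below v79 (no mode flag is passed).
std::optional<std::string> qfloatBackendFlag(unsigned HvxVer, QFloatFlag Flag,
                                             std::string Mode = "") {
  if (HvxVer < 79)
    return std::nullopt;
  if (Flag == QFloatFlag::IeeeFp) // -mhvx-ieee-fp is treated as qfloat "ieee"
    return std::string("-hexagon-qfloat-mode=ieee");
  if (Flag == QFloatFlag::Bare) // plain -mhvx-qfloat defaults to lossy
    return std::string("-hexagon-qfloat-mode=lossy");

  // Mode values are matched case-insensitively.
  std::transform(Mode.begin(), Mode.end(), Mode.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  for (const char *Known : {"strict-ieee", "ieee", "lossy", "legacy"})
    if (Mode == Known)
      return "-hexagon-qfloat-mode=" + Mode;

  // Unknown values fall back to lossy here; the patch reports the invalid
  // value separately while validating target features.
  return std::string("-hexagon-qfloat-mode=lossy");
}
```

For example, `qfloatBackendFlag(79, QFloatFlag::WithMode, "LEGacy")` yields
`-hexagon-qfloat-mode=legacy`, which corresponds to the case-insensitive RUN
lines in hexagon-hvx-qfloat-backend.c.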
clang/include/clang/Options/Options.td | 3 +
clang/lib/Driver/ToolChains/Clang.cpp | 112 +
clang/lib/Driver/ToolChains/Hexagon.cpp | 29 +-
clang/test/Driver/hexagon-hvx-ieee-qfloat.c | 25 +
.../test/Driver/hexagon-hvx-qfloat-backend.c | 43 +
llvm/include/llvm/CodeGen/RDFGraph.h | 2 +
llvm/lib/Target/Hexagon/CMakeLists.txt | 2 +
llvm/lib/Target/Hexagon/Hexagon.h | 5 +-
llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp | 24 +
llvm/lib/Target/Hexagon/HexagonInstrInfo.h | 3 +
.../Target/Hexagon/HexagonPostRAHandleQFP.cpp | 1854 ++++++++++++++
.../Target/Hexagon/HexagonTargetMachine.cpp | 32 +-
.../lib/Target/Hexagon/HexagonTargetMachine.h | 3 +
.../Hexagon/HexagonXQFloatGenerator.cpp | 2177 +++++++++++++++++
.../CodeGen/Hexagon/autohvx/xqf-add-qf.ll | 157 ++
.../CodeGen/Hexagon/autohvx/xqf-assertion1.ll | 84 +
.../Hexagon/autohvx/xqf-check-free-reg.ll | 110 +
.../Hexagon/autohvx/xqf-check-qf-instrs.ll | 73 +
.../autohvx/xqf-compliant-ieee-mul-qf16.ll | 86 +
.../autohvx/xqf-compliant-ieee-mul-qf32.ll | 136 +
.../Hexagon/autohvx/xqf-convert-elim.ll | 77 +
.../Hexagon/autohvx/xqf-corner-case1.ll | 147 ++
.../Hexagon/autohvx/xqf-fix-invalid-opcode.ll | 72 +
.../CodeGen/Hexagon/autohvx/xqf-fixup-qfp1.ll | 372 ---
.../Hexagon/autohvx/xqf-handle-conv.ll | 180 ++
.../CodeGen/Hexagon/autohvx/xqf-input-rt.ll | 63 +
.../Hexagon/autohvx/xqf-lossy-mul-qf16.ll | 74 +
.../Hexagon/autohvx/xqf-lossy-mul-qf32.ll | 109 +
.../CodeGen/Hexagon/autohvx/xqf-mode-flags.ll | 76 +
.../CodeGen/Hexagon/autohvx/xqf-multi-conv.ll | 133 +
.../CodeGen/Hexagon/autohvx/xqf-multidef.ll | 49 +
.../autohvx/xqf-normalization-assert.ll | 459 ++++
.../autohvx/xqf-postra-conv-double.mir | 120 +
.../autohvx/xqf-postra-conv-double2.ll | 28 +
.../Hexagon/autohvx/xqf-postra-copy3.ll | 20 +
.../Hexagon/autohvx/xqf-postra-fakereg.ll | 130 +
.../autohvx/xqf-postra-handle-crash.ll | 23 +
.../autohvx/xqf-postra-handle-crash2.mir | 86 +
.../autohvx/xqf-postra-handle-qf32-mul.ll | 69 +
.../Hexagon/autohvx/xqf-postra-legacy-mode.ll | 30 +
.../Hexagon/autohvx/xqf-postra-subreg2.ll | 99 +
.../Hexagon/autohvx/xqf-postra-subreg3.ll | 45 +
.../Hexagon/autohvx/xqf-postra-warnings.ll | 60 +
.../autohvx/xqf-strict-ieee-mul-qf16.ll | 91 +
.../autohvx/xqf-strictieee-mul-qf32.ll | 123 +
.../Hexagon/autohvx/xqf-unary-crash.ll | 25 +
.../xqf-v81-compliant-ieee-mul-qf32.ll | 109 +
.../autohvx/xqf-v81/xqf-v81-lossy-mul-qf32.ll | 98 +
.../xqf-v81/xqf-v81-strict-mul-qf32.ll | 119 +
.../Hexagon/autohvx/xqf-v81/xqf-v81-vsub.ll | 164 ++
llvm/test/CodeGen/Hexagon/autohvx/xqf-vsub.ll | 130 +
.../CodeGen/Hexagon/autohvx/xqf-warnings.ll | 143 ++
52 files changed, 8103 insertions(+), 380 deletions(-)
create mode 100644 clang/test/Driver/hexagon-hvx-ieee-qfloat.c
create mode 100644 clang/test/Driver/hexagon-hvx-qfloat-backend.c
create mode 100644 llvm/lib/Target/Hexagon/HexagonPostRAHandleQFP.cpp
create mode 100644 llvm/lib/Target/Hexagon/HexagonXQFloatGenerator.cpp
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-add-qf.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-assertion1.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-check-free-reg.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-check-qf-instrs.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf16.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-convert-elim.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-corner-case1.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-fix-invalid-opcode.ll
delete mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-fixup-qfp1.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-handle-conv.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-input-rt.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf16.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-mode-flags.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-multi-conv.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-multidef.ll
create mode 100755 llvm/test/CodeGen/Hexagon/autohvx/xqf-normalization-assert.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double.mir
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double2.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-copy3.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-fakereg.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash2.mir
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-qf32-mul.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-legacy-mode.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg2.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg3.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-warnings.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-strict-ieee-mul-qf16.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-strictieee-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-unary-crash.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-compliant-ieee-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-lossy-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-strict-mul-qf32.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-vsub.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-vsub.ll
create mode 100644 llvm/test/CodeGen/Hexagon/autohvx/xqf-warnings.ll
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 4fd892e58df86..c82c00c98d3ef 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -6986,6 +6986,9 @@ def mhexagon_hvx_length_EQ : Joined<["-"], "mhvx-length=">,
def mhexagon_hvx_qfloat : Flag<["-"], "mhvx-qfloat">,
Group<m_hexagon_Features_HVX_Group>,
HelpText<"Enable Hexagon HVX QFloat instructions">;
+def mhexagon_hvx_qfloat_EQ : Joined<["-"], "mhvx-qfloat=">,
+ Group<m_hexagon_Features_HVX_Group>,
+ HelpText<"Enable Hexagon HVX QFloat instructions with mode: strict-ieee, ieee, lossy, legacy (v79+)">;
def mno_hexagon_hvx_qfloat : Flag<["-"], "mno-hvx-qfloat">,
Group<m_hexagon_Features_HVX_Group>,
HelpText<"Disable Hexagon HVX QFloat instructions">;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8a0efd70e6c0d..b942e74d8933f 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -61,6 +61,7 @@
#include "llvm/TargetParser/RISCVISAInfo.h"
#include "llvm/TargetParser/RISCVTargetParser.h"
#include <cctype>
+#include <iterator>
using namespace clang::driver;
using namespace clang::driver::tools;
@@ -2235,6 +2236,115 @@ void Clang::AddX86TargetArgs(const ArgList &Args,
}
}
+static StringRef getOptionName(StringRef Option, const char Delimiter = '=') {
+ size_t Index = Option.find(Delimiter);
+ if (Index != StringRef::npos)
+ Option = Option.substr(0, Index);
+ return Option;
+}
+
+static void checkAndRemoveLLVMArg(ArgStringList &CmdArgs, StringRef Opt) {
+ Opt = getOptionName(Opt);
+ if (CmdArgs.size() < 2)
+ return;
+
+ for (auto It = std::next(CmdArgs.begin()); It != CmdArgs.end(); ++It) {
+ StringRef Option = *It;
+ if (!Option.starts_with(Opt))
+ continue;
+ Option = getOptionName(Option);
+ if (Option != Opt)
+ continue;
+ if (StringRef(*(It - 1)) != "-mllvm")
+ continue;
+
+ It = CmdArgs.erase(It);
+ CmdArgs.erase(It - 1);
+ return;
+ }
+}
+
+static void pushBackLLVMArg(ArgStringList &CmdArgs, const char *A) {
+ checkAndRemoveLLVMArg(CmdArgs, A);
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back(A);
+}
+
+static void addQFloatLossyFastMathArgs(ArgStringList &CmdArgs) {
+ for (auto It = CmdArgs.begin(), Ie = CmdArgs.end(); It != Ie;) {
+ StringRef Option = *It;
+ if (Option == "-fmath-errno" || Option == "-ffp-contract=on") {
+ It = CmdArgs.erase(It);
+ Ie = CmdArgs.end();
+ } else {
+ ++It;
+ }
+ }
+
+ CmdArgs.push_back("-menable-no-infs");
+ CmdArgs.push_back("-menable-no-nans");
+ CmdArgs.push_back("-fapprox-func");
+ CmdArgs.push_back("-funsafe-math-optimizations");
+ CmdArgs.push_back("-fno-signed-zeros");
+ CmdArgs.push_back("-mreassociate");
+ CmdArgs.push_back("-freciprocal-math");
+ CmdArgs.push_back("-ffp-contract=fast");
+ CmdArgs.push_back("-ffast-math");
+ CmdArgs.push_back("-ffinite-math-only");
+ CmdArgs.push_back("-D__FAST_MATH__");
+ pushBackLLVMArg(CmdArgs, "-fast-math=true");
+}
+
+static void addQFloatBackendArg(const Driver &D, const ArgList &Args,
+ ArgStringList &CmdArgs) {
+ auto HvxVerOpt = toolchains::HexagonToolChain::GetHVXVersion(Args);
+ bool HasHVX = HvxVerOpt.has_value();
+ std::string HvxVer = HasHVX ? *HvxVerOpt : std::string();
+ if (Args.hasArg(options::OPT_mhexagon_hvx, options::OPT_mhexagon_hvx_EQ,
+ options::OPT_mhexagon_hvx_ieee_fp) &&
+ HasHVX) {
+ unsigned HvxVerNum = 0;
+ if (StringRef(HvxVer).drop_front(1).getAsInteger(10, HvxVerNum))
+ HvxVerNum = 0;
+
+ if (Arg *A = Args.getLastArg(options::OPT_mhexagon_hvx_qfloat,
+ options::OPT_mhexagon_hvx_qfloat_EQ,
+ options::OPT_mhexagon_hvx_ieee_fp)) {
+ if (HvxVerNum >= 79) {
+ if (A->getOption().matches(options::OPT_mhexagon_hvx_qfloat_EQ)) {
+ const char *Mode =
+ llvm::StringSwitch<const char *>(StringRef(A->getValue()).lower())
+ .Case("strict-ieee", "-hexagon-qfloat-mode=strict-ieee")
+ .Case("ieee", "-hexagon-qfloat-mode=ieee")
+ .Case("lossy", "-hexagon-qfloat-mode=lossy")
+ .Case("legacy", "-hexagon-qfloat-mode=legacy")
+ .Default("-hexagon-qfloat-mode=lossy");
+ pushBackLLVMArg(CmdArgs, Mode);
+ if (strcmp(Mode, "-hexagon-qfloat-mode=lossy") == 0)
+ addQFloatLossyFastMathArgs(CmdArgs);
+ } else if (A->getOption().matches(options::OPT_mhexagon_hvx_qfloat)) {
+ pushBackLLVMArg(CmdArgs, "-hexagon-qfloat-mode=lossy");
+ addQFloatLossyFastMathArgs(CmdArgs);
+ } else {
+ pushBackLLVMArg(CmdArgs, "-hexagon-qfloat-mode=ieee");
+ }
+ } else {
+ if (Arg *QFloatArg =
+ Args.getLastArg(options::OPT_mhexagon_hvx_qfloat,
+ options::OPT_mhexagon_hvx_qfloat_EQ,
+ options::OPT_mno_hexagon_hvx_qfloat);
+ QFloatArg && QFloatArg->getOption().matches(
+ options::OPT_mhexagon_hvx_qfloat_EQ)) {
+ D.Diag(diag::warn_drv_unsupported_option_part_for_target)
+ << QFloatArg->getValue() << QFloatArg->getAsString(Args)
+ << (std::string("HVX ") + HvxVer +
+ "; falling back to legacy qfloat mode");
+ }
+ }
+ }
+ }
+}
+
void Clang::AddHexagonTargetArgs(const ArgList &Args,
ArgStringList &CmdArgs) const {
CmdArgs.push_back("-mqdsp6-compat");
@@ -2254,6 +2364,8 @@ void Clang::AddHexagonTargetArgs(const ArgList &Args,
}
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("-machine-sink-split=0");
+
+ addQFloatBackendArg(getToolChain().getDriver(), Args, CmdArgs);
}
void Clang::AddLanaiTargetArgs(const ArgList &Args,
diff --git a/clang/lib/Driver/ToolChains/Hexagon.cpp b/clang/lib/Driver/ToolChains/Hexagon.cpp
index 0e7055797a1f0..d60b0f201f1e0 100644
--- a/clang/lib/Driver/ToolChains/Hexagon.cpp
+++ b/clang/lib/Driver/ToolChains/Hexagon.cpp
@@ -106,14 +106,14 @@ static void handleHVXTargetFeatures(const Driver &D, const ArgList &Args,
// Handle HVX floating point flags.
auto checkFlagHvxVersion =
- [&](auto FlagOn, auto FlagOff,
+ [&](auto FlagOn, auto FlagOnWithModes, auto FlagOff, bool CheckMode,
unsigned MinVerNum) -> std::optional<StringRef> {
// Return an std::optional<StringRef>:
// - std::nullopt indicates a verification failure, or that the flag was not
// present in Args.
// - Otherwise the returned value is that name of the feature to add
// to Features.
- Arg *A = Args.getLastArg(FlagOn, FlagOff);
+ Arg *A = Args.getLastArg(FlagOn, FlagOnWithModes, FlagOff);
if (!A)
return std::nullopt;
@@ -130,17 +130,34 @@ static void handleHVXTargetFeatures(const Driver &D, const ArgList &Args,
<< withMinus(OptName) << ("v" + std::to_string(HvxVerNum));
return std::nullopt;
}
+
+ if (CheckMode && A->getOption().matches(FlagOnWithModes)) {
+ bool ValidMode =
+ llvm::StringSwitch<bool>(StringRef(A->getValue()).lower())
+ .Cases({"strict-ieee", "ieee", "lossy", "legacy"}, true)
+ .Default(false);
+ if (!ValidMode)
+ D.Diag(diag::err_drv_invalid_value)
+ << A->getAsString(Args) << A->getValue();
+ }
return makeFeature(OptName, true);
};
- if (auto F = checkFlagHvxVersion(options::OPT_mhexagon_hvx_qfloat,
- options::OPT_mno_hexagon_hvx_qfloat, 68)) {
+ if (auto F = checkFlagHvxVersion(
+ options::OPT_mhexagon_hvx_qfloat, options::OPT_mhexagon_hvx_qfloat_EQ,
+ options::OPT_mno_hexagon_hvx_qfloat, /*CheckMode=*/true, 68)) {
Features.push_back(*F);
}
- if (auto F = checkFlagHvxVersion(options::OPT_mhexagon_hvx_ieee_fp,
- options::OPT_mno_hexagon_hvx_ieee_fp, 68)) {
+ if (auto F = checkFlagHvxVersion(
+ options::OPT_mhexagon_hvx_ieee_fp, options::OPT_mhexagon_hvx_ieee_fp,
+ options::OPT_mno_hexagon_hvx_ieee_fp, /*CheckMode=*/false, 68)) {
Features.push_back(*F);
}
+
+ // On v79 and above, there is no IEEE hardware. Treat -mhvx-ieee-fp
+ // as "qfloat mode ieee".
+ if (HvxVerNum >= 79 && Args.getLastArg(options::OPT_mhexagon_hvx_ieee_fp))
+ Features.push_back("+hvx-qfloat");
}
// Hexagon target features.
diff --git a/clang/test/Driver/hexagon-hvx-ieee-qfloat.c b/clang/test/Driver/hexagon-hvx-ieee-qfloat.c
new file mode 100644
index 0000000000000..ee8dc9de25751
--- /dev/null
+++ b/clang/test/Driver/hexagon-hvx-ieee-qfloat.c
@@ -0,0 +1,25 @@
+// ---------------------------------------------------------------------------
+// Tests for the hvx qfloat target feature and backend flag if -mhvx-ieee-fp is
+// passed on v79 and above.
+// ---------------------------------------------------------------------------
+
+// Test for v79, the correct backend flag is passed for -mhvx-ieee-fp.
+// CHECK-IEEE: "-mllvm" "-hexagon-qfloat-mode=ieee"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-ieee-fp 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s
+
+// Test for arches lower than v79 does not pass any backend flag.
+// CHECK-MODE-NOT: "-mllvm" "-hexagon-qfloat-mode="
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv75 -mhvx \
+// RUN: -mhvx-ieee-fp 2>&1 | FileCheck -check-prefix=CHECK-MODE %s
+
+// Test for v79, the correct qfloat target feature is set when -mhvx-ieee-fp is
+// passed.
+// CHECK-HVX-QFLOAT-ON: "-target-feature" "+hvx-qfloat"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-ieee-fp 2>&1 | FileCheck -check-prefix=CHECK-HVX-QFLOAT-ON %s
+
+// Test for arches lower than v79 does not set the qfloat target feature.
+// CHECK-FEATURE-NOT: "-target-feature" "+hvx-qfloat"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv75 -mhvx \
+// RUN: -mhvx-ieee-fp 2>&1 | FileCheck -check-prefix=CHECK-FEATURE %s
diff --git a/clang/test/Driver/hexagon-hvx-qfloat-backend.c b/clang/test/Driver/hexagon-hvx-qfloat-backend.c
new file mode 100644
index 0000000000000..ae6866777acff
--- /dev/null
+++ b/clang/test/Driver/hexagon-hvx-qfloat-backend.c
@@ -0,0 +1,43 @@
+// ---------------------------------------------------------------------------
+// Tests for the hvx qfloat modes backend flag.
+// ---------------------------------------------------------------------------
+
+// Test for correct backend flag with case-insensitive values.
+// CHECK-STRICT-IEEE: "-mllvm" "-hexagon-qfloat-mode=strict-ieee"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=strict-ieee 2>&1 | FileCheck -check-prefix=CHECK-STRICT-IEEE %s
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=sTriCt-Ieee 2>&1 | FileCheck -check-prefix=CHECK-STRICT-IEEE %s
+
+// CHECK-IEEE: "-mllvm" "-hexagon-qfloat-mode=ieee"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=ieee 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=IEEE 2>&1 | FileCheck -check-prefix=CHECK-IEEE %s
+
+// CHECK-LOSSY: "-mllvm" "-hexagon-qfloat-mode=lossy"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=lossy 2>&1 | FileCheck -check-prefix=CHECK-LOSSY %s
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=lOSSy 2>&1 | FileCheck -check-prefix=CHECK-LOSSY %s
+
+// CHECK-LEGACY: "-mllvm" "-hexagon-qfloat-mode=legacy"
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=legacy 2>&1 | FileCheck -check-prefix=CHECK-LEGACY %s
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat=LEGacy 2>&1 | FileCheck -check-prefix=CHECK-LEGACY %s
+
+// Test for default mode, if no mode is specified on v79.
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv79 -mhvx \
+// RUN: -mhvx-qfloat 2>&1 | FileCheck -check-prefix=CHECK-LOSSY %s
+
+// Test for arches lower than v79 does not pass any backend flag.
+// CHECK-MODE-NOT: "-mllvm" "-hexagon-qfloat-mode="
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv75 -mhvx \
+// RUN: -mhvx-qfloat 2>&1 | FileCheck -check-prefix=CHECK-MODE %s
+
+// Test for arches lower than v79 warns that qfloat mode is ignored.
+// CHECK-MODE-WARN: warning: ignoring 'ieee' in '-mhvx-qfloat=ieee' option as it is not currently supported for target 'HVX v75'
+// CHECK-MODE-WARN-NOT: "-mllvm" "-hexagon-qfloat-mode="
+// RUN: %clang -c %s -### -target hexagon-unknown-elf -mv75 -mhvx \
+// RUN: -mhvx-qfloat=ieee 2>&1 | FileCheck -check-prefix=CHECK-MODE-WARN %s
diff --git a/llvm/include/llvm/CodeGen/RDFGraph.h b/llvm/include/llvm/CodeGen/RDFGraph.h
index c1ec2ddff14a3..8d96efd8e2f0f 100644
--- a/llvm/include/llvm/CodeGen/RDFGraph.h
+++ b/llvm/include/llvm/CodeGen/RDFGraph.h
@@ -358,6 +358,8 @@ template <typename T> struct NodeAddr {
return !operator==(NA);
}
+ bool operator<(const NodeAddr<T> &NA) const { return Id < NA.Id; }
+
T Addr = nullptr;
NodeId Id = 0;
};
diff --git a/llvm/lib/Target/Hexagon/CMakeLists.txt b/llvm/lib/Target/Hexagon/CMakeLists.txt
index 38dcc09282330..f98b519fe4974 100644
--- a/llvm/lib/Target/Hexagon/CMakeLists.txt
+++ b/llvm/lib/Target/Hexagon/CMakeLists.txt
@@ -58,9 +58,11 @@ add_llvm_target(HexagonCodeGen
HexagonOptAddrMode.cpp
HexagonOptimizeSZextends.cpp
HexagonPeephole.cpp
+ HexagonPostRAHandleQFP.cpp
HexagonQFPOptimizer.cpp
HexagonRDFOpt.cpp
HexagonRegisterInfo.cpp
+ HexagonXQFloatGenerator.cpp
HexagonSelectionDAGInfo.cpp
HexagonSplitConst32AndConst64.cpp
HexagonSplitDouble.cpp
diff --git a/llvm/lib/Target/Hexagon/Hexagon.h b/llvm/lib/Target/Hexagon/Hexagon.h
index 1db2326b274dc..e9ee7b7c48e3e 100644
--- a/llvm/lib/Target/Hexagon/Hexagon.h
+++ b/llvm/lib/Target/Hexagon/Hexagon.h
@@ -69,8 +69,9 @@ void initializeHexagonOptimizeSZextendsPass(PassRegistry &);
void initializeHexagonPeepholePass(PassRegistry &);
void initializeHexagonSplitConst32AndConst64Pass(PassRegistry &);
void initializeHexagonVectorPrintPass(PassRegistry &);
-
void initializeHexagonQFPOptimizerPass(PassRegistry &);
+void initializeHexagonPostRAHandleQFPPass(PassRegistry &);
+void initializeHexagonXQFloatGeneratorPass(PassRegistry &);
Pass *createHexagonLoopIdiomPass();
Pass *createHexagonVectorLoopCarriedReuseLegacyPass();
@@ -119,6 +120,8 @@ FunctionPass *createHexagonVectorPrint();
FunctionPass *createHexagonVExtract();
FunctionPass *createHexagonExpandCondsets();
FunctionPass *createHexagonQFPOptimizer();
+FunctionPass *createHexagonPostRAHandleQFP();
+FunctionPass *createHexagonXQFloatGenerator();
} // end namespace llvm;
diff --git a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
index 6c95d54bf111b..5e8578a5d407d 100644
--- a/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp
@@ -4950,6 +4950,30 @@ bool HexagonInstrInfo::isQFPInstr(MachineInstr *MI) const {
return isQFP32Instr(MI) || isQFP16Instr(MI);
}
+// Return true if the function contains any qf-generating instructions.
+bool HexagonInstrInfo::hasQFPInstrs(const MachineFunction &MF) const {
+ for (const MachineBasicBlock &MBB : MF)
+ for (const MachineInstr &MI : MBB)
+ if (isQFPInstr(const_cast<MachineInstr *>(&MI)))
+ return true;
+ return false;
+}
+
+// Returns true if A appears before B within the same basic block.
+bool HexagonInstrInfo::isMIBefore(const MachineInstr *A,
+ const MachineInstr *B) const {
+ if (!A || !B || A->getParent() != B->getParent())
+ return false;
+
+ for (const MachineInstr &MI : *A->getParent()) {
+ if (&MI == A)
+ return true;
+ if (&MI == B)
+ return false;
+ }
+ return false;
+}
+
// Addressing mode relations.
short HexagonInstrInfo::changeAddrMode_abs_io(short Opc) const {
return Opc >= 0 ? Hexagon::changeAddrMode_abs_io(Opc) : Opc;
diff --git a/llvm/lib/Target/Hexagon/HexagonInstrInfo.h b/llvm/lib/Target/Hexagon/HexagonInstrInfo.h
index 230f5d2228457..1901b260926d2 100644
--- a/llvm/lib/Target/Hexagon/HexagonInstrInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonInstrInfo.h
@@ -52,6 +52,9 @@ class HexagonInstrInfo : public HexagonGenInstrInfo {
const HexagonRegisterInfo &getRegisterInfo() const { return RegInfo; }
+ bool isMIBefore(const MachineInstr *A, const MachineInstr *B) const;
+ bool hasQFPInstrs(const MachineFunction &MF) const;
+
/// TargetInstrInfo overrides.
/// If the specified machine instruction is a direct
diff --git a/llvm/lib/Target/Hexagon/HexagonPostRAHandleQFP.cpp b/llvm/lib/Target/Hexagon/HexagonPostRAHandleQFP.cpp
new file mode 100644
index 0000000000000..31a97918d3c1f
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonPostRAHandleQFP.cpp
@@ -0,0 +1,1854 @@
+//===--------------------- HexagonPostRAHandleQFP.cpp --------------------------
+//===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+// For v79 and above, we generate qf operations for HVX, which include vadd,
+// vsub and vmpy instructions. These qf operations with qf operands are fast,
+// maintain accuracy similar to IEEE, and save power.
+//
+// However, these qf operands should always be converted back to IEEE format
+// when used in non-HVX instructions. This is because of how the qf values
+// are stored in memory. qf operands have 4 extra bits. If used in non-HVX
+// operations, these bits get dropped, resulting in an incorrect value being
+// used. So, before use in any non-HVX operation, we need to convert these
+// qf values to IEEE format.
+//
+// During register allocation, when no more physical registers are available,
+// the qf operands may be spilled to memory. This immediately causes loss of
+// accuracy. This pass prevents that by:
+// 1. Inserting qf type to IEEE type conversion instructions before the spill.
+// 2. Iterating over the uses of qf def (created before the spill) and
+// changing their opcodes to handle IEEE type operands for saturating
+// instructions. This is because the refills will use IEEE type operands, but
+// the instructions will still assume qf operands. For non-saturating
+// instructions which use qf, we incorporate a conversion to IEEE before that.
+// 3. Iterating over the uses of qf def created by the spill and replacing
+// them with the appropriate opcode (which uses IEEE operands) for saturating
+// instructions. For non-saturating instructions which use qf,
+// we incorporate a conversion to IEEE before that.
+// 4. Iterating over the copy instructions and checking their uses,
+// inserting conversions from qf to IEEE whenever required. The conversions
+// are inserted after their reaching def since there can be multiple defs
+// for use in non-SSA form.
+//
+// To get the use-def chains, we make use of Register DataFlow Graph (RDF),
+// since after register allocation SSA form is lost. This could also be done
+// when the spills and fills are created during Frame Lowering. However,
+// that was abandoned due to the intermediate state of the code.
+// Liveness is preserved in this pass.
+//
+// NOTE:
+// Saturating instructions: Instructions for which transformation involves
+// only changing the opcode. Eg. vmpy(qf32, sf) saturates to vmpy(sf, sf) when
+// we see that the first operand is now a sf type.
+// Non-Saturating instructions: Instructions for which conversion(s) have
+// to be inserted. Eg. Vd.f8=Vu.qf16. If the use operand is now hf type,
+// we have to insert a conversion qf16 = hf before this instruction.
+//
+// FIXME tags have been added for potential errors, along with the underlying
+// assumption.
+// FIXME Implement v81 specific optimizations as below. At the moment, we add
+// converts.
+// Vd.qf16=Vu.hf
+// Vd.qf16=Vu.qf16
+// Vd.qf32=Vu.qf32
+// Vd.qf32=Vu.sf
+//===---------------------------------------------------------------------===//
+
+#include "HexagonTargetMachine.h"
+#include "llvm/CodeGen/LiveInterval.h"
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/LivePhysRegs.h"
+#include "llvm/CodeGen/MachineDominanceFrontier.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/RDFGraph.h"
+#include "llvm/CodeGen/RDFLiveness.h"
+#include "llvm/CodeGen/RDFRegisters.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetMachine.h"
+
+#define DEBUG_TYPE "handle-qfp"
+
+using namespace llvm;
+using namespace rdf;
+
+extern cl::opt<QFloatMode> QFloatModeValue;
+
+cl::opt<bool> DisablePostRAHandleQFloat(
+ "disable-handle-qfp", cl::init(false),
+ cl::desc("Disable handling of Qfloat spills/refills after register "
+ "allocation."));
+
+static cl::opt<bool> EnablePostRAXqfCompliance(
+ "enable-postra-xqf-check", cl::init(false),
+ cl::desc("Enable ABI compliance for xqf operands post regalloc."));
+
+namespace llvm {
+FunctionPass *createHexagonPostRAHandleQFP();
+void initializeHexagonPostRAHandleQFPPass(PassRegistry &);
+} // namespace llvm
+
+// List of QF instructions which need to be analyzed.
+// Each key maps to a pair of flags, one per source operand:
+// pair.first|pair.second = true if IEEE type, false otherwise.
+// We only need to change the opcode to handle qf/sf
+// misuses for these, i.e. these instructions can be 'saturated'.
+DenseMap<unsigned short, std::pair<bool, bool>> QFPSatInstsMap{
+ {Hexagon::V6_vadd_qf16_mix, {false, true}},
+ {Hexagon::V6_vadd_qf16, {false, false}},
+ {Hexagon::V6_vadd_qf32_mix, {false, true}},
+ {Hexagon::V6_vadd_qf32, {false, false}},
+ {Hexagon::V6_vsub_qf16_mix, {false, true}},
+ {Hexagon::V6_vsub_hf_mix, {true, false}},
+ {Hexagon::V6_vsub_qf16, {false, false}},
+ {Hexagon::V6_vsub_qf32_mix, {false, true}},
+ {Hexagon::V6_vsub_sf_mix, {true, false}},
+ {Hexagon::V6_vsub_qf32, {false, false}},
+ {Hexagon::V6_vmpy_qf16_mix_hf, {false, true}},
+ {Hexagon::V6_vmpy_qf16, {false, false}},
+ {Hexagon::V6_vmpy_qf32_mix_hf, {false, true}},
+ {Hexagon::V6_vmpy_qf32_qf16, {false, false}},
+ {Hexagon::V6_vmpy_qf32, {false, false}},
+ {Hexagon::V6_vmpy_rt_qf16, {false, true}},
+ // These opcodes take a single operand only.
+ // Second placeholder op is true always.
+ {Hexagon::V6_vabs_qf32_qf32, {false, true}},
+ {Hexagon::V6_vabs_qf16_qf16, {false, true}},
+ {Hexagon::V6_vneg_qf32_qf32, {false, true}},
+ {Hexagon::V6_vneg_qf16_qf16, {false, true}},
+ {Hexagon::V6_vilog2_qf32, {false, true}},
+ {Hexagon::V6_vilog2_qf16, {false, true}},
+ {Hexagon::V6_vconv_qf32_qf32, {false, true}},
+ {Hexagon::V6_vconv_qf16_qf16, {false, true}},
+};
+
+// This holds the instruction opcodes for which there are
+// no 'saturating' opcodes. The only way is to insert
+// convert instructions before them.
+SmallVector<unsigned short, 5> QFNonSatInstr{
+ Hexagon::V6_vconv_hf_qf16, Hexagon::V6_vconv_hf_qf32,
+ Hexagon::V6_vconv_sf_qf32,
+ // v81 instructions
+ Hexagon::V6_vconv_bf_qf32, Hexagon::V6_vconv_f8_qf16};
+
+namespace {
+class HexagonPostRAHandleQFP : public MachineFunctionPass {
+public:
+ static char ID;
+ HexagonPostRAHandleQFP() : MachineFunctionPass(ID) {
+ PassRegistry &R = *PassRegistry::getPassRegistry();
+ initializeHexagonPostRAHandleQFPPass(R);
+ }
+ StringRef getPassName() const override {
+ return "Hexagon handle QFloat spills and refills post RA.";
+ }
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ AU.addRequired<MachineDominatorTreeWrapperPass>();
+ AU.addRequired<MachineDominanceFrontierWrapperPass>();
+ AU.setPreservesCFG();
+ }
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+private:
+ // QFUses collects the instructions which use QF operands.
+ // These have to be deleted and transformed to opcodes
+ // to denote usage of IEEE operands.
+ // It might involve changing the order of the Register operands.
+ using QFUses = std::map<MachineInstr *, std::pair<bool, bool>>;
+ QFUses QFUsesMap;
+
+ // Holds the Register DataFlow Graph.
+ DataFlowGraph *DFG = nullptr;
+
+ // Stores spill nodes and their reaching definition instructions
+ // which generate the qf operand to be stored.
+ std::vector<std::pair<MachineInstr *, NodeAddr<DefNode *>>> SpillMIs;
+ // Stores the refill nodes consisting of load instructions.
+ std::vector<NodeAddr<DefNode *>> RefillMIs;
+
+ // Stores the type of op.
+ enum ConvOperand {
+ Undefined = 0x0,
+ Lo = 0x1,
+ Hi = 0x2,
+ HiLo = 0x3,
+ };
+ // Stores the convert instructions which take qf operands.
+ MapVector<MachineInstr *, unsigned> QFNonSatMIs;
+
+ // Stores the qf-generating vmul/vadd/etc. nodes with multiple reaching defs
+ std::set<NodeAddr<StmtNode *>> PossibleMultiReachDefs;
+ // Qf generating instructions to ignore. Do not insert conversion instruction
+ // to sf/hf from qf, if the instr is present in this list; since that means
+ // a conversion has already been inserted after the instruction.
+ SmallPtrSet<MachineInstr *, 4> IgnoreInsertConvList;
+
+ // Register type
+ enum class RegType { qf32, qf16, qf32_double, qf16_double, ieee, undefined };
+ // Stores the copy instructions with their reaching defs, along with the op
+ // type
+ std::map<std::pair<NodeAddr<DefNode *>, NodeAddr<DefNode *>>, RegType>
+ QFCopys;
+
+ // Stores the reaching defs of copies whose result has to be converted to IEEE
+ DenseMap<MachineInstr *, RegType> ReachDefOfCopies;
+
+ // Stores copies which need to be converted back to qf. The uses of these
+ // copies feed to qf type instructions and hence can be converted back to qf
+ // type.
+ DenseMap<MachineInstr *, std::pair<NodeAddr<DefNode *>, RegType>>
+ ConvertToQfCopies;
+
+ // Subregister kill set for a doubletype use. The pair of bool,bool
+ // represents the hi and lo subregisters of the double register.
+ DenseMap<MachineInstr *, std::pair<bool, bool>> SubRegKillSet;
+
+ const HexagonInstrInfo *HII = nullptr;
+ const HexagonRegisterInfo *HRI = nullptr;
+ MachineRegisterInfo *MRI = nullptr;
+ Liveness *LV = nullptr;
+ const HexagonSubtarget *HST = nullptr;
+
+ void collectQFPStackSpill(NodeAddr<StmtNode *> *);
+ void collectQFPStackRefill(NodeAddr<StmtNode *> *);
+ void collectCopies(NodeAddr<StmtNode *> *);
+ bool HandleRefills();
+ bool HandleSpills();
+ bool HandleCopies();
+ bool HandleNonSatInstr();
+ bool HandleMultiReachingDefs();
+ bool HandleReachDefOfCopies();
+ bool HandleConvertToQfCopies();
+ RegType HasQfUses(NodeAddr<DefNode *>, MachineInstr *);
+ void collectConvQFInstr(NodeAddr<DefNode *> &);
+ void collectQFUses(NodeAddr<DefNode *>, MachineInstr *DefMI);
+ void conditionallyInsert(MachineInstr &, Register &);
+
+ // Helper functions
+ unsigned short getreplacedQFOpcode(unsigned, bool, bool);
+ MCPhysReg findAllocatableReg(MachineInstr *MI) const;
+ void insertIEEEToQF(MachineInstr *, Register, MachineOperand, bool is32bit);
+ void collectLivenessForSubregs(NodeAddr<UseNode *> &);
+ void insertInstr(MachineInstr *, unsigned, unsigned, unsigned, RegState);
+};
+} // namespace
+
+// This class handles spurious vector instructions which do not
+// follow the ABI. For example, vcombine(qf,qf) takes qf operands
+// instead of IEEE type. This diagnostic pass can be used
+// as a final verifier for the XQF implementation. Turned off by
+// default.
+class XqfPostRADiagnosis {
+public:
+ XqfPostRADiagnosis(DataFlowGraph &_G, Liveness &_L,
+ const HexagonInstrInfo *_HII)
+ : G(&_G), L(&_L), HII(_HII) {}
+ // Deleting default constructor to handle misconstruction
+ XqfPostRADiagnosis() = delete;
+
+ void runCompliance() const;
+ void print_warning(Twine &, MachineInstr *, MachineInstr *) const;
+
+private:
+ DataFlowGraph *G = nullptr;
+ Liveness *L = nullptr;
+ const HexagonInstrInfo *HII = nullptr;
+};
+
+void XqfPostRADiagnosis::print_warning(Twine &wstr, MachineInstr *DefMI,
+ MachineInstr *UseMI) const {
+#ifndef NDEBUG
+ dbgs() << wstr;
+ dbgs() << "\n\tDef:";
+ DefMI->dump();
+ // dbgs() << "\t" << DefMI->getParent()->getName();
+ dbgs() << "\tUse:";
+ UseMI->dump();
+ // dbgs() << "\t" << UseMI->getParent()->getName();
+#endif // NDEBUG
+}
+
+// This static function gets all reached uses of a def.
+// When it encounters a phi node, it goes over the
+// reached uses of the phi node too.
+static void getAllRealUses(NodeAddr<DefNode *> DA, NodeSet &UNodeSet,
+ Liveness *L, DataFlowGraph *G,
+ bool comprehensive = false) {
+ RegisterRef DR = DA.Addr->getRegRef(*G);
+ NodeAddr<StmtNode *> DefStmt = DA.Addr->getOwner(*G);
+ MachineInstr *Instr = DefStmt.Addr->getCode();
+ auto UseSet = L->getAllReachedUses(DR, DA);
+
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = G->addr<UseNode *>(UI);
+
+ /*LLVM_DEBUG(
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*G);
+ MachineInstr* UseInstr = UseStmt.Addr->getCode();
+ if (UseInstr != nullptr)
+ {dbgs() << "\t\t[Reached Use]: "; UseInstr->dump();}
+ );*/
+
+ MachineFunction *MF = Instr->getMF();
+ const auto &HRI = MF->getSubtarget<HexagonSubtarget>().getRegisterInfo();
+ Register RR = UA.Addr->getRegRef(*G).Id;
+ if (HRI->isFakeReg(RR))
+ continue;
+
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef) {
+ NodeAddr<PhiNode *> PA = UA.Addr->getOwner(*G);
+ NodeId id = PA.Id;
+ const Liveness::RefMap &phiUse = L->getRealUses(id);
+ for (auto I : phiUse) {
+ if (!G->getPRI().alias(RegisterRef(I.first), DR))
+ continue;
+ auto phiUseSet = I.second;
+ for (auto phiUI : phiUseSet) {
+ NodeAddr<UseNode *> phiUA = G->addr<UseNode *>(phiUI.first);
+ UNodeSet.insert(phiUA.Id);
+ }
+ }
+ } else {
+ // FIXME Due to bug in RDF, check if the reaching def of the use
+ // reaches this instruction
+ if (comprehensive) {
+ UNodeSet.insert(UA.Id);
+ continue;
+ }
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*G);
+ for (NodeAddr<UseNode *> UA : UseStmt.Addr->members_if(G->IsUse, *G)) {
+ NodeId QFPDefNode = UA.Addr->getReachingDef();
+ NodeAddr<DefNode *> RegDef = G->addr<DefNode *>(QFPDefNode);
+ // FIXME Reaching def computation error
+ if (QFPDefNode == 0)
+ continue;
+ NodeAddr<StmtNode *> RegStmt = RegDef.Addr->getOwner(*G);
+ MachineInstr *ReachDefInstr = RegStmt.Addr->getCode();
+ if (ReachDefInstr && ReachDefInstr == Instr)
+ UNodeSet.insert(UA.Id);
+ }
+ }
+ }
+}
+
+void XqfPostRADiagnosis::runCompliance() const {
+ NodeAddr<FuncNode *> FA = G->getFunc();
+ for (NodeAddr<BlockNode *> BA : FA.Addr->members(*G)) {
+ for (auto IA : BA.Addr->members(*G)) {
+ if (!G->IsCode<NodeAttrs::Stmt>(IA))
+ continue;
+ NodeAddr<StmtNode *> SA = IA;
+ MachineInstr *DefMI = SA.Addr->getCode();
+ if (DefMI->isDebugInstr() || DefMI->isInlineAsm())
+ continue;
+ auto NodeBase = SA.Addr->members_if(G->IsDef, *G);
+ if (NodeBase.empty())
+ continue;
+ NodeAddr<DefNode *> DfNode = NodeBase.front();
+
+ NodeSet UseSet;
+ getAllRealUses(DfNode, UseSet, L, G, true);
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = G->addr<UseNode *>(UI);
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*G);
+ MachineInstr *UseMI = UseStmt.Addr->getCode();
+ if (UseMI->isDebugInstr() || UseMI->isInlineAsm())
+ continue;
+ unsigned OpNo = UA.Addr->getOp().getOperandNo();
+ if (HII->usesQF32Operand(UseMI, OpNo) && !HII->isQFP32Instr(DefMI)) {
+ Twine wstr(Twine("Mismatch: sf type used as qf32 at operand ")
+ .concat(Twine(OpNo)));
+ print_warning(wstr, DefMI, UseMI);
+ } else if (!HII->usesQF32Operand(UseMI, OpNo) &&
+ HII->isQFP32Instr(DefMI)) {
+ Twine wstr(Twine("Mismatch: qf32 type used as sf at operand ")
+ .concat(Twine(OpNo)));
+ print_warning(wstr, DefMI, UseMI);
+ } else if (HII->usesQF16Operand(UseMI, OpNo) &&
+ !HII->isQFP16Instr(DefMI)) {
+ Twine wstr(Twine("Mismatch: hf type used as qf16 at operand ")
+ .concat(Twine(OpNo)));
+ print_warning(wstr, DefMI, UseMI);
+ } else if (!HII->usesQF16Operand(UseMI, OpNo) &&
+ HII->isQFP16Instr(DefMI)) {
+ Twine wstr(Twine("Mismatch: qf16 type used as hf at operand ")
+ .concat(Twine(OpNo)));
+ print_warning(wstr, DefMI, UseMI);
+ }
+ }
+ }
+ }
+}
+
+char HexagonPostRAHandleQFP::ID = 0;
+
+namespace llvm {
+char &HexagonPostRAHandleQFPID = HexagonPostRAHandleQFP::ID;
+}
+
+// Check whether the instruction is added already, if not add it
+// along with the Register values and qf type.
+// If already added, then check the register values and edit them.
+void HexagonPostRAHandleQFP::conditionallyInsert(MachineInstr &MI,
+ Register &DefReg) {
+ LLVM_DEBUG(dbgs() << "\nCollecting instruction using QF: "; MI.dump());
+ // check if the key exists.
+ Register Reg1 = MI.getOperand(1).getReg();
+
+ // If the use is a unary operation, make the second register point to DefReg.
+ // This ensures that secondOp is always true.
+ Register Reg2 = MI.getNumOperands() == 2 ? DefReg : MI.getOperand(2).getReg();
+
+ if (QFUsesMap.find(&MI) != QFUsesMap.end()) {
+ auto Entry = QFUsesMap[&MI];
+ bool firstOp = ((Reg1 == DefReg) ? true : false) | Entry.first;
+ bool secondOp = ((Reg2 == DefReg) ? true : false) | Entry.second;
+ QFUsesMap[&MI] = std::make_pair(firstOp, secondOp);
+
+ } else { // encountered first time.
+ // Get the default type of the operand:
+ // True : IEEE type
+ // False : QF type
+ auto defaultPair = QFPSatInstsMap[MI.getOpcode()];
+ bool firstOp = (Reg1 == DefReg) ? true : defaultPair.first;
+ bool secondOp = (Reg2 == DefReg) ? true : defaultPair.second;
+ QFUsesMap[&MI] = std::make_pair(firstOp, secondOp);
+ }
+}
+
+unsigned short HexagonPostRAHandleQFP::getreplacedQFOpcode(unsigned srcOpcode,
+ bool firstOp,
+ bool secondOp) {
+ if (firstOp && secondOp) {
+ switch (srcOpcode) {
+ case Hexagon::V6_vadd_qf32:
+ case Hexagon::V6_vadd_qf32_mix:
+ return Hexagon::V6_vadd_sf;
+ case Hexagon::V6_vadd_qf16:
+ case Hexagon::V6_vadd_qf16_mix:
+ return Hexagon::V6_vadd_hf;
+
+ case Hexagon::V6_vsub_qf32:
+ case Hexagon::V6_vsub_qf32_mix:
+ case Hexagon::V6_vsub_sf_mix:
+ return Hexagon::V6_vsub_sf;
+ case Hexagon::V6_vsub_qf16:
+ case Hexagon::V6_vsub_qf16_mix:
+ case Hexagon::V6_vsub_hf_mix:
+ return Hexagon::V6_vsub_hf;
+
+ case Hexagon::V6_vmpy_qf32:
+ return Hexagon::V6_vmpy_qf32_sf;
+ case Hexagon::V6_vmpy_qf16:
+ case Hexagon::V6_vmpy_qf16_mix_hf:
+ return Hexagon::V6_vmpy_qf16_hf;
+ case Hexagon::V6_vmpy_qf32_qf16:
+ case Hexagon::V6_vmpy_qf32_mix_hf:
+ return Hexagon::V6_vmpy_qf32_hf;
+
+ case Hexagon::V6_vmpy_rt_qf16:
+ return Hexagon::V6_vmpy_rt_hf;
+ // v81 opcodes start
+ case Hexagon::V6_vabs_qf32_qf32:
+ return Hexagon::V6_vabs_qf32_sf;
+ case Hexagon::V6_vabs_qf16_qf16:
+ return Hexagon::V6_vabs_qf16_hf;
+ case Hexagon::V6_vneg_qf32_qf32:
+ return Hexagon::V6_vneg_qf32_sf;
+ case Hexagon::V6_vneg_qf16_qf16:
+ return Hexagon::V6_vneg_qf16_hf;
+ case Hexagon::V6_vilog2_qf32:
+ return Hexagon::V6_vilog2_sf;
+ case Hexagon::V6_vilog2_qf16:
+ return Hexagon::V6_vilog2_hf;
+ case Hexagon::V6_vconv_qf32_qf32:
+ return Hexagon::V6_vconv_qf32_sf;
+ case Hexagon::V6_vconv_qf16_qf16:
+ return Hexagon::V6_vconv_qf16_hf;
+ // v81 opcodes end
+
+ default:
+ llvm_unreachable("Invalid qf opcode in this scenario!");
+ }
+ } else if (firstOp) {
+ switch (srcOpcode) {
+ case Hexagon::V6_vadd_qf32:
+ return Hexagon::V6_vadd_qf32_mix; // interchange reqd
+ case Hexagon::V6_vadd_qf16:
+ return Hexagon::V6_vadd_qf16_mix; // interchange reqd
+
+ case Hexagon::V6_vsub_qf32:
+ if (HST->useHVXV81Ops())
+ return Hexagon::V6_vsub_sf_mix;
+ else if (HST->useHVXV79Ops())
+ return Hexagon::V6_vsub_sf; // conv reqd
+ else
+ llvm_unreachable("Invalid Hexagon Arch for this scenario!");
+ case Hexagon::V6_vsub_qf16:
+ if (HST->useHVXV81Ops())
+ return Hexagon::V6_vsub_hf_mix;
+ else if (HST->useHVXV79Ops())
+ return Hexagon::V6_vsub_hf; // conv reqd
+ else
+ llvm_unreachable("Invalid Hexagon Arch for this scenario!");
+ case Hexagon::V6_vsub_qf32_mix:
+ return Hexagon::V6_vsub_sf;
+ case Hexagon::V6_vsub_qf16_mix:
+ return Hexagon::V6_vsub_hf;
+
+ // This opcode does not have a mixed type. Hence if one
+ // of op1 or op2 is IEEE type and the other is qf type,
+ // return the opcode which takes both operands as IEEE type.
+ case Hexagon::V6_vmpy_qf32:
+ return Hexagon::V6_vmpy_qf32_sf; // conv reqd
+ case Hexagon::V6_vmpy_qf16:
+ return Hexagon::V6_vmpy_qf16_mix_hf; // interchange reqd
+ case Hexagon::V6_vmpy_qf32_qf16:
+ return Hexagon::V6_vmpy_qf32_mix_hf; // interchange reqd
+
+ default:
+ return srcOpcode;
+ }
+ } else if (secondOp) {
+ switch (srcOpcode) {
+ case Hexagon::V6_vadd_qf32:
+ return Hexagon::V6_vadd_qf32_mix;
+ case Hexagon::V6_vadd_qf16:
+ return Hexagon::V6_vadd_qf16_mix;
+
+ case Hexagon::V6_vsub_qf32:
+ return Hexagon::V6_vsub_qf32_mix;
+ case Hexagon::V6_vsub_qf16:
+ return Hexagon::V6_vsub_qf16_mix;
+ case Hexagon::V6_vsub_sf_mix:
+ return Hexagon::V6_vsub_sf;
+ case Hexagon::V6_vsub_hf_mix:
+ return Hexagon::V6_vsub_hf;
+
+ case Hexagon::V6_vmpy_qf32:
+ return Hexagon::V6_vmpy_qf32_sf; // conv reqd
+
+ case Hexagon::V6_vmpy_qf16:
+ return Hexagon::V6_vmpy_qf16_mix_hf;
+ case Hexagon::V6_vmpy_qf32_qf16:
+ return Hexagon::V6_vmpy_qf32_mix_hf;
+
+ default:
+ return srcOpcode;
+ }
+ } else
+ return srcOpcode;
+}
+
+// Insert IEEE to Qf conversion instructions
+// is32bit: If true, SrcReg holds sf type, else a hf type
+void HexagonPostRAHandleQFP::insertIEEEToQF(MachineInstr *MI, Register SrcReg,
+ MachineOperand SrcOp,
+ bool is32bit = false) {
+
+ auto MBB = MI->getParent();
+ MachineInstrBuilder MIB;
+ const DebugLoc &DL = MI->getDebugLoc();
+
+ if (HST->useHVXV81Ops()) {
+ auto Op = is32bit ? Hexagon::V6_vconv_qf32_sf : Hexagon::V6_vconv_qf16_hf;
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(Op), SrcReg)
+ .addReg(SrcReg, RegState::Renamable | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+
+ } else if (HST->useHVXV79Ops()) {
+ // Get an available register
+ auto V0_Reg = findAllocatableReg(MI);
+
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(Hexagon::V6_vd0), V0_Reg);
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ auto Op = is32bit ? Hexagon::V6_vadd_sf : Hexagon::V6_vadd_hf;
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(Op), SrcReg)
+ .addReg(SrcReg, RegState::Renamable | RegState::Kill)
+ .addReg(V0_Reg, RegState::Kill);
+ LLVM_DEBUG(dbgs() << "Inserting new instruction: "; MIB.getInstr()->dump());
+ } else
+ llvm_unreachable("Not possible to insert qf = hf/sf for this unknown "
+ "subtarget!");
+}
+
+// Create new instructions which handle sf/hf types to replace
+// the instructions which handle qf types.
+bool HexagonPostRAHandleQFP::HandleRefills() {
+
+ bool Changed = false;
+ LLVM_DEBUG(dbgs() << "HandleRefills: ");
+ std::vector<MachineInstr *> eraseList;
+
+ for (auto It : QFUsesMap) {
+
+ // Separately handle unary qf opcodes
+ MachineInstr *MI = It.first;
+ auto SrcOpcode = MI->getOpcode();
+ auto Pair = It.second;
+ auto SrcOp1 = MI->getOperand(1);
+ Register DestReg = MI->getOperand(0).getReg();
+ auto MBB = MI->getParent();
+ MachineInstrBuilder MIB;
+ LLVM_DEBUG(dbgs() << "\nProcessing: "; MI->dump());
+ const DebugLoc &DL = MI->getDebugLoc();
+
+ // lambda to handle unary qf operations
+ // isIeee: True if the 1st operand is sf/hf type, false if qf type
+ auto HandleUnaryRefill = [&](MachineInstr *MI, bool isIeee) -> bool {
+ if (isIeee) {
+ auto finalOpcode = getreplacedQFOpcode(SrcOpcode, true, true);
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1));
+ Changed |= true;
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ } else
+ eraseList.push_back(MI);
+ return Changed;
+ };
+
+ if (MI->getNumOperands() == 2) {
+ Changed |= HandleUnaryRefill(It.first, It.second.first);
+ continue;
+ }
+ auto SrcOp2 = MI->getOperand(2);
+
+ // lambda to handle mixed type vsub instructions for v79
+ auto HandleSub = [&](auto srcOpcode) -> bool {
+ auto ConvOp = (srcOpcode == Hexagon::V6_vsub_qf32)
+ ? Hexagon::V6_vconv_sf_qf32
+ : Hexagon::V6_vconv_hf_qf16;
+ auto SubOp = (ConvOp == Hexagon::V6_vconv_sf_qf32) ? Hexagon::V6_vsub_sf
+ : Hexagon::V6_vsub_hf;
+
+ Register SrcOp2Reg = SrcOp2.getReg();
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(ConvOp), SrcOp2Reg)
+ .addReg(SrcOp2Reg, getRegState(SrcOp2) | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(SubOp), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1))
+ .addReg(SrcOp2Reg, getRegState(SrcOp2));
+ // If Op2 is not killed, it is used later; convert it back to qf form.
+ if (!SrcOp2.isKill())
+ insertIEEEToQF(&*(++MI->getIterator()), SrcOp2.getReg(), SrcOp2,
+ ConvOp == Hexagon::V6_vconv_sf_qf32 /* sf type reg */);
+ return true;
+ };
+
+ // If both operands are sf type, we only need to replace the opcode.
+ if (Pair.first == true && Pair.second == true) {
+ auto finalOpcode = getreplacedQFOpcode(SrcOpcode, true, true);
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1))
+ .addReg(SrcOp2.getReg(), getRegState(SrcOp2));
+ Changed |= true;
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+
+ } else if (Pair.first == true && Pair.second == false) {
+ auto finalOpcode = getreplacedQFOpcode(SrcOpcode, true, false);
+
+ // If 2nd op is qf, first op is sf, convert the 2nd
+ // op to sf before inserting the vmpy instruction.
+ if (SrcOpcode == Hexagon::V6_vmpy_qf32) {
+ Register SrcOp2Reg = SrcOp2.getReg();
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32),
+ SrcOp2Reg)
+ .addReg(SrcOp2Reg, getRegState(SrcOp2) | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction before: ";
+ MIB.getInstr()->dump());
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1))
+ .addReg(SrcOp2Reg, getRegState(SrcOp2));
+ // If Op2 is not killed convert back to qf, since there
+ // are uses for this qf op.
+ if (!SrcOp2.isKill())
+ insertIEEEToQF(&*(++MI->getIterator()), SrcOp2.getReg(), SrcOp2,
+ true /* sf type reg */);
+
+ // if the opcode is mixed type, we use Op2 as first operand
+ // since that takes in qf type. Op1 is taken as second op.
+ } else if (finalOpcode == Hexagon::V6_vadd_qf16_mix ||
+ finalOpcode == Hexagon::V6_vadd_qf32_mix ||
+ finalOpcode == Hexagon::V6_vmpy_qf16_mix_hf ||
+ finalOpcode == Hexagon::V6_vmpy_qf32_mix_hf) {
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp2.getReg(), getRegState(SrcOp2))
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1));
+
+ // Subtraction is not commutative, so if Op1 is sf/hf type and
+ // Op2 is qf type, we cannot interchange the operands.
+ // For v79, we convert Op2 to IEEE and use the non-mix type
+ // instruction for the subtraction.
+ // For v81, we have an appropriate opcode with vsub(sf/hf, qf) type.
+ } else if ((SrcOpcode == Hexagon::V6_vsub_qf32 ||
+ SrcOpcode == Hexagon::V6_vsub_qf16) &&
+ HST->useHVXV79Ops()) {
+ Changed |= HandleSub(SrcOpcode);
+
+ } else {
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1))
+ .addReg(SrcOp2.getReg(), getRegState(SrcOp2));
+ }
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ Changed |= true;
+ } else if (Pair.first == false && Pair.second == true) {
+
+ auto finalOpcode = getreplacedQFOpcode(SrcOpcode, false, true);
+ // If 2nd op is sf, first op is qf, convert the 1st
+ // op to sf before inserting the vmpy instruction.
+ if (SrcOpcode == Hexagon::V6_vmpy_qf32) {
+ Register SrcOp1Reg = SrcOp1.getReg();
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32),
+ SrcOp1Reg)
+ .addReg(SrcOp1Reg, getRegState(SrcOp1) | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction before: ";
+ MIB.getInstr()->dump());
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1Reg, getRegState(SrcOp1))
+ .addReg(SrcOp2.getReg(), getRegState(SrcOp2));
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ // If Op1 is not killed convert back to qf, since there
+ // are uses for this qf op.
+ if (!SrcOp1.isKill())
+ insertIEEEToQF(&*(++MI->getIterator()), SrcOp1.getReg(), SrcOp1,
+ true /*sf type reg*/);
+ } else {
+
+ MIB = BuildMI(*MBB, *MI, DL, HII->get(finalOpcode), DestReg)
+ .addReg(SrcOp1.getReg(), getRegState(SrcOp1))
+ .addReg(SrcOp2.getReg(), getRegState(SrcOp2));
+ LLVM_DEBUG(dbgs() << "\nInserting new instruction: ";
+ MIB.getInstr()->dump());
+ }
+ Changed |= true;
+ } else {
+ // Both the operands of this instructions are valid, so no use of
+ // this instruction is to be modified. We need to remove this
+ // instruction from the action map QFUsesMap.
+ eraseList.push_back(MI);
+ }
+ }
+
+ for (MachineInstr *delMI : eraseList)
+ QFUsesMap.erase(delMI);
+
+ return Changed;
+}
+
+// Insert a new instruction.
+void HexagonPostRAHandleQFP::insertInstr(MachineInstr *MI, unsigned MIOpcode,
+ unsigned SrcReg, unsigned DstReg,
+ RegState Flags) {
+
+ MachineInstrBuilder MIB;
+ MachineBasicBlock *MBB = MI->getParent();
+ DebugLoc DL = MI->getDebugLoc();
+ MachineBasicBlock::iterator MIt = MI;
+ auto MINext = ++MI->getIterator();
+ if (++MIt == MBB->end())
+ MIB = BuildMI(MBB, DL, HII->get(MIOpcode), DstReg).addReg(SrcReg, Flags);
+ else
+ MIB = BuildMI(*MBB, MINext, DL, HII->get(MIOpcode), DstReg)
+ .addReg(SrcReg, Flags);
+ LLVM_DEBUG(dbgs() << "\t\tInserting after conv: "; MIB.getInstr()->dump());
+}
+
+// Find an available vector register to store 0x0. We have reserved vector
+// register v30 to be exempted from being used during register allocation
+// for this purpose.
+MCPhysReg HexagonPostRAHandleQFP::findAllocatableReg(MachineInstr *MI) const {
+ LLVM_DEBUG(dbgs() << "\tUsing V30 register to store a vector of zeroes!");
+ return Hexagon::V30;
+}
+
+// Insert qf = sf/hf conversions before non-saturating instructions
+bool HexagonPostRAHandleQFP::HandleNonSatInstr() {
+
+ for (auto It : QFNonSatMIs) {
+ MachineInstr *MI = It.first;
+ auto MIOpcode = MI->getOpcode();
+ DebugLoc DL = MI->getDebugLoc();
+ auto Op = MI->getOperand(1);
+ Register DefReg = Op.getReg();
+ LLVM_DEBUG(dbgs() << "Analyzing convert instruction: "; MI->dump());
+ // Handle hf = qf16.
+ // Handle f8 = qf16
+ if (MIOpcode == Hexagon::V6_vconv_hf_qf16 ||
+ MIOpcode == Hexagon::V6_vconv_f8_qf16) {
+
+ insertIEEEToQF(MI, DefReg, Op);
+ // TODO Check if there are any reaching def which is qf generating type.
+ // That op should be converted to sf/hf
+ if (!Op.isKill())
+ insertInstr(MI, Hexagon::V6_vconv_hf_qf16, DefReg, DefReg,
+ getRegState(Op) | RegState::Kill);
+
+ // Handle hf = qf.qf.
+ // Handle bf = qf.qf
+ } else if (MIOpcode == Hexagon::V6_vconv_hf_qf32 ||
+ MIOpcode == Hexagon::V6_vconv_bf_qf32) {
+ Register DefLo = HRI->getSubReg(DefReg, Hexagon::vsub_lo);
+ Register DefHi = HRI->getSubReg(DefReg, Hexagon::vsub_hi);
+
+ if (It.second == ConvOperand::HiLo) {
+ insertIEEEToQF(MI, DefLo, Op, true /* sf type */);
+ insertIEEEToQF(MI, DefHi, Op, true /* sf type */);
+
+ // Check which subregisters are still live after this instruction
+ // and insert a conversion back to IEEE for each of them.
+ auto KillState = SubRegKillSet[MI];
+ if (!KillState.first)
+ insertInstr(MI, Hexagon::V6_vconv_sf_qf32, DefHi, DefHi,
+ getRegState(Op) | RegState::Kill);
+
+ if (!KillState.second)
+ insertInstr(MI, Hexagon::V6_vconv_sf_qf32, DefLo, DefLo,
+ getRegState(Op) | RegState::Kill);
+
+ } else if (It.second == ConvOperand::Hi) {
+ insertIEEEToQF(MI, DefHi, Op, true /* sf type */);
+ if (!Op.isKill())
+ insertInstr(MI, Hexagon::V6_vconv_sf_qf32, DefHi, DefHi,
+ getRegState(Op) | RegState::Kill);
+
+ } else { // It.second == ConvOperand::Lo
+ insertIEEEToQF(MI, DefLo, Op, true /* sf type */);
+ if (!Op.isKill())
+ insertInstr(MI, Hexagon::V6_vconv_sf_qf32, DefLo, DefLo,
+ getRegState(Op) | RegState::Kill);
+ }
+ // Handle sf = qf32.
+ } else if (MIOpcode == Hexagon::V6_vconv_sf_qf32) {
+ insertIEEEToQF(MI, DefReg, Op, true /* sf type */);
+ if (!Op.isKill())
+ insertInstr(MI, Hexagon::V6_vconv_sf_qf32, DefReg, DefReg,
+ getRegState(Op) | RegState::Kill);
+
+ } else {
+ llvm_unreachable("Unhandled non-saturating instruction!");
+ }
+ }
+
+ if (QFNonSatMIs.empty())
+ return false;
+ return true;
+}
+
+// Calculates the liveness of subregisters (whether killed or not)
+// when double register is used. This is necessary because RDF
+// carries liveness of the superreg and not the subregisters individually
+void HexagonPostRAHandleQFP::collectLivenessForSubregs(
+ NodeAddr<UseNode *> &UsedNode) {
+ RegisterRef UR = UsedNode.Addr->getRegRef(*DFG);
+ NodeAddr<StmtNode *> UseStmt = UsedNode.Addr->getOwner(*DFG);
+ MachineInstr *UseInstr = UseStmt.Addr->getCode();
+ auto UseOp = UseInstr->getOperand(1);
+ Register UseDefLo = HRI->getSubReg(UseOp.getReg(), Hexagon::vsub_lo);
+ Register UseDefHi = HRI->getSubReg(UseOp.getReg(), Hexagon::vsub_hi);
+
+ NodeSet Visited, Defs;
+ bool isHiSubRegKilled = true, isLoSubRegKilled = true;
+ const auto &P = LV->getAllReachingDefsRec(UR, UsedNode, Visited, Defs);
+
+ if (!P.second)
+ return;
+
+ for (auto RD : P.first) {
+ NodeAddr<DefNode *> RegDef = DFG->addr<DefNode *>(RD);
+ Register RR = RegDef.Addr->getRegRef(*DFG).Id;
+ if (HRI->isFakeReg(RR))
+ continue;
+ NodeAddr<StmtNode *> RegStmt = RegDef.Addr->getOwner(*DFG);
+ MachineInstr *ReachDefInstr = RegStmt.Addr->getCode();
+ if (ReachDefInstr == nullptr)
+ continue;
+
+ // If the reaching def is WReg, then the kill flag in the use is correct
+ // since there is no subreg
+ Register DefReg = ReachDefInstr->getOperand(0).getReg();
+ if (Hexagon::HvxWRRegClass.contains(DefReg)) {
+ if (!UseOp.isKill())
+ isHiSubRegKilled = isLoSubRegKilled = false;
+
+ // If the reaching def is VReg, the liveness might differ between
+ // the two subregs. Handle them individually.
+ // Find other uses of the reaching def after this use. If one exists,
+ // the subregister is live after the use.
+ // NOTE: Assumption: The uses are in order in RDF.
+ } else {
+ NodeSet UseSet;
+ getAllRealUses(RegDef, UseSet, LV, DFG);
+ for (auto UIntr : UseSet) {
+ NodeAddr<UseNode *> UA = DFG->addr<UseNode *>(UIntr);
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*DFG);
+ MachineInstr *UseMI = UseStmt.Addr->getCode();
+ if (UseMI == nullptr)
+ continue;
+ // When we reach the use set a flag to see if there are other uses
+ // after this. If yes, then the register is not killed.
+ if (UseMI == UseInstr)
+ continue;
+ if (HII->isMIBefore(UseInstr, UseMI) && DefReg == UseDefLo) {
+ isLoSubRegKilled = false;
+ break;
+ }
+ if (HII->isMIBefore(UseInstr, UseMI) && DefReg == UseDefHi) {
+ isHiSubRegKilled = false;
+ break;
+ }
+ }
+ }
+ }
+ SubRegKillSet[UseInstr] = std::make_pair(isHiSubRegKilled, isLoSubRegKilled);
+}
+
+// Store all refill instructions.
+void HexagonPostRAHandleQFP::collectQFPStackRefill(
+ NodeAddr<StmtNode *> *StNode) {
+ NodeAddr<DefNode *> DfNode =
+ StNode->Addr->members_if(DFG->IsDef, *DFG).front();
+ MachineInstr *MI = StNode->Addr->getCode();
+ // Check if operand to this instruction is a frame index.
+ const MachineOperand &OpFI = MI->getOperand(1);
+ if (!OpFI.isFI())
+ return;
+
+ // LLVM_DEBUG(dbgs() << "\n[Stack Refill]: Collecting: "; MI->dump());
+ RefillMIs.push_back(DfNode);
+}
+
+// Iterate over the uses of the qf generating instruction in the RDF graph.
+// If we get a qf to IEEE convert instruction, add it to a list.
+void HexagonPostRAHandleQFP::collectConvQFInstr(NodeAddr<DefNode *> &RegDef) {
+
+ NodeSet UseSet;
+ NodeAddr<StmtNode *> DefStmt = RegDef.Addr->getOwner(*DFG);
+ MachineInstr *DefInstr = DefStmt.Addr->getCode();
+ getAllRealUses(RegDef, UseSet, LV, DFG);
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = DFG->addr<UseNode *>(UI);
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*DFG);
+ MachineInstr *QFConvInstr = UseStmt.Addr->getCode();
+ if (std::find(QFNonSatInstr.begin(), QFNonSatInstr.end(),
+ QFConvInstr->getOpcode()) != QFNonSatInstr.end()) {
+
+ // The use is a double register type. But the def can be hi/lo or double
+ // type. So conversion needs to be inserted only for the type
+ // which is in IEEE form.
+ auto UseReg = QFConvInstr->getOperand(1).getReg();
+ auto DefReg = DefInstr->getOperand(0).getReg();
+ if (Hexagon::HvxWRRegClass.contains(UseReg)) {
+
+ collectLivenessForSubregs(UA);
+ unsigned Op = ConvOperand::Undefined;
+ if (QFNonSatMIs.contains(QFConvInstr))
+ Op = QFNonSatMIs[QFConvInstr];
+
+ // Def is double type
+ if (Hexagon::HvxWRRegClass.contains(DefReg))
+ Op = ConvOperand::HiLo;
+ // Def is lo of double type
+ else if (DefReg == HRI->getSubReg(UseReg, Hexagon::vsub_lo))
+ Op |= ConvOperand::Lo;
+ // Def is hi of double type
+ else
+ Op |= ConvOperand::Hi;
+ QFNonSatMIs[QFConvInstr] = Op;
+ } else // for other def-use, HiLo is used as default
+ QFNonSatMIs[QFConvInstr] = ConvOperand::HiLo;
+
+ IgnoreInsertConvList.insert(DefInstr);
+ LLVM_DEBUG(std::string OpType = ""; switch (QFNonSatMIs[QFConvInstr]) {
+ case ConvOperand::HiLo:
+ OpType = "HiLo Op";
+ break;
+ case ConvOperand::Lo:
+ OpType = "Lo Op";
+ break;
+ case ConvOperand::Hi:
+ OpType = "Hi Op";
+ break;
+ default:
+ OpType = "Undefined";
+ } dbgs() << "Collecting convert instruction with type "
+ << OpType << " : ";
+ QFConvInstr->dump());
+ }
+ }
+}
+
+// Check if the use of a COPY statement comes from a def which generates
+// a qf type. If yes, collect it in a map. Also, collect copies
+// whose reaching defs are other copies (nested copies).
+void HexagonPostRAHandleQFP::collectCopies(NodeAddr<StmtNode *> *StNode) {
+
+ NodeAddr<DefNode *> CopyDef =
+ StNode->Addr->members_if(DFG->IsDef, *DFG).front();
+ MachineInstr *CopyInstr = StNode->Addr->getCode();
+ LLVM_DEBUG(dbgs() << "\nAnalyzing copy: "; StNode->Addr->getCode()->dump());
+
+ for (NodeAddr<UseNode *> UA : StNode->Addr->members_if(DFG->IsUse, *DFG)) {
+ RegisterRef UR = UA.Addr->getRegRef(*DFG);
+ NodeSet Visited, Defs;
+ const auto &P = LV->getAllReachingDefsRec(UR, UA, Visited, Defs);
+ if (!P.second) {
+ LLVM_DEBUG({
+ dbgs() << "*** Unable to collect all reaching defs for use ***\n"
+ << PrintNode<UseNode *>(UA, *DFG) << '\n';
+ });
+ continue;
+ }
+
+ // Note: there can be multiple reaching defs of the copy
+ for (auto RD : P.first) {
+ NodeAddr<DefNode *> RegDef = DFG->addr<DefNode *>(RD);
+ Register RR = RegDef.Addr->getRegRef(*DFG).Id;
+ if (HRI->isFakeReg(RR))
+ continue;
+ NodeAddr<StmtNode *> RegStmt = RegDef.Addr->getOwner(*DFG);
+ MachineInstr *ReachDefInstr = RegStmt.Addr->getCode();
+ if (ReachDefInstr == nullptr)
+ continue;
+ LLVM_DEBUG(dbgs() << "\t[Reaching Def]: "; ReachDefInstr->dump());
+
+ // If the reaching def is a COPY, collect it with reg type ieee
+ if (ReachDefInstr->getOpcode() == TargetOpcode::COPY) {
+ auto pairKey = std::make_pair(CopyDef, RegDef);
+ QFCopys[pairKey] = RegType::ieee;
+ continue;
+ }
+
+ // If the reaching def is a qf instr, collect the copy.
+ // reg type is selected based on the op
+ auto RegT = RegType::undefined;
+ if (HII->isQFPInstr(ReachDefInstr)) {
+ if (HII->isQFP32Instr(ReachDefInstr)) {
+ // Check whether the copy's register is HvxWR or HvxVR type.
+ // NOTE: Assumption: A copy shall not have two reaching defs,
+ // i.e., one for each of the subregisters.
+ if (Hexagon::HvxWRRegClass.contains(
+ ReachDefInstr->getOperand(0).getReg()))
+ RegT = RegType::qf32_double;
+ else
+ RegT = RegType::qf32;
+ } else if (HII->isQFP16Instr(ReachDefInstr)) {
+ // Check if qf16 instruction outputs double-wide register
+ if (Hexagon::HvxWRRegClass.contains(
+ ReachDefInstr->getOperand(0).getReg())) {
+ RegT = RegType::qf16_double;
+ } else {
+ RegT = RegType::qf16;
+ }
+ }
+ } else {
+ // if the copy involves non-qf vector registers collect it too
+ Register CopyReg = CopyInstr->getOperand(1).getReg();
+ if (Hexagon::HvxWRRegClass.contains(CopyReg) ||
+ Hexagon::HvxVRRegClass.contains(CopyReg))
+ RegT = RegType::ieee;
+ else
+ continue;
+ }
+ auto pairKey = std::make_pair(CopyDef, RegDef);
+ QFCopys[pairKey] = RegT;
+ }
+ }
+}
+
+// Collects the qf generating instructions whose result
+// values are spilled to the stack.
+void HexagonPostRAHandleQFP::collectQFPStackSpill(
+ NodeAddr<StmtNode *> *StNode) {
+
+ MachineInstr *MI = StNode->Addr->getCode();
+ LLVM_DEBUG(dbgs() << "\n[Stack Spill]: Analyzing: "; MI->dump());
+ // Check if operand to this instruction is a frame index.
+ const MachineOperand &OpFI = MI->getOperand(0);
+ if (!OpFI.isFI())
+ return;
+
+ // Pre-RegAlloc
+ //%46:hvxwr = V6_vmpy_qf32_hf %7:hvxvr, %10:hvxvr
+ // PS_vstorerw_ai %stack.3, 0, %46:hvxwr :: (store (s2048) into %stack.3,
+ // align 128)
+ // Post-RegAlloc
+ // renamable $w4 = V6_vmpy_qf32_hf killed renamable $v1, renamable $v0
+ // PS_vstorerw_ai %stack.3, 0, renamable $w4 :: (store (s2048) into %stack.3,
+ // align 128)
+
+ if (!MI->getOperand(2).isReg())
+ return;
+
+ // Iterate over the operands of the store instruction to get their reaching
+ // defs
+ NodeId QFPDefNode = 0;
+ for (NodeAddr<UseNode *> UA : StNode->Addr->members_if(DFG->IsUse, *DFG)) {
+ QFPDefNode = UA.Addr->getReachingDef();
+
+ // Get the defining instruction node(s)
+ NodeAddr<DefNode *> RegDef = DFG->addr<DefNode *>(QFPDefNode);
+ assert(QFPDefNode != 0 && "Reaching def computation error");
+ NodeAddr<StmtNode *> RegStmt = RegDef.Addr->getOwner(*DFG);
+ MachineInstr *ReachDefInstr = RegStmt.Addr->getCode();
+ if (ReachDefInstr == nullptr)
+ continue;
+ LLVM_DEBUG(dbgs() << "[Stack Spill]:\tReaching Def of operand:";
+ ReachDefInstr->dump());
+ // Reaching Def cannot be a phi instruction.
+ if (RegDef.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+
+ if (!HII->isQFPInstr(ReachDefInstr))
+ continue;
+
+ auto RR = RegDef.Addr->getRegRef(*DFG).Id;
+ if (HRI->isFakeReg(RR))
+ continue;
+
+ LLVM_DEBUG(dbgs() << "Found a QFPStackSpill via \n"; MI->dump();
+ dbgs() << "The corresponding XQF instruction is:\n";
+ ReachDefInstr->dump());
+
+ // Collect the spills.
+ SpillMIs.push_back(std::make_pair(MI, RegDef));
+ }
+}
+
+// Find the uses of qf generating instructions and conditionally add them
+// to a list.
+void HexagonPostRAHandleQFP::collectQFUses(NodeAddr<DefNode *> RegDef,
+ MachineInstr *DefMI) {
+
+ NodeSet UseSet;
+ LLVM_DEBUG(dbgs() << " Finding uses of: "; DefMI->dump(););
+ getAllRealUses(RegDef, UseSet, LV, DFG);
+
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = DFG->addr<UseNode *>(UI);
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*DFG);
+ MachineInstr *UseMI = UseStmt.Addr->getCode();
+ LLVM_DEBUG(dbgs() << "\t\t\t[Reached Use of QF operand]: "; UseMI->dump());
+
+ Register UsedReg = UA.Addr->getRegRef(*DFG).Id;
+ if (QFPSatInstsMap.find(UseMI->getOpcode()) != QFPSatInstsMap.end()) {
+ if (PossibleMultiReachDefs.count(UseStmt) == 0) {
+ PossibleMultiReachDefs.insert(UseStmt);
+ LLVM_DEBUG(dbgs() << "\n[Collect instr with possible multidef]:";
+ UseMI->dump());
+ }
+ conditionallyInsert(*UseMI, UsedReg);
+ }
+ }
+}
+
+// Process the instructions that can have multiple reaching definitions. A
+// possible case is one reaching def being a copy and another a qf-generating
+// instr. Only handle the qf-generating instruction by inserting a convert to
+// sf/hf after it. Then handle the reached uses of this reaching def, since
+// its type has changed from qf to sf/hf after the conversion.
+bool HexagonPostRAHandleQFP::HandleMultiReachingDefs() {
+
+ bool Changed = false;
+ // Note: It may seem this loop can further add to PossibleMultiReachDefs.
+ // But it is not expected to since if any instruction has multiple
+ // definitions it should already be present in it.
+ for (auto It : PossibleMultiReachDefs) {
+ MachineInstr *Instr = It.Addr->getCode();
+ // get the op type for the original instruction.
+ // True is sf/hf, false is qf
+ auto Pair = QFUsesMap[Instr];
+
+ unsigned short UseNo = 1;
+ // Iterate over the operands
+ for (NodeAddr<UseNode *> UA : It.Addr->members_if(DFG->IsUse, *DFG)) {
+
+ // If the type is qf for the operand,
+ // we skip since there is no scope for mismatch
+ if ((UseNo == 1 && Pair.first == false) ||
+ (UseNo == 2 && Pair.second == false)) {
+ ++UseNo;
+ continue;
+ }
+
+ RegisterRef UR = UA.Addr->getRegRef(*DFG);
+ NodeSet Visited, Defs;
+ const auto &P = LV->getAllReachingDefsRec(UR, UA, Visited, Defs);
+ if (!P.second) {
+ LLVM_DEBUG({
+ dbgs() << "*** Unable to collect all reaching defs for use ***\n"
+ << PrintNode<UseNode *>(UA, *DFG) << '\n';
+ });
+ continue;
+ }
+
+ // Iterate over the reaching defs and process the ones which
+ // generate qf. Ignore the ones which have already been handled
+ for (auto RD : P.first) {
+ NodeAddr<DefNode *> RegDef = DFG->addr<DefNode *>(RD);
+
+ // Ignore fake reaches
+ auto RR = RegDef.Addr->getRegRef(*DFG).Id;
+ if (HRI->isFakeReg(RR))
+ continue;
+
+ NodeAddr<StmtNode *> RegStmt = RegDef.Addr->getOwner(*DFG);
+ MachineInstr *ReachDefInstr = RegStmt.Addr->getCode();
+
+ if (ReachDefInstr == nullptr)
+ continue;
+
+ if (!HII->isQFPInstr(ReachDefInstr))
+ continue;
+ if (IgnoreInsertConvList.find(ReachDefInstr) !=
+ IgnoreInsertConvList.end())
+ continue;
+ LLVM_DEBUG(dbgs() << "[Multidef] Handling reaching def:";
+ ReachDefInstr->dump());
+
+ auto *MBB = ReachDefInstr->getParent();
+ auto &dl = ReachDefInstr->getDebugLoc();
+ auto NextReachMI = ++ReachDefInstr->getIterator();
+ auto DefOp = ReachDefInstr->getOperand(0);
+ Register OpReg = DefOp.getReg();
+ MachineInstrBuilder MIB;
+
+ // For double vector regs, two conversions are inserted. Single
+ // conversion for qf32 type
+ if (HII->isQFP32Instr(ReachDefInstr)) {
+ // if the reaching def is a qf double type
+ if (Hexagon::HvxWRRegClass.contains(
+ ReachDefInstr->getOperand(0).getReg())) {
+ Register RegLo = HRI->getSubReg(OpReg, Hexagon::vsub_lo);
+ Register RegHi = HRI->getSubReg(OpReg, Hexagon::vsub_hi);
+ MIB = BuildMI(*MBB, NextReachMI, dl,
+ HII->get(Hexagon::V6_vconv_sf_qf32), RegLo)
+ .addReg(RegLo, RegState::Renamable | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "[MultiDef] Inserting convert instruction: ";
+ MIB.getInstr()->dump());
+ MIB = BuildMI(*MBB, NextReachMI, dl,
+ HII->get(Hexagon::V6_vconv_sf_qf32), RegHi)
+ .addReg(RegHi, RegState::Renamable | RegState::Kill);
+ } else { // If the reaching def is a qf type
+ MIB = BuildMI(*MBB, NextReachMI, dl,
+ HII->get(Hexagon::V6_vconv_sf_qf32), OpReg)
+ .addReg(OpReg, RegState::Renamable | RegState::Kill);
+ }
+ }
+ if (HII->isQFP16Instr(ReachDefInstr)) {
+ MIB = BuildMI(*MBB, NextReachMI, dl,
+ HII->get(Hexagon::V6_vconv_hf_qf16), OpReg)
+ .addReg(OpReg, RegState::Renamable | RegState::Kill);
+ }
+ LLVM_DEBUG(dbgs() << "[MultiDef] Inserting convert instruction: ";
+ MIB.getInstr()->dump(); dbgs() << "\tafter instruction: ";
+ ReachDefInstr->dump());
+
+ // Find the uses of the value newly converted to sf/hf and handle
+ // them accordingly. Uses can be vmul/vadd/etc. types or converts which
+ // take in qf types.
+ collectQFUses(RegDef, ReachDefInstr);
+ collectConvQFInstr(RegDef);
+ IgnoreInsertConvList.insert(ReachDefInstr);
+ Changed = true;
+ }
+ UseNo++;
+ }
+ }
+ return Changed;
+}
+
+bool HexagonPostRAHandleQFP::HandleConvertToQfCopies() {
+ if (ConvertToQfCopies.empty())
+ return false;
+
+ LLVM_DEBUG(
+ dbgs() << "\n*** Inserting convert to qf for selected copies ***\n");
+
+ // Any reached use of the copy should not already be collected to be
+ // converted to IEEE. If present, it means that the reached use has
+ // another reaching def of type IEEE besides this copy.
+ auto CanTransform = [&](MachineInstr *MI, unsigned OpNo) -> bool {
+ if (QFUsesMap.find(MI) != QFUsesMap.end()) {
+ auto Entry = QFUsesMap[MI];
+ if (OpNo == 1 && Entry.first == true)
+ return false;
+ if (OpNo == 2 && Entry.second == true)
+ return false;
+ }
+ return true;
+ };
+
+ for (auto It : ConvertToQfCopies) {
+ NodeSet UseSet;
+ getAllRealUses(It.second.first, UseSet, LV, DFG);
+
+ bool transform = true;
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = DFG->addr<UseNode *>(UI);
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*DFG);
+ MachineInstr *UseMI = UseStmt.Addr->getCode();
+ unsigned OpNo = UA.Addr->getOp().getOperandNo();
+
+ if (!CanTransform(UseMI, OpNo)) {
+ transform = false;
+ break;
+ }
+ }
+
+ if (transform) {
+
+ LLVM_DEBUG(dbgs() << "\n[HandleConvertToQfCopies]\tProcessing Copy:";
+ It.first->dump());
+ auto CopyOp = It.first->getOperand(0);
+ auto NextMIIter = std::next(It.first->getIterator());
+ switch (It.second.second) {
+ case RegType::qf32_double: {
+ Register DefLo = HRI->getSubReg(CopyOp.getReg(), Hexagon::vsub_lo);
+ Register DefHi = HRI->getSubReg(CopyOp.getReg(), Hexagon::vsub_hi);
+ insertIEEEToQF(&*NextMIIter, DefLo, CopyOp, /*is32bit=*/true);
+ insertIEEEToQF(&*NextMIIter, DefHi, CopyOp, /*is32bit=*/true);
+ break;
+ }
+ case RegType::qf16_double: {
+ Register DefLo = HRI->getSubReg(CopyOp.getReg(), Hexagon::vsub_lo);
+ Register DefHi = HRI->getSubReg(CopyOp.getReg(), Hexagon::vsub_hi);
+ insertIEEEToQF(&*NextMIIter, DefLo, CopyOp, /*is32bit=*/false);
+ insertIEEEToQF(&*NextMIIter, DefHi, CopyOp, /*is32bit=*/false);
+ break;
+ }
+ case RegType::qf16:
+ insertIEEEToQF(&*NextMIIter, CopyOp.getReg(), CopyOp,
+ /*is32bit=*/false);
+ break;
+ case RegType::qf32:
+ insertIEEEToQF(&*NextMIIter, CopyOp.getReg(), CopyOp, /*is32bit=*/true);
+ break;
+ default:
+ break;
+ }
+ } else {
+ collectQFUses(It.second.first, It.first);
+ collectConvQFInstr(It.second.first);
+ }
+ }
+ return true;
+}
+
+bool HexagonPostRAHandleQFP::HandleReachDefOfCopies() {
+ if (ReachDefOfCopies.empty())
+ return false;
+
+ MachineInstrBuilder MIB;
+ for (auto It : ReachDefOfCopies) {
+ auto *MBB = It.first->getParent();
+ auto &dl = It.first->getDebugLoc();
+ auto NextMI = ++(It.first)->getIterator();
+ auto RegOp = It.first->getOperand(0);
+ Register OpReg = RegOp.getReg();
+
+ if (It.second == RegType::qf32)
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_sf_qf32), OpReg)
+ .addReg(OpReg, RegState::Renamable | RegState::Kill);
+ else if (It.second == RegType::qf16)
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_hf_qf16), OpReg)
+ .addReg(OpReg, RegState::Renamable | RegState::Kill);
+ else if (It.second == RegType::qf32_double) {
+ Register RegLo = HRI->getSubReg(OpReg, Hexagon::vsub_lo);
+ Register RegHi = HRI->getSubReg(OpReg, Hexagon::vsub_hi);
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_sf_qf32), RegLo)
+ .addReg(RegLo, RegState::Renamable | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "Inserting convert instruction: ";
+ MIB.getInstr()->dump());
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_sf_qf32), RegHi)
+ .addReg(RegHi, RegState::Renamable | RegState::Kill);
+ } else if (It.second == RegType::qf16_double) {
+ Register RegLo = HRI->getSubReg(OpReg, Hexagon::vsub_lo);
+ Register RegHi = HRI->getSubReg(OpReg, Hexagon::vsub_hi);
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_hf_qf16), RegLo)
+ .addReg(RegLo, RegState::Renamable | RegState::Kill);
+ LLVM_DEBUG(dbgs() << "Inserting convert instruction: ";
+ MIB.getInstr()->dump());
+ MIB =
+ BuildMI(*MBB, NextMI, dl, HII->get(Hexagon::V6_vconv_hf_qf16), RegHi)
+ .addReg(RegHi, RegState::Renamable | RegState::Kill);
+ }
+ LLVM_DEBUG(dbgs() << "Inserting convert instruction: ";
+ MIB.getInstr()->dump(); dbgs() << "\tafter instruction: ";
+ It.first->dump());
+ }
+ return true;
+}
+
+HexagonPostRAHandleQFP::RegType
+HexagonPostRAHandleQFP::HasQfUses(NodeAddr<DefNode *> CopyDef,
+ MachineInstr *CopyMI) {
+ NodeSet UseSet;
+ getAllRealUses(CopyDef, UseSet, LV, DFG);
+
+ if (UseSet.size() == 0)
+ return RegType::undefined;
+
+ bool hasQf16Use = false;
+ bool hasQf32Use = false;
+
+ LLVM_DEBUG(dbgs() << "[COPY]\nUses of the copy are: ");
+ for (auto UI : UseSet) {
+ NodeAddr<UseNode *> UA = DFG->addr<UseNode *>(UI);
+ if (UA.Addr->getFlags() & NodeAttrs::PhiRef)
+ continue;
+ NodeAddr<StmtNode *> UseStmt = UA.Addr->getOwner(*DFG);
+ MachineInstr *UseMI = UseStmt.Addr->getCode();
+ unsigned OpNo = UA.Addr->getOp().getOperandNo();
+
+ LLVM_DEBUG(dbgs() << "\nCopy's use: "; UseMI->dump());
+ // Any reached use should not be a non-qf instruction
+ if (!HII->usesQFOperand(UseMI, OpNo))
+ return RegType::ieee;
+
+ // Determine the qf type from the use
+ if (HII->usesQF16Operand(UseMI, OpNo))
+ hasQf16Use = true;
+ else if (HII->usesQF32Operand(UseMI, OpNo))
+ hasQf32Use = true;
+
+ // Any reached use should not already be converted to IEEE.
+ // If present, it means that the reached use has a reaching def
+ // other than the copy.
+ if (QFUsesMap.find(UseMI) != QFUsesMap.end()) {
+ auto Entry = QFUsesMap[UseMI];
+ if (OpNo == 1 && Entry.first == true)
+ return RegType::ieee;
+ if (OpNo == 2 && Entry.second == true)
+ return RegType::ieee;
+ }
+ }
+
+ // Set the output type based on uses
+ if (hasQf16Use) {
+ // Check if copy destination is double-wide
+ if (Hexagon::HvxWRRegClass.contains(CopyMI->getOperand(0).getReg()))
+ return RegType::qf16_double;
+ else
+ return RegType::qf16;
+ } else if (hasQf32Use) {
+ if (Hexagon::HvxWRRegClass.contains(CopyMI->getOperand(0).getReg()))
+ return RegType::qf32_double;
+ else
+ return RegType::qf32;
+ }
+
+ return RegType::undefined;
+}
+
+// Go through the collected copies and conditionally insert conversions to
+// sf/hf *after their reaching defs*. This is done because there can be
+// multiple reaching defs of the copies. Also, check the uses of each
+// reaching def and handle qf uses too by changing the opcode or
+// inserting converts.
+// Additionally, check the uses of the copy
+// and handle them by changing the opcode or inserting converts.
+bool HexagonPostRAHandleQFP::HandleCopies() {
+
+ bool Changed = false;
+
+ // If a convert is inserted after a reaching def, add it to ignorelist.
+ // This is because this reaching def can be reaching def of other copies
+ // due to non-SSA form.
+ for (auto It : QFCopys) {
+
+ // Get details of the copy node
+ NodeAddr<DefNode *> CopyNode = It.first.first;
+ NodeAddr<StmtNode *> StNode = CopyNode.Addr->getOwner(*DFG);
+ [[maybe_unused]] auto *CopyMI = StNode.Addr->getCode();
+ LLVM_DEBUG(dbgs() << "\nHandling Reaching Defs of COPY: "; CopyMI->dump();
+ std::string Type; switch (It.second) {
+ case RegType::qf32_double:
+ Type = "qf32_double";
+ break;
+ case RegType::qf32:
+ Type = "qf32";
+ break;
+ case RegType::qf16:
+ Type = "qf16";
+ break;
+ case RegType::qf16_double:
+ Type = "qf16_double";
+ break;
+ default:
+ Type = "ieee";
+ } dbgs() << "\t Type: "
+ << Type << "\n");
+
+ // insert convert to IEEE after the reaching def if it generates qf type
+ RegType RTy = It.second;
+ if (RTy != RegType::ieee) {
+
+ // get details of the reaching def node
+ NodeAddr<DefNode *> ReachDefNode = It.first.second;
+ NodeAddr<StmtNode *> StNode = ReachDefNode.Addr->getOwner(*DFG);
+ auto *ReachingDef = StNode.Addr->getCode();
+
+ if (IgnoreInsertConvList.find(ReachingDef) != IgnoreInsertConvList.end())
+ continue;
+
+ // Collect the reaching defs to be processed later.
+ ReachDefOfCopies.insert(std::make_pair(ReachingDef, RTy));
+
+ // Process the reached uses of the reaching def now for
+ // incorrect usage, since the register type has changed
+ // following the conversion.
+ LLVM_DEBUG(dbgs() << "\n[COPY]\tAnalyzing uses of the reaching defs \
+ of the copy...");
+ collectQFUses(ReachDefNode, ReachingDef);
+ collectConvQFInstr(ReachDefNode);
+ IgnoreInsertConvList.insert(ReachingDef);
+ Changed = true;
+ }
+ }
+
+ // Loop through copies with qf uses
+ for (auto It : QFCopys) {
+
+ // Get details of the copy node
+ NodeAddr<DefNode *> CopyNode = It.first.first;
+ NodeAddr<StmtNode *> StNode = CopyNode.Addr->getOwner(*DFG);
+ auto *CopyMI = StNode.Addr->getCode();
+ LLVM_DEBUG(dbgs() << "\nHandling COPY: "; CopyMI->dump());
+ RegType RTy = It.second;
+
+ // Process the reached uses of the copy to find any incorrect
+ // qf uses. If the copy's uses are all qf types, we need to convert
+ // its result back to qf
+ // FIXME: don't include the copy if it's the last instruction since
+ // it is *probably* not possible to insert via BuildMI at the end of BB
+ RTy = HasQfUses(CopyNode, CopyMI);
+ if (RTy != RegType::ieee && RTy != RegType::undefined &&
+ (++CopyMI->getIterator() != CopyMI->getParent()->end())) {
+ if (!ConvertToQfCopies.contains(CopyMI)) {
+ ConvertToQfCopies[CopyMI] = std::make_pair(CopyNode, RTy);
+ LLVM_DEBUG(dbgs() << "\n[ConvertToQfCopies]\tAdded copy: ";
+ CopyMI->dump(); std::string Type; switch (RTy) {
+ case RegType::qf32_double:
+ Type = "qf32_double";
+ break;
+ case RegType::qf32:
+ Type = "qf32";
+ break;
+ case RegType::qf16:
+ Type = "qf16";
+ break;
+ case RegType::qf16_double:
+ Type = "qf16_double";
+ break;
+ default:
+ Type = "ieee";
+ } dbgs() << "\t Type: "
+ << Type << "\n");
+ }
+ continue;
+ }
+ LLVM_DEBUG(dbgs() << "\n[COPY]\tAnalyzing uses of the copy...");
+ collectQFUses(CopyNode, CopyMI);
+ collectConvQFInstr(CopyNode);
+ }
+
+ Changed |= HandleReachDefOfCopies();
+ Changed |= HandleMultiReachingDefs();
+ Changed |= HandleConvertToQfCopies();
+
+ return Changed;
+}
+
+// Inserts a conversion instruction (sf/hf = qf) before spilling.
+// Uses the same physical register for the conversion.
+// Additionally checks the uses of the register and
+// conditionally stores them to be handled later.
+bool HexagonPostRAHandleQFP::HandleSpills() {
+
+ LLVM_DEBUG(dbgs() << "\n[Handling Spill]\n");
+ bool Changed = false;
+ for (auto It : SpillMIs) {
+
+ MachineInstr *MI = It.first;
+ auto OpC = MI->getOpcode();
+ DebugLoc DL = MI->getDebugLoc();
+
+ auto NodeDef = It.second;
+ NodeAddr<StmtNode *> Stmt = NodeDef.Addr->getOwner(*DFG);
+ MachineInstr *DefMI = Stmt.Addr->getCode();
+ auto RegOp = MI->getOperand(2);
+ Register DefR = RegOp.getReg();
+
+ // handles widened qf16/qf32 instructions.
+ if (OpC == Hexagon::PS_vstorerw_ai) {
+ if (!Hexagon::HvxWRRegClass.contains(DefR))
+ assert(false && " Unhandled Vector Register class passed\n");
+ // Walk through the uses of DefLo and DefHi; if there are QFP
+ // instructions among them, they need to be updated to use sf operands
+ // instead of qf operands.
+ collectQFUses(NodeDef, DefMI);
+
+ if (IgnoreInsertConvList.find(DefMI) != IgnoreInsertConvList.end())
+ continue;
+
+ // Collect the reached uses of ReachDefInstr
+ // which are sf/hf = qf conversion instructions.
+ collectConvQFInstr(NodeDef);
+ Register DefLo = HRI->getSubReg(DefR, Hexagon::vsub_lo);
+ Register DefHi = HRI->getSubReg(DefR, Hexagon::vsub_hi);
+
+ // Create two converts, one each for Hi and Lo, conditionally.
+ // Liveness is the same as for the store instruction for the register.
+ // If both are double registers, two insertions are done.
+ // If only one of the subregs reaches the store, the conversion is done
+ // for that subreg.
+ Register DReg = DefMI->getOperand(0).getReg();
+ if (HII->isQFP16Instr(DefMI)) {
+ if (DefLo == DReg || Hexagon::HvxWRRegClass.contains(DReg))
+ insertInstr(DefMI, Hexagon::V6_vconv_hf_qf16, DefLo, DefLo,
+ getRegState(RegOp) | RegState::Kill);
+
+ if (DefHi == DReg || Hexagon::HvxWRRegClass.contains(DReg))
+ insertInstr(DefMI, Hexagon::V6_vconv_hf_qf16, DefHi, DefHi,
+ getRegState(RegOp) | RegState::Kill);
+ } else if (HII->isQFP32Instr(DefMI)) {
+ if (DefLo == DReg || Hexagon::HvxWRRegClass.contains(DReg))
+ insertInstr(DefMI, Hexagon::V6_vconv_sf_qf32, DefLo, DefLo,
+ getRegState(RegOp) | RegState::Kill);
+
+ if (DefHi == DReg || Hexagon::HvxWRRegClass.contains(DReg))
+ insertInstr(DefMI, Hexagon::V6_vconv_sf_qf32, DefHi, DefHi,
+ getRegState(RegOp) | RegState::Kill);
+ }
+ IgnoreInsertConvList.insert(DefMI);
+ Changed = true;
+
+ // Handles instructions which output qf32 type.
+ } else if (OpC == Hexagon::PS_vstorerv_ai && HII->isQFP32Instr(DefMI)) {
+ collectQFUses(NodeDef, DefMI);
+ if (IgnoreInsertConvList.find(DefMI) != IgnoreInsertConvList.end())
+ continue;
+ collectConvQFInstr(NodeDef);
+
+ insertInstr(DefMI, Hexagon::V6_vconv_sf_qf32, DefR, DefR,
+ getRegState(RegOp) | RegState::Kill);
+
+ IgnoreInsertConvList.insert(DefMI);
+ Changed = true;
+
+ // Handles instructions which output qf16 type.
+ } else if (OpC == Hexagon::PS_vstorerv_ai && HII->isQFP16Instr(DefMI)) {
+ collectQFUses(NodeDef, DefMI);
+ if (IgnoreInsertConvList.find(DefMI) != IgnoreInsertConvList.end())
+ continue;
+ collectConvQFInstr(NodeDef);
+
+ insertInstr(DefMI, Hexagon::V6_vconv_hf_qf16, DefR, DefR,
+ getRegState(RegOp) | RegState::Kill);
+
+ IgnoreInsertConvList.insert(DefMI);
+ Changed = true;
+ } else {
+ LLVM_DEBUG(MI->dump());
+ llvm_unreachable("This case is not handled. Look above for MI\n");
+ }
+ }
+ return Changed;
+}
+
+bool HexagonPostRAHandleQFP::runOnMachineFunction(MachineFunction &MF) {
+
+ if (DisablePostRAHandleQFloat)
+ return false;
+
+ LLVM_DEBUG(
+ dbgs() << "\n=== Entering Hexagon Fixup QF spills and refills pass ===\n"
+ << "Mode: ";
+ switch (QFloatModeValue) {
+ case QFloatMode::StrictIEEE:
+ dbgs() << "Strict IEEE";
+ break;
+ case QFloatMode::IEEE:
+ dbgs() << "IEEE";
+ break;
+ case QFloatMode::Lossy:
+ dbgs() << "Lossy";
+ break;
+ default:
+ dbgs() << "Legacy";
+ break;
+ };
+ dbgs() << "\n";);
+ bool Changed = false;
+
+ auto &_HST = MF.getSubtarget<HexagonSubtarget>();
+ if (!_HST.useHVXOps())
+ return false;
+
+ HII = _HST.getInstrInfo();
+
+ // If the mode is legacy, the function may not contain qf instructions;
+ // check whether this pass is required to run for legacy mode.
+ if (QFloatModeValue == QFloatMode::Legacy)
+ if (!HII->hasQFPInstrs(MF))
+ return false;
+
+ HRI = _HST.getRegisterInfo();
+ MRI = &MF.getRegInfo();
+ const auto &MDF = getAnalysis<MachineDominanceFrontierWrapperPass>().getMDF();
+ MachineDominatorTree *MDT =
+ &getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
+ HST = &_HST;
+
+ // We need the Register Dataflow Graph (RDF) to calculate reaching
+ // definitions since the machine code is not in SSA form.
+ // DFG holds the graph on which we iterate for the nodes.
+ DataFlowGraph G(MF, *HII, *HRI, *MDT, MDF);
+ G.build();
+ DFG = &G;
+
+ Liveness L(*MRI, *DFG);
+ L.computePhiInfo();
+ LV = &L;
+
+ // Find and save the list of QFP stack spills.
+ // For refills store all refill instructions to process conditionally later.
+ NodeAddr<FuncNode *> FA = DFG->getFunc();
+ LLVM_DEBUG(dbgs() << "==== [RefMap#]=====:\n "
+ << Print<NodeAddr<FuncNode *>>(FA, *DFG) << "\n");
+ for (NodeAddr<BlockNode *> BA : FA.Addr->members(*DFG)) {
+ for (auto IA : BA.Addr->members(*DFG)) {
+
+ if (!DFG->IsCode<NodeAttrs::Stmt>(IA))
+ continue;
+
+ // 'SA' holds the Statement node which contains the machine instruction.
+ NodeAddr<StmtNode *> SA = IA;
+ MachineInstr *I = SA.Addr->getCode();
+
+ switch (I->getOpcode()) {
+ case Hexagon::PS_vstorerw_ai:
+ case Hexagon::PS_vstorerv_ai:
+ collectQFPStackSpill(&SA);
+ break;
+ case Hexagon::PS_vloadrw_ai:
+ case Hexagon::PS_vloadrv_ai:
+ collectQFPStackRefill(&SA);
+ break;
+ case TargetOpcode::COPY:
+ collectCopies(&SA);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+
+ // Walk through the spills and insert converts when necessary.
+ // Additionally, walk through the uses of the converts and
+ // store them conditionally for later processing.
+ LLVM_DEBUG(dbgs() << "\nHandling spills....");
+ Changed |= HandleSpills();
+ SpillMIs.clear();
+
+ // Walk through the uses of the refill instructions.
+ // Process them if they are used as qf operands.
+ LLVM_DEBUG(dbgs() << "\nCollecting refills....\n");
+ for (NodeAddr<DefNode *> DfNode : RefillMIs) {
+
+ NodeAddr<StmtNode *> Stmt = DfNode.Addr->getOwner(*DFG);
+ MachineInstr *DefMI = Stmt.Addr->getCode();
+ collectQFUses(DfNode, DefMI);
+ collectConvQFInstr(DfNode);
+ }
+ RefillMIs.clear();
+
+ LLVM_DEBUG(dbgs() << "\nHandling copies....");
+ Changed |= HandleCopies();
+ QFCopys.clear();
+ PossibleMultiReachDefs.clear();
+ ReachDefOfCopies.clear();
+ ConvertToQfCopies.clear();
+
+ LLVM_DEBUG(dbgs() << "\n === QF Uses map === "; for (auto It : QFUsesMap) {
+ dbgs() << "\nInstruction: ";
+ It.first->dump();
+ dbgs() << "\t Property: " << It.second.first << " ," << It.second.second;
+ });
+
+ // Insert new opcodes as applicable for the refill uses.
+ // Delete the original instructions.
+ Changed |= HandleRefills();
+
+ // Handle non-saturating instructions by inserting convert(s) from sf to qf.
+ Changed |= HandleNonSatInstr();
+ QFNonSatMIs.clear();
+ // Cleanup
+ for (auto It : QFUsesMap)
+ It.first->eraseFromParent();
+ QFUsesMap.clear();
+ IgnoreInsertConvList.clear();
+
+ // If this option is enabled, check for qf use-def mismatches.
+ if (EnablePostRAXqfCompliance) {
+ dbgs() << "\nChecking for ABI compliance for XQF post register \
+allocation for function: "
+ << MF.getName() << "\n";
+ DataFlowGraph DFG(MF, *HII, *HRI, *MDT, MDF);
+ DFG.build();
+ Liveness LV(*MRI, DFG);
+ LV.computeLiveIns();
+ XqfPostRADiagnosis VDiag(DFG, LV, HII);
+ VDiag.runCompliance();
+ }
+ return Changed;
+}
+
+//===----------------------------------------------------------------------===//
+// Public Constructor Functions
+//===----------------------------------------------------------------------===//
+INITIALIZE_PASS_BEGIN(HexagonPostRAHandleQFP, "handle-qfp-spills-refills",
+ "Hexagon PostRA Handle QFloat", false, false)
+INITIALIZE_PASS_DEPENDENCY(MachineDominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(MachineDominanceFrontierWrapperPass)
+INITIALIZE_PASS_END(HexagonPostRAHandleQFP, "handle-qfp-spills-refills",
+ "Hexagon PostRA Handle QFloat", false, false)
+
+FunctionPass *llvm::createHexagonPostRAHandleQFP() {
+ return new HexagonPostRAHandleQFP();
+}
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp b/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp
index 5c72b6cb20883..9c62e2528562e 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp
@@ -40,6 +40,10 @@ static cl::opt<bool>
static cl::opt<bool> EnableRDFOpt("rdf-opt", cl::Hidden, cl::init(true),
cl::desc("Enable RDF-based optimizations"));
+static cl::opt<bool> EnablePostRAHandleQFP(
+ "hexagon-handle-qfloat", cl::init(true), cl::Hidden,
+ cl::desc("Fix up QFloat spills and reloads after register allocation"));
+
cl::opt<unsigned> RDFFuncBlockLimit(
"rdf-bb-limit", cl::Hidden, cl::init(1000),
cl::desc("Basic block limit for a function for RDF optimizations"));
@@ -156,6 +160,19 @@ static cl::opt<bool> EnableInstSimplify("hexagon-instsimplify", cl::Hidden,
cl::init(true),
cl::desc("Enable instsimplify"));
+cl::opt<QFloatMode> QFloatModeValue(
+ "hexagon-qfloat-mode", cl::desc("Specify the qfloat mode to operate in."),
+ cl::Hidden, cl::init(QFloatMode::Legacy),
+ cl::values(
+ clEnumValN(QFloatMode::StrictIEEE, "strict-ieee",
+ "Enable code generation for qfloat strict IEEE-754 mode"),
+ clEnumValN(QFloatMode::IEEE, "ieee",
+ "Enable code generation for qfloat IEEE-754 mode"),
+ clEnumValN(QFloatMode::Lossy, "lossy",
+ "Enable code generation for qfloat lossy-subnormals mode"),
+ clEnumValN(QFloatMode::Legacy, "legacy",
+ "Enable code generation for qfloat legacy mode")));
+
/// HexagonTargetMachineModule - Note that this is used on hosts that
/// cannot link in a library unless there are references into the
/// library. In particular, it seems that it is not possible to get
@@ -230,6 +247,8 @@ LLVMInitializeHexagonTarget() {
initializeHexagonSplitConst32AndConst64Pass(PR);
initializeHexagonVectorPrintPass(PR);
initializeHexagonQFPOptimizerPass(PR);
+ initializeHexagonXQFloatGeneratorPass(PR);
+ initializeHexagonPostRAHandleQFPPass(PR);
}
HexagonTargetMachine::HexagonTargetMachine(const Target &T, const Triple &TT,
@@ -397,12 +416,17 @@ void HexagonPassConfig::addIRPasses() {
bool HexagonPassConfig::addInstSelector() {
HexagonTargetMachine &TM = getHexagonTargetMachine();
+ const HexagonSubtarget *HST = TM.getSubtargetImpl();
bool NoOpt = (getOptLevel() == CodeGenOptLevel::None);
if (!NoOpt)
addPass(createHexagonOptimizeSZextends());
addPass(createHexagonISelDag(TM, getOptLevel()));
+ // Run the QFloat mode code generation pass only if v79 or greater.
+ // Do not run this pass if legacy mode is requested on the command line.
+ if (HST->useHVXV79Ops() && (QFloatModeValue != QFloatMode::Legacy))
+ addPass(createHexagonXQFloatGenerator());
if (!NoOpt) {
if (EnableVExtractOpt)
@@ -429,7 +453,10 @@ bool HexagonPassConfig::addInstSelector() {
addPass(createHexagonGenInsert());
if (EnableEarlyIf)
addPass(createHexagonEarlyIfConversion());
- addPass(createHexagonQFPOptimizer());
+ // For v75 or below, or if legacy mode is requested, run the QFPOptimizer
+ // pass to preserve backward compatibility.
+ if (!HST->useHVXV79Ops() || (QFloatModeValue == QFloatMode::Legacy))
+ addPass(createHexagonQFPOptimizer());
}
return false;
@@ -466,6 +493,9 @@ void HexagonPassConfig::addPreRegAlloc() {
}
void HexagonPassConfig::addPostRegAlloc() {
+ if (EnablePostRAHandleQFP)
+ addPass(createHexagonPostRAHandleQFP());
+
if (getOptLevel() != CodeGenOptLevel::None) {
if (EnableRDFOpt)
addPass(createHexagonRDFOpt());
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetMachine.h b/llvm/lib/Target/Hexagon/HexagonTargetMachine.h
index 98a21bbba4794..383d0e3cc5345 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetMachine.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetMachine.h
@@ -21,6 +21,8 @@
namespace llvm {
+enum class QFloatMode { StrictIEEE, IEEE, Lossy, Legacy };
+
class HexagonTargetMachine : public CodeGenTargetMachineImpl {
std::unique_ptr<TargetLoweringObjectFile> TLOF;
HexagonSubtarget Subtarget;
@@ -33,6 +35,7 @@ class HexagonTargetMachine : public CodeGenTargetMachineImpl {
std::optional<CodeModel::Model> CM, CodeGenOptLevel OL,
bool JIT);
~HexagonTargetMachine() override;
+ const HexagonSubtarget *getSubtargetImpl() const { return &Subtarget; }
const HexagonSubtarget *getSubtargetImpl(const Function &F) const override;
void registerPassBuilderCallbacks(PassBuilder &PB) override;
diff --git a/llvm/lib/Target/Hexagon/HexagonXQFloatGenerator.cpp b/llvm/lib/Target/Hexagon/HexagonXQFloatGenerator.cpp
new file mode 100644
index 0000000000000..4a1540e4ddcfe
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonXQFloatGenerator.cpp
@@ -0,0 +1,2177 @@
+//===-------------------- HexagonXQFloatGenerator.cpp --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass enables generation of XQFloat instructions. XQF instructions
+// are more efficient, but can be less precise in comparison to IEEE ones.
+// Based on the accuracy preservation of the generated code, we enable four
+// modes - Strict IEEE-754 compliant, IEEE-754 compliant, Lossy subnormals and
+// legacy mode.
+//
+// Strict IEEE mode adheres to accuracy and precision similar to IEEE-754.
+//
+// IEEE-754 compliant mode excludes IEEE-754 overflows and lower precision
+// subnormals due to larger dynamic range than IEEE-754.
+// All subnormals have extra precision.
+//
+// Lossy subnormals mode skips normalization, which results in a loss of
+// accuracy. This provides greater precision than clamping subnormals to 0.
+// If the dataset excludes subnormals, it behaves as IEEE-754 compliant mode.
+//
+// The direct mode has a loss of 1 bit of accuracy compared to IEEE-754.
+//
+// V79 replaces the prior internal HVX floating point format for floating-point
+// arithmetic. The new internal HVX floating-point format yields results
+// identical to IEEE-754 round-to-even mode. The new format contains more bits
+// than IEEE-754, which optionally produces results with greater range and
+// accuracy. Only the HVX vector registers use the HVX floating-point format.
+// Memory maintains all floating-point data in IEEE-754 format,
+// and all loads/stores use the IEEE-754 format. A subset of HVX floating-point
+// operations transform IEEE-754 floating-point data to HVX floating-point data.
+// Subsequent HVX floating-point instructions may consume operands in the HVX
+// floating-point without conversion to IEEE-754, which allows for performant
+// & energy efficient code. The program does not need to switch between formats
+// continuously. The program must convert the HVX floating-point results to
+// IEEE-754 prior to storing to memory.
+
+// HVX floating-point achieves IEEE-754 compliance through normalization.
+// The program may skip normalization when faster calculation is desired, and
+// IEEE-754 compliance isn’t required. HVX floating-point contains two input
+// types: qf32, single precision floating point, and qf16, half precision
+// floating point. In Hexagon, IEEE-754 contains two input types: sf, single
+// precision floating point, and hf, half precision floating point.
+//
+// Only HVX floating-point source and destination instructions use HVX
+// floating-point values. Instructions specify the HVX floating-point format
+// with the qf16 and qf32 identifier. A source vector register will drop the
+// extended state of an HVX floating-point value when an instruction reads the
+// source vector register without the qf16 or qf32 identifier. A destination
+// vector register will reset its extended state when an instruction writes to
+// a vector register without the qf16 or qf32 identifier. When dropping the
+// extended state, the floating-point value loses accuracy. The program may
+// preserve the floating-point value by converting HVX floating-point values
+// to IEEE-754 values. The compiler must convert HVX floating-point values to
+// IEEE-754 values before using them as inputs to stores, permutes, shifts, and
+// any other operations that do not source the HVX floating-point format.
+//
+// Depending on the desired results, HVX floating-point operations may have
+// some requirements on the input sources. The HVX floating-point values
+// require normalization to achieve IEEE-754 compliance, while faster operations
+// may skip normalization. The program normalizes HVX floating-point values
+// before subsequent HVX floating-point operations, so the floating-point value
+// does not lose precision. The program also obtains results identical to
+// IEEE-754 by converting all HVX floating-point results to IEEE-754 format
+// before they are consumed in any subsequent operation. There are, however,
+// cases where this conversion is redundant, or where the differences between
+// IEEE-754 and HVX floating-point may not be a concern.
+//
+// The conversion logic is summarized in the table below. Each row is an
+// operand source and each column is a mode. "HVX FP" stands for an HVX
+// floating-point (qf16/qf32) value.
+//
+// clang-format off
+// =============================================================================================
+// Operand source    | Strict IEEE-754      | IEEE-754          | Lossy             | Direct
+//                   | compliance           | compliance        | subnormals        | lossy
+// =============================================================================================
+// Inputs to add/subtract instructions
+// ---------------------------------------------------------------------------------------------
+// IEEE-754          | Direct use           | Direct use        | Direct use        | Direct use
+// HVX FP from mult  | Convert to IEEE      | Direct use        | Direct use        | Direct use
+// HVX FP from adder | Convert to IEEE      | Direct use        | Direct use        | Direct use
+// ---------------------------------------------------------------------------------------------
+// Inputs to multiplication instructions
+// ---------------------------------------------------------------------------------------------
+// sf                | Normalize            | Normalize         | Direct use        | Direct use
+// qf32 from mult    | Convert to IEEE,     | Direct use        | Direct use        | Direct use
+//                   | then normalize       |                   |                   |
+// qf32 from adder   | Convert to IEEE,     | Normalize         | Normalize         | Direct use
+//                   | then normalize       |                   |                   |
+// hf                | Widening multiply,   | Widening multiply | Direct use        | Direct use
+//                   | then convert to IEEE |                   |                   |
+// qf16 from mult    | Convert to IEEE,     | Direct use        | Direct use        | Direct use
+//                   | widening multiply,   |                   |                   |
+//                   | convert to IEEE      |                   |                   |
+// qf16 from adder   | Convert to IEEE,     | Widening multiply | Widening multiply | Direct use
+//                   | widening multiply,   |                   |                   |
+//                   | convert to IEEE      |                   |                   |
+// ---------------------------------------------------------------------------------------------
+// Inputs to non-HVX floating-point instructions
+// ---------------------------------------------------------------------------------------------
+// IEEE-754          | Direct use           | Direct use        | Direct use        | Direct use
+// HVX FP from mult  | Convert to IEEE      | Convert to IEEE   | Convert to IEEE   | Direct use
+// HVX FP from adder | Convert to IEEE      | Convert to IEEE   | Convert to IEEE   | Direct use
+// =============================================================================================
+// clang-format on
+//
+// For v81, the normalization sequence changes. Instead of multiplying 0
+// and -0, a simple copy operation normalizes the unnormal value. Both
+// qf and IEEE-754 values can be unnormal.
+// Additionally, v81 adds two new vsub instructions, which this pass handles.
+
+#define HEXAGON_XQFLOAT_GENERATOR "XQFloat Generator pass"
+
+#include "Hexagon.h"
+#include "HexagonInstrInfo.h"
+#include "HexagonSubtarget.h"
+#include "HexagonTargetMachine.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineOperand.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include <algorithm>
+#include <vector>
+
+#define DEBUG_TYPE "hexagon-xqf-gen"
+
+using namespace llvm;
+
+extern cl::opt<QFloatMode> QFloatModeValue;
+
+// Master flag to enable XQF generation
+cl::opt<bool> EnableHVXXQFloat("enable-xqf-gen", cl::init(false),
+                               cl::desc("Enable XQFloat generation"));
+// Master flag to remove extraneous qf to sf/hf conversions
+cl::opt<bool>
+ EnableConversionsRemoval("enable-rem-conv", cl::init(false),
+ cl::desc("Enable extraneous conversions removal"));
+
+// Diagnostic flags
+cl::opt<bool> PrintDebug("debug-print", cl::init(false),
+ cl::desc("Print function mir after transformation"));
+cl::opt<bool>
+ EnableConvDiag("enable-diag-conv", cl::init(false),
+ cl::desc("Print function after conversion removal."));
+
+// This vector contains the opcodes which generate qf32 from add/subtract
+SmallVector<unsigned short, 7> XQFPAdd32 = {
+ // vector add instructions
+ Hexagon::V6_vadd_sf, Hexagon::V6_vadd_qf32, Hexagon::V6_vadd_qf32_mix,
+
+ // vector subtract instructions
+ Hexagon::V6_vsub_qf32, Hexagon::V6_vsub_qf32_mix, Hexagon::V6_vsub_sf,
+ Hexagon::V6_vsub_sf_mix};
+
+// This vector contains the opcodes which generate qf16 from add/subtract
+SmallVector<unsigned short, 7> XQFPAdd16 = {
+ // vector add instructions
+ Hexagon::V6_vadd_hf, Hexagon::V6_vadd_qf16, Hexagon::V6_vadd_qf16_mix,
+
+ // vector subtract instructions
+ Hexagon::V6_vsub_hf, Hexagon::V6_vsub_qf16, Hexagon::V6_vsub_qf16_mix,
+ Hexagon::V6_vsub_hf_mix};
+
+// This vector contains the opcodes which generate qf32 from multiplication
+SmallVector<unsigned short, 5> XQFPMult32 = {
+ Hexagon::V6_vmpy_qf32, Hexagon::V6_vmpy_qf32_qf16, Hexagon::V6_vmpy_qf32_hf,
+ Hexagon::V6_vmpy_qf32_sf, Hexagon::V6_vmpy_qf32_mix_hf};
+// This vector contains the opcodes which generate qf16 from multiplication
+SmallVector<unsigned short, 3> XQFPMult16 = {Hexagon::V6_vmpy_qf16,
+ Hexagon::V6_vmpy_qf16_hf,
+ Hexagon::V6_vmpy_qf16_mix_hf};
+
+namespace llvm {
+FunctionPass *createHexagonXQFloatGenerator();
+void initializeHexagonXQFloatGeneratorPass(PassRegistry &);
+} // namespace llvm
+
+namespace {
+
+struct HexagonXQFloatGenerator : public MachineFunctionPass {
+public:
+ static char ID;
+ HexagonXQFloatGenerator() : MachineFunctionPass(ID) {}
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override { return HEXAGON_XQFLOAT_GENERATOR; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+private:
+ // Handle each XQF optimization level
+ bool HandleStrictIEEE(MachineFunction &);
+ bool HandleCompliantIEEE(MachineFunction &);
+ bool HandleLossySubnormals(MachineFunction &);
+ bool HandleLossyLegacy(MachineFunction &);
+
+ // Checker functions for input operands
+ bool checkIfInputFromAdder32(Register Reg);
+ bool checkIfInputFromAdder16(Register Reg);
+ bool checkIfInputFromMult32(Register Reg);
+ bool checkIfInputFromMult16(Register Reg);
+ bool deleteList();
+
+ // Helper functions for conversion/normalization/widening
+ bool widenMultiplicationInputF16(MachineInstr &, Register &, Register &,
+ Register &, bool);
+ bool widenMultiplicationInputF16Rt(MachineInstr &, Register &, Register &,
+ Register &);
+ void widenMultiplyInputHF(MachineInstr &, Register &, Register &, Register &);
+ bool normalizeMultiplicationInputF32(MachineInstr &, Register &, Register &,
+ Register &, Register &, bool &);
+ void normalizeMultiplicationInputSF(MachineInstr &, Register &, Register &,
+ Register &, Register &, bool &);
+ bool convertNormalizeMultOp32(MachineInstr &, Register &, Register &,
+ Register &, Register &, bool &);
+ bool convertWidenMultOp16(MachineInstr &, Register &, Register &, Register &,
+ bool);
+ bool convertWidenMultOp32(MachineInstr &, Register &, Register &, Register &,
+ bool);
+ void createPrologInstructions(MachineInstr &, Register &);
+ bool convertAddOpToIEEE16(MachineInstr &, Register &, Register &, Register &,
+ bool, bool, bool);
+ bool convertAddOpToIEEE32(MachineInstr &, Register &, Register &, Register &,
+ bool, bool, bool);
+ void generateQF16FromQF32(MachineInstr &, Register &, Register &);
+ bool convertIfInputToNonHVX(MachineInstr &, bool);
+ void createConvertInstr(MachineInstr *, Register &, Register &, bool);
+
+ // V81 specific normalization function
+ bool V81normalizeMultF32(MachineInstr &, Register &, Register &, Register &,
+ bool, bool, bool);
+
+ const HexagonSubtarget *HST = nullptr;
+ const HexagonInstrInfo *HII = nullptr;
+ MachineRegisterInfo *MRI = nullptr;
+
+ SmallVector<MachineInstr *, 16>
+ OriginalMI; // Hold the instructions to be deleted
+};
+
+// Print machine function
+static void debug_print([[maybe_unused]] MachineFunction &MF) {
+ dbgs() << "\n=== Printing function ===\n";
+#ifndef NDEBUG
+ for (MachineBasicBlock &MBB : MF)
+ MBB.dump();
+#endif // NDEBUG
+}
+
+// This class removes redundant vector convert instructions from qf to hf/sf.
+// Additionally, it replaces uses of sf/hf registers with qf types.
+// The resulting code is complete without dangling instructions.
+// FIXME: Liveness is not preserved.
+class VectorConvertRemove {
+
+public:
+ VectorConvertRemove(MachineFunction &_MF, MachineRegisterInfo *_MRI,
+ const HexagonSubtarget *_HST)
+ : MF(_MF), MRI(_MRI), HST(_HST) {
+ HII = HST->getInstrInfo();
+ }
+
+ void run();
+
+private:
+ MachineFunction &MF;
+ MachineRegisterInfo *MRI;
+ const HexagonSubtarget *HST;
+ const HexagonInstrInfo *HII;
+
+ enum Operation { Add16, Add32, Sub16, Sub32, Mul16, Mul32 };
+ // Helper functions
+ void handle_addsub_sf_sf(MachineInstr &, Register &, Register &, Register &,
+ bool);
+ void handle_addsub_qf_sf(MachineInstr &, Register &, Register &, Register &,
+ bool);
+ void handle_addsubmul_hf_hf(MachineInstr &, Register &, Register &,
+ Register &, Operation);
+ void handle_addsubmul_qf_hf(MachineInstr &, Register &, Register &,
+ Register &, Operation);
+ void handle_qf32_mul_sf_sf(MachineInstr &, Register &, Register &,
+ Register &);
+ void handle_qf16_mul_hf_hf(MachineInstr &, Register &, Register &,
+ Register &);
+ bool checkHVXUses32(MachineInstr *, MachineInstr *);
+ bool checkHVXUses16(MachineInstr *, MachineInstr *);
+ unsigned getOperation(Operation, bool, bool);
+
+ // List which holds conversion instructions
+ SmallPtrSet<MachineInstr *, 16> ConvInstrList;
+ // List which holds replaced sf/hf instructions to be erased
+ std::vector<MachineInstr *> SfHfInstrList;
+};
+
+// Returns the qf opcode for Op, given which of its operands are already qf.
+unsigned VectorConvertRemove::getOperation(Operation Op, bool firstOpQf,
+ bool secOpQf) {
+ if (firstOpQf && secOpQf) {
+ switch (Op) {
+ case Add16:
+ return Hexagon::V6_vadd_qf16;
+ case Add32:
+ return Hexagon::V6_vadd_qf32;
+ case Sub16:
+ return Hexagon::V6_vsub_qf16;
+ case Sub32:
+ return Hexagon::V6_vsub_qf32;
+ case Mul16:
+ return Hexagon::V6_vmpy_qf16;
+ case Mul32:
+ return Hexagon::V6_vmpy_qf32_qf16;
+ }
+ } else if (firstOpQf) {
+ switch (Op) {
+ case Add16:
+ return Hexagon::V6_vadd_qf16_mix;
+ case Add32:
+ return Hexagon::V6_vadd_qf32_mix;
+ case Sub16:
+ return Hexagon::V6_vsub_qf16_mix;
+ case Sub32:
+ return Hexagon::V6_vsub_qf32_mix;
+ case Mul16:
+ return Hexagon::V6_vmpy_qf16_mix_hf;
+ case Mul32:
+ return Hexagon::V6_vmpy_qf32_mix_hf;
+ }
+ } else if (secOpQf) {
+ switch (Op) {
+ case Sub16:
+ return Hexagon::V6_vsub_hf_mix;
+ case Sub32:
+ return Hexagon::V6_vsub_sf_mix;
+ default:
+ break;
+ }
+ }
+ llvm_unreachable("Unknown opcode and operand combination!");
+}
+
+// Returns true if UseMI is the only user of the value defined by the convert
+// MI. Otherwise, the convert must stay, so its source is marked not-killed.
+bool VectorConvertRemove::checkHVXUses32(MachineInstr *MI,
+ MachineInstr *UseMI) {
+ Register convReg = MI->getOperand(0).getReg();
+ // Iterate over all uses of the Def we are analyzing
+ for (auto &MO : make_range(MRI->use_begin(convReg), MRI->use_end())) {
+ MachineInstr *UMI = MO.getParent();
+ if (UMI == UseMI)
+ continue;
+ // Since the convert cannot be deleted, we set the operand as NOT kill
+ MI->getOperand(1).setIsKill(false);
+ return false;
+ }
+ return true;
+}
+
+// Returns true if UseMI is the only user of the value defined by the convert
+// MI. Otherwise, the convert must stay, so its source is marked not-killed.
+bool VectorConvertRemove::checkHVXUses16(MachineInstr *MI,
+ MachineInstr *UseMI) {
+ Register convReg = MI->getOperand(0).getReg();
+ // Iterate over all uses of the Def we are analyzing
+ for (auto &MO : make_range(MRI->use_begin(convReg), MRI->use_end())) {
+ MachineInstr *UMI = MO.getParent();
+ if (UMI == UseMI)
+ continue;
+ // Since the convert cannot be deleted, we set the operand as NOT kill
+ MI->getOperand(1).setIsKill(false);
+ return false;
+ }
+ return true;
+}
+
+// Removes converts feeding to op(sf,sf), and replaces its sf operands with qf
+void VectorConvertRemove::handle_addsub_sf_sf(MachineInstr &MI, Register &Reg1,
+ Register &Reg2, Register &Dest,
+ bool isAdd) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ bool firstConv = false, secConv = false;
+ bool DefOp1_del = false, DefOp2_del = false;
+ Register Src1, Src2;
+
+ MachineInstr *DefOp1 = MRI->getVRegDef(Reg1);
+ MachineInstr *DefOp2 = MRI->getVRegDef(Reg2);
+ // check if the first operand is from a convert operation
+ if (DefOp1->getOpcode() == Hexagon::V6_vconv_sf_qf32) {
+ if (checkHVXUses32(DefOp1, &MI))
+ DefOp1_del = true;
+ Src1 = DefOp1->getOperand(1).getReg();
+ firstConv = true;
+ }
+
+ // check if the second operand is from a convert operation
+ if (DefOp2->getOpcode() == Hexagon::V6_vconv_sf_qf32) {
+ if (checkHVXUses32(DefOp2, &MI))
+ DefOp2_del = true;
+ Src2 = DefOp2->getOperand(1).getReg();
+ secConv = true;
+ }
+
+ if (firstConv && secConv) {
+ BuildMI(MBB, MI, DL,
+ HII->get(getOperation(isAdd ? Operation::Add32 : Operation::Sub32,
+ true, true)),
+ Dest)
+ .addReg(Src1)
+ .addReg(Src2);
+ SfHfInstrList.push_back(&MI);
+ } else if (firstConv) {
+ BuildMI(MBB, MI, DL,
+ HII->get(getOperation(isAdd ? Operation::Add32 : Operation::Sub32,
+ true, false)),
+ Dest)
+ .addReg(Src1)
+ .addReg(Reg2);
+ SfHfInstrList.push_back(&MI);
+ } else if (secConv) {
+ // For v81, vadd swaps its operands (add is commutative) and vsub takes the
+ // qf value as its second operand.
+ if (HST->useHVXV81Ops()) {
+ if (isAdd)
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), Dest)
+ .addReg(Src2)
+ .addReg(Reg1);
+ else
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_sf_mix), Dest)
+ .addReg(Reg1)
+ .addReg(Src2);
+ SfHfInstrList.push_back(&MI);
+ // For v79, there is no provision for 2nd op being qf for add/sub. Since
+ // add is commutative, the ops can be rotated.
+ } else if (HST->useHVXV79Ops()) {
+ // for vadd we interchange the ops, for vsub we ignore
+ if (isAdd) {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), Dest)
+ .addReg(Src2)
+ .addReg(Reg1);
+ SfHfInstrList.push_back(&MI);
+ } else // don't delete the convert instruction for vsub
+ DefOp2_del = false;
+ }
+ } else { // none of the operands are from convert instructions
+ }
+
+ if (DefOp1_del)
+ ConvInstrList.insert(DefOp1);
+ if (DefOp2_del)
+ ConvInstrList.insert(DefOp2);
+}
+
+// Removes converts feeding to op(hf,hf), and replaces its hf operands with qf
+void VectorConvertRemove::handle_addsubmul_hf_hf(MachineInstr &MI,
+ Register &Reg1, Register &Reg2,
+ Register &Dest, Operation Op) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ bool firstConv = false, secConv = false;
+ bool DefOp1_del = false, DefOp2_del = false;
+ bool isSub = Op == Operation::Sub16;
+ Register Src1, Src2;
+
+ MachineInstr *DefOp1 = MRI->getVRegDef(Reg1);
+ MachineInstr *DefOp2 = MRI->getVRegDef(Reg2);
+ // check if the first operand is from a convert operation
+ if (DefOp1->getOpcode() == Hexagon::V6_vconv_hf_qf16) {
+ if (checkHVXUses16(DefOp1, &MI))
+ DefOp1_del = true;
+ Src1 = DefOp1->getOperand(1).getReg();
+ firstConv = true;
+ }
+
+ // check if the second operand is from a convert operation
+ if (DefOp2->getOpcode() == Hexagon::V6_vconv_hf_qf16) {
+ if (checkHVXUses16(DefOp2, &MI))
+ DefOp2_del = true;
+ Src2 = DefOp2->getOperand(1).getReg();
+ secConv = true;
+ }
+
+ if (firstConv && secConv) {
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, true, true)), Dest)
+ .addReg(Src1)
+ .addReg(Src2);
+ SfHfInstrList.push_back(&MI);
+ } else if (firstConv) {
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, true, false)), Dest)
+ .addReg(Src1)
+ .addReg(Reg2);
+ SfHfInstrList.push_back(&MI);
+ } else if (secConv) {
+ // For v81, we interchange the ops for vadd/vmul
+ // for vsub we use qf as second operand
+ if (HST->useHVXV81Ops()) {
+ if (!isSub)
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, true, false)), Dest)
+ .addReg(Src2)
+ .addReg(Reg1);
+ else
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, false, true)), Dest)
+ .addReg(Reg1)
+ .addReg(Src2);
+ SfHfInstrList.push_back(&MI);
+ } else if (HST->useHVXV79Ops()) {
+ // for vadd/vmul we interchange the ops, for vsub we ignore
+ if (!isSub) {
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, true, false)), Dest)
+ .addReg(Src2)
+ .addReg(Reg1);
+ SfHfInstrList.push_back(&MI);
+ } else // don't delete the convert instruction for vsub
+ DefOp2_del = false;
+ }
+ } else { // none of the operands are from convert instructions
+ }
+
+ if (DefOp1_del)
+ ConvInstrList.insert(DefOp1);
+ if (DefOp2_del)
+ ConvInstrList.insert(DefOp2);
+}
+
+// Removes converts feeding to op(qf,sf), and replaces its sf operands with qf
+void VectorConvertRemove::handle_addsub_qf_sf(MachineInstr &MI, Register &Reg1,
+ Register &Reg2, Register &Dest,
+ bool isAdd) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+ Register Src;
+ bool conv = false;
+
+ MachineInstr *DefOp = MRI->getVRegDef(Reg2);
+ // check if the second operand is from a convert operation
+ if (DefOp->getOpcode() == Hexagon::V6_vconv_sf_qf32) {
+ if (checkHVXUses32(DefOp, &MI))
+ ConvInstrList.insert(DefOp);
+ Src = DefOp->getOperand(1).getReg();
+ conv = true;
+ }
+
+ if (conv) {
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_qf32 : Hexagon::V6_vsub_qf32),
+ Dest)
+ .addReg(Reg1)
+ .addReg(Src);
+ SfHfInstrList.push_back(&MI);
+ }
+}
+
+// Removes converts feeding to op(qf,hf), and replaces its hf operands with qf
+void VectorConvertRemove::handle_addsubmul_qf_hf(MachineInstr &MI,
+ Register &Reg1, Register &Reg2,
+ Register &Dest, Operation Op) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+ Register Src;
+ bool conv = false;
+
+ MachineInstr *DefOp = MRI->getVRegDef(Reg2);
+ // check if the second operand is from a convert operation
+ if (DefOp->getOpcode() == Hexagon::V6_vconv_hf_qf16) {
+ if (checkHVXUses16(DefOp, &MI))
+ ConvInstrList.insert(DefOp);
+ Src = DefOp->getOperand(1).getReg();
+ conv = true;
+ }
+
+ if (conv) {
+ BuildMI(MBB, MI, DL, HII->get(getOperation(Op, true, true)), Dest)
+ .addReg(Reg1)
+ .addReg(Src);
+ SfHfInstrList.push_back(&MI);
+ }
+}
+
+// Removes converts feeding to op(sf,sf), and replaces its sf operands with qf
+void VectorConvertRemove::handle_qf32_mul_sf_sf(MachineInstr &MI,
+ Register &Reg1, Register &Reg2,
+ Register &Dest) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+ Register Src1, Src2;
+ bool firstConv = false, secConv = false;
+
+ MachineInstr *DefOp1 = MRI->getVRegDef(Reg1);
+ MachineInstr *DefOp2 = MRI->getVRegDef(Reg2);
+
+ if (DefOp1->getOpcode() == Hexagon::V6_vconv_sf_qf32 &&
+ DefOp2->getOpcode() == Hexagon::V6_vconv_sf_qf32) {
+ // If yes, we can remove the convert
+ if (checkHVXUses32(DefOp1, &MI) && checkHVXUses32(DefOp2, &MI)) {
+ ConvInstrList.insert(DefOp1);
+ ConvInstrList.insert(DefOp2);
+ }
+ Src1 = DefOp1->getOperand(1).getReg();
+ Src2 = DefOp2->getOperand(1).getReg();
+ firstConv = true;
+ secConv = true;
+ }
+
+ // If both are true, then only replace with qf32 = vmpy(qf32, qf32)
+ if (firstConv && secConv) {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(Src1)
+ .addReg(Src2);
+ SfHfInstrList.push_back(&MI);
+ }
+}
+
+void VectorConvertRemove::run() {
+ for (auto &MBB : MF) {
+ for (auto &MI : MBB) {
+ // Skip debug instructions and instructions that do not have exactly three
+ // operands (one def and two sources).
+ if (MI.getNumOperands() != 3 || MI.isDebugInstr())
+ continue;
+
+ auto Op1 = MI.getOperand(1);
+ if (!Op1.isReg())
+ continue;
+ auto Op2 = MI.getOperand(2);
+ if (!Op2.isReg())
+ continue;
+ auto Op0 = MI.getOperand(0);
+ if (!Op0.isReg())
+ continue;
+ Register Reg1 = Op1.getReg();
+ Register Reg2 = Op2.getReg();
+ Register Dest = Op0.getReg();
+
+ switch (MI.getOpcode()) {
+ // TODO Handle the new vsub instructions
+ // qf32 = vadd(sf, sf)
+ case Hexagon::V6_vadd_sf:
+ handle_addsub_sf_sf(MI, Reg1, Reg2, Dest, true);
+ break;
+ // qf32 = vsub(sf, sf)
+ case Hexagon::V6_vsub_sf:
+ handle_addsub_sf_sf(MI, Reg1, Reg2, Dest, false);
+ break;
+ // qf32 = vadd(qf32, sf)
+ case Hexagon::V6_vadd_qf32_mix:
+ handle_addsub_qf_sf(MI, Reg1, Reg2, Dest, true);
+ break;
+ // qf32 = vsub(qf32, sf)
+ case Hexagon::V6_vsub_qf32_mix:
+ handle_addsub_qf_sf(MI, Reg1, Reg2, Dest, false);
+ break;
+ // qf16 = vadd(hf, hf)
+ case Hexagon::V6_vadd_hf:
+ handle_addsubmul_hf_hf(MI, Reg1, Reg2, Dest, Operation::Add16);
+ break;
+ // qf16 = vsub(hf, hf)
+ case Hexagon::V6_vsub_hf:
+ handle_addsubmul_hf_hf(MI, Reg1, Reg2, Dest, Operation::Sub16);
+ break;
+ // qf16 = vadd(qf16, hf)
+ case Hexagon::V6_vadd_qf16_mix:
+ handle_addsubmul_qf_hf(MI, Reg1, Reg2, Dest, Operation::Add16);
+ break;
+ // qf16 = vsub(qf16, hf)
+ case Hexagon::V6_vsub_qf16_mix:
+ handle_addsubmul_qf_hf(MI, Reg1, Reg2, Dest, Operation::Sub16);
+ break;
+ // qf32 = vmpy(sf, sf)
+ case Hexagon::V6_vmpy_qf32_sf:
+ handle_qf32_mul_sf_sf(MI, Reg1, Reg2, Dest);
+ break;
+ // qf32 = vmpy(hf, hf)
+ case Hexagon::V6_vmpy_qf32_hf:
+ handle_addsubmul_hf_hf(MI, Reg1, Reg2, Dest, Operation::Mul32);
+ break;
+ // qf32 = vmpy(qf16, hf)
+ case Hexagon::V6_vmpy_qf32_mix_hf:
+ handle_addsubmul_qf_hf(MI, Reg1, Reg2, Dest, Operation::Mul32);
+ break;
+ // qf16 = vmpy(hf, hf)
+ case Hexagon::V6_vmpy_qf16_hf:
+ handle_addsubmul_hf_hf(MI, Reg1, Reg2, Dest, Operation::Mul16);
+ break;
+ // qf16 = vmpy(qf16, hf)
+ case Hexagon::V6_vmpy_qf16_mix_hf:
+ handle_addsubmul_qf_hf(MI, Reg1, Reg2, Dest, Operation::Mul16);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+
+ // Delete the vadd/vsub/vmpy instructions
+ for (MachineInstr *sfhfMI : SfHfInstrList) {
+ LLVM_DEBUG(dbgs() << "deleting sf/hf instruction ");
+ LLVM_DEBUG(sfhfMI->dump());
+ sfhfMI->eraseFromParent();
+ }
+ // Delete conversion instructions
+ for (MachineInstr *convMI : ConvInstrList) {
+ LLVM_DEBUG(dbgs() << "deleting conversion instruction");
+ LLVM_DEBUG(convMI->dump());
+ convMI->eraseFromParent();
+ }
+}
+
+char HexagonXQFloatGenerator::ID = 0;
+
+} // namespace
+
+INITIALIZE_PASS(HexagonXQFloatGenerator, "hexagon-xqfloat-generator",
+ HEXAGON_XQFLOAT_GENERATOR, false, false)
+
+FunctionPass *llvm::createHexagonXQFloatGenerator() {
+ return new HexagonXQFloatGenerator();
+}
+
+// Returns true if qf32 input is from an adder/subtract unit
+bool HexagonXQFloatGenerator::checkIfInputFromAdder32(Register Reg) {
+ MachineInstr *Def = MRI->getVRegDef(Reg);
+ if (!Def)
+ return false;
+
+ // If the definition is a copy, we need to analyze its def again
+ if (Def->getOpcode() == TargetOpcode::COPY) {
+ Register SrcReg = Def->getOperand(1).getReg();
+ if (SrcReg.isValid())
+ return checkIfInputFromAdder32(SrcReg);
+ return false;
+  } else if (Def->getOpcode() == TargetOpcode::REG_SEQUENCE) {
+    // REG_SEQUENCE operands are (def, reg, subidx, reg, subidx, ...), so the
+    // two source registers are operands 1 and 3.
+    Register SrcReg1 = Def->getOperand(1).getReg();
+    Register SrcReg2 = Def->getOperand(3).getReg();
+    bool isTrue = false;
+    if (SrcReg1.isValid())
+      isTrue |= checkIfInputFromAdder32(SrcReg1);
+    if (SrcReg2.isValid())
+      isTrue |= checkIfInputFromAdder32(SrcReg2);
+    return isTrue;
+ } else
+ return std::find(XQFPAdd32.begin(), XQFPAdd32.end(), Def->getOpcode()) !=
+ XQFPAdd32.end();
+}
+
+// Returns true if qf16 input is from an adder/subtract unit
+bool HexagonXQFloatGenerator::checkIfInputFromAdder16(Register Reg) {
+ MachineInstr *Def = MRI->getVRegDef(Reg);
+ if (!Def)
+ return false;
+
+ // if the definition is a copy, we need to analyze its def again
+ if (Def->getOpcode() == TargetOpcode::COPY) {
+ Register SrcReg = Def->getOperand(1).getReg();
+ if (SrcReg.isValid())
+ return checkIfInputFromAdder16(SrcReg);
+ return false;
+ } else
+ return std::find(XQFPAdd16.begin(), XQFPAdd16.end(), Def->getOpcode()) !=
+ XQFPAdd16.end();
+}
+
+// Returns true if qf32 input is from a multiplier unit
+bool HexagonXQFloatGenerator::checkIfInputFromMult32(Register Reg) {
+ MachineInstr *Def = MRI->getVRegDef(Reg);
+ if (!Def)
+ return false;
+
+ // if the definition is a copy, we need to analyze its def again
+ if (Def->getOpcode() == TargetOpcode::COPY) {
+ Register SrcReg = Def->getOperand(1).getReg();
+ if (SrcReg.isValid())
+ return checkIfInputFromMult32(SrcReg);
+ return false;
+  } else if (Def->getOpcode() == TargetOpcode::REG_SEQUENCE) {
+    // REG_SEQUENCE operands are (def, reg, subidx, reg, subidx, ...), so the
+    // two source registers are operands 1 and 3.
+    Register SrcReg1 = Def->getOperand(1).getReg();
+    Register SrcReg2 = Def->getOperand(3).getReg();
+    bool isTrue = false;
+    if (SrcReg1.isValid())
+      isTrue |= checkIfInputFromMult32(SrcReg1);
+    if (SrcReg2.isValid())
+      isTrue |= checkIfInputFromMult32(SrcReg2);
+    return isTrue;
+ } else
+ return std::find(XQFPMult32.begin(), XQFPMult32.end(), Def->getOpcode()) !=
+ XQFPMult32.end();
+}
+
+// Returns true if qf16 input is from a multiplier unit
+bool HexagonXQFloatGenerator::checkIfInputFromMult16(Register Reg) {
+ MachineInstr *Def = MRI->getVRegDef(Reg);
+ if (!Def)
+ return false;
+
+ // if the definition is a copy, we need to analyze its def again
+ if (Def->getOpcode() == TargetOpcode::COPY) {
+ Register SrcReg = Def->getOperand(1).getReg();
+ if (SrcReg.isValid())
+ return checkIfInputFromMult16(SrcReg);
+ return false;
+ } else
+ return std::find(XQFPMult16.begin(), XQFPMult16.end(), Def->getOpcode()) !=
+ XQFPMult16.end();
+}
+
+// Generates sf = qf32 instruction or hf = qf16 instruction
+void HexagonXQFloatGenerator::createConvertInstr(MachineInstr *UseMI,
+ Register &NewR, Register &OldR,
+ bool is32bit) {
+ const DebugLoc &DL = UseMI->getDebugLoc();
+ MachineBasicBlock *MBB = UseMI->getParent();
+ NewR = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ if (is32bit)
+ BuildMI(*MBB, *UseMI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), NewR)
+ .addReg(OldR);
+ else
+ BuildMI(*MBB, *UseMI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), NewR)
+ .addReg(OldR);
+}
+
+// Generate HVX to IEEE conversion instruction for all non-HVX uses
+bool HexagonXQFloatGenerator::convertIfInputToNonHVX(MachineInstr &MI,
+ bool is32bit) {
+  Register NewR;
+  bool Changed = false;
+  Register Dest = MI.getOperand(0).getReg();
+
+  // Snapshot the uses of the Def we are analyzing first: MO.setReg() and the
+  // inserted converts both modify Dest's use list while we process it.
+  SmallVector<MachineOperand *, 8> Uses;
+  for (MachineOperand &MO : MRI->use_operands(Dest))
+    Uses.push_back(&MO);
+
+  for (MachineOperand *MO : Uses) {
+    MachineInstr *UseMI = MO->getParent();
+    // Omit if the use is a REG_SEQUENCE instruction, since the only
+    // use of REG_SEQUENCE in qf context is transforming to IEEE.
+    // Omit for use in DBG instructions.
+    // Omit for use in PHI instructions since PHI result can be used as a qf
+    // operand.
+    if (UseMI->getOpcode() == TargetOpcode::REG_SEQUENCE ||
+        UseMI->getOpcode() == TargetOpcode::DBG_VALUE ||
+        UseMI->getOpcode() == TargetOpcode::DBG_LABEL ||
+        UseMI->getOpcode() == TargetOpcode::PHI)
+      continue;
+
+    // If it is a copy instruction, we need to analyze its uses. Keep going
+    // afterwards so the remaining uses of Dest are not skipped.
+    if (UseMI->getOpcode() == TargetOpcode::COPY) {
+      Changed |= convertIfInputToNonHVX(*UseMI, is32bit);
+      continue;
+    }
+    // is32bit selects sf = qf32 (true) or hf = qf16 (false).
+    if (!HII->usesQFOperand(UseMI)) {
+      createConvertInstr(UseMI, NewR, Dest, is32bit);
+      MO->setReg(NewR);
+      Changed = true;
+    }
+  }
+  return Changed;
+}
+
+// generate qf16 = qf32 via:
+// hf = qf32
+// V0 = #0
+// qf16 = vsub(hf,V0)
+void HexagonXQFloatGenerator::generateQF16FromQF32(MachineInstr &MI,
+ Register &Dest,
+ Register &SrcReg) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ Register convertReg = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf32), convertReg)
+ .addReg(SrcReg);
+ Register VR0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vd0), VR0);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf), Dest)
+ .addReg(convertReg)
+ .addReg(VR0);
+}
+
+// Widen qf16 = vmpy(hf, hf) result unconditionally
+void HexagonXQFloatGenerator::widenMultiplyInputHF(MachineInstr &MI,
+ Register &Reg1,
+ Register &Reg2,
+ Register &Dest) {
+ Register output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
+ .addReg(Reg1)
+ .addReg(Reg2);
+ generateQF16FromQF32(MI, Dest, output_mpy);
+}
+
+// Widen vmpy(qf16, qf16/hf) result conditionally
+bool HexagonXQFloatGenerator::widenMultiplicationInputF16(MachineInstr &MI,
+ Register &Reg1,
+ Register &Reg2,
+ Register &Dest,
+ bool twoOps) {
+ bool firstconvert = false, secondconvert = false;
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // We widen only that operand which comes from add/subtract unit.
+ if (checkIfInputFromAdder16(Reg1))
+ firstconvert = true;
+ // twoOps == true means the 2nd operand is qf16, else it is hf
+ if (twoOps && checkIfInputFromAdder16(Reg2))
+ secondconvert = true;
+
+ Register widenReg;
+ // If either operand comes from the add/subtract unit, we widen
+ if (twoOps) {
+ if (firstconvert || secondconvert) {
+ widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_qf16), widenReg)
+ .addReg(Reg1)
+ .addReg(Reg2);
+ } else {
+ return false;
+ }
+ } else {
+ if (firstconvert) {
+ widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), widenReg)
+ .addReg(Reg1)
+ .addReg(Reg2);
+ } else {
+ return false;
+ }
+ }
+
+ // generate qf16 = qf32
+ generateQF16FromQF32(MI, Dest, widenReg);
+
+ return true;
+}
+
+// Handle qf16 = vmpy(qf16, Rt)
+// For strict IEEE mode, convert the qf16 to IEEE before widening
+bool HexagonXQFloatGenerator::widenMultiplicationInputF16Rt(MachineInstr &MI,
+ Register &Reg1,
+ Register &Reg2,
+ Register &Dest) {
+ // Bail out unless the first input comes from an adder. In strict IEEE mode,
+ // an input that comes from a multiplier is also accepted.
+ if (!checkIfInputFromAdder16(Reg1)) {
+ if (QFloatModeValue == QFloatMode::StrictIEEE) {
+ if (!checkIfInputFromMult16(Reg1))
+ return false;
+ } else
+ return false;
+ }
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ Register VSplatReg = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_lvsplatw), VSplatReg).addReg(Reg2);
+
+ Register widenReg = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ if (QFloatModeValue == QFloatMode::StrictIEEE) {
+ Register VHf = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VHf).addReg(Reg1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), widenReg)
+ .addReg(VHf)
+ .addReg(VSplatReg);
+ } else {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), widenReg)
+ .addReg(Reg1)
+ .addReg(VSplatReg);
+ }
+
+ // generate qf16 = qf32
+ generateQF16FromQF32(MI, Dest, widenReg);
+ return true;
+}
+
+// Handle qf32 = vadd/vsub(qf32/sf, qf32/sf)
+// Handle vadd/vsub instructions with qf32 operands conditionally
+// isAdd: true if an add instruction is analyzed, false for subtract
+// isFirstOpQf: true if 1st operand is qf32 type, false if sf type
+// isSecOpQf: true if 2nd operand is qf32 type, false if sf type
+bool HexagonXQFloatGenerator::convertAddOpToIEEE32(
+ MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
+ bool isAdd, bool isFirstOpQf, bool isSecOpQf) {
+
+ Register VR1;
+ Register VR2;
+ bool firstconvert = false, secondconvert = false;
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // If the first operand is qf32 type
+ if (isFirstOpQf) {
+ // If the first operand is from add/sub/mul unit,
+ // generate IEEE conversion instruction sf = qf32
+ if (checkIfInputFromAdder32(Reg1) || checkIfInputFromMult32(Reg1)) {
+ VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR1)
+ .addReg(Reg1);
+ firstconvert = true;
+ }
+ }
+
+ // If 2nd operand is of qf32 type
+ if (isSecOpQf) {
+ // If the second operand is from add/sub/mul unit,
+ // generate IEEE conversion instruction
+ if (checkIfInputFromAdder32(Reg2) || checkIfInputFromMult32(Reg2)) {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2)
+ .addReg(Reg2);
+ secondconvert = true;
+ }
+ }
+
+ // If both operands are IEEE (sf) after conversion, use V6_v[add/sub]_sf.
+ // If one operand is still qf32, use a mix instruction.
+ // Output is qf32
+ if (isFirstOpQf && isSecOpQf) {
+ if (firstconvert && secondconvert) {
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
+ .addReg(VR1)
+ .addReg(VR2);
+ } else if (firstconvert) {
+ if (isAdd)
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), Dest)
+ .addReg(Reg2)
+ .addReg(VR1);
+ // For vsub type, for v81 we use a different opcode,
+ // for v79, we convert the 2nd op to IEEE too.
+ else {
+ if (HST->useHVXV81Ops())
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_sf_mix), Dest)
+ .addReg(VR1)
+ .addReg(Reg2);
+ else {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2)
+ .addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_sf), Dest)
+ .addReg(VR1)
+ .addReg(VR2);
+ }
+ }
+ } else if (secondconvert) {
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_qf32_mix
+ : Hexagon::V6_vsub_qf32_mix),
+ Dest)
+ .addReg(Reg1)
+ .addReg(VR2);
+ } else { // none of the inputs is from an add/sub/mul unit
+ return false;
+ }
+ // handle vadd/vsub when the 1st op of original instruction is qf type
+ } else if (isFirstOpQf) {
+ if (firstconvert)
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
+ .addReg(VR1)
+ .addReg(Reg2);
+ else
+ return false;
+ // handle vadd/vsub when the 2nd op of original instruction is qf type
+ } else if (isSecOpQf) {
+ if (secondconvert)
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_sf : Hexagon::V6_vsub_sf), Dest)
+ .addReg(Reg1)
+ .addReg(VR2);
+ else
+ return false;
+ } else
+ return false;
+ return true;
+}
+
+// Handle qf16 = vadd/vsub(qf16, qf16/hf)
+// Handle vadd/vsub instructions with qf16 operands conditionally
+// isAdd: true if an add instruction is analyzed, false for subtract
+// isFirstOpQf: true if 1st operand is qf16 type, false if hf type
+// isSecOpQf: true if 2nd operand is qf16 type, false if hf type
+bool HexagonXQFloatGenerator::convertAddOpToIEEE16(
+ MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
+ bool isAdd, bool isFirstOpQf, bool isSecOpQf) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+ Register VR1;
+ Register VR2;
+ bool firstconvert = false, secondconvert = false;
+
+ // If the first qf16 operand is from add/sub/mul unit,
+ // generate IEEE conversion instruction
+ if (isFirstOpQf) {
+ if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
+ VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1)
+ .addReg(Reg1);
+ firstconvert = true;
+ }
+ }
+ if (isSecOpQf) {
+ // If the second operand is from add/sub/mul unit,
+ // generate IEEE conversion instruction
+ if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
+ .addReg(Reg2);
+ secondconvert = true;
+ }
+ }
+
+ // If both operands are IEEE (hf) after conversion, use V6_v[add/sub]_hf.
+ // If one operand is still qf16, use a mix instruction.
+ // Output is qf16
+ if (isFirstOpQf && isSecOpQf) {
+ if (firstconvert && secondconvert) {
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
+ .addReg(VR1)
+ .addReg(VR2);
+ } else if (firstconvert) {
+ if (isAdd)
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf16_mix), Dest)
+ .addReg(Reg2)
+ .addReg(VR1);
+ // For vsub type, for v81 we use a different opcode,
+ // for v79, we convert the 2nd op to IEEE too.
+ else {
+ if (HST->useHVXV81Ops())
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf_mix), Dest)
+ .addReg(VR1)
+ .addReg(Reg2);
+ else {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
+ .addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vsub_hf), Dest)
+ .addReg(VR1)
+ .addReg(VR2);
+ }
+ }
+ } else if (secondconvert) {
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_qf16_mix
+ : Hexagon::V6_vsub_qf16_mix),
+ Dest)
+ .addReg(Reg1)
+ .addReg(VR2);
+ } else { // none of the inputs is from an add/sub/mul unit
+ return false;
+ }
+ // handle vadd/vsub when the 1st op of original instruction is qf type
+ } else if (isFirstOpQf) {
+ if (firstconvert)
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
+ .addReg(VR1)
+ .addReg(Reg2);
+ else
+ return false;
+ // handle vadd/vsub when the 2nd op of original instruction is qf type
+ } else if (isSecOpQf) {
+ if (secondconvert)
+ BuildMI(MBB, MI, DL,
+ HII->get(isAdd ? Hexagon::V6_vadd_hf : Hexagon::V6_vsub_hf), Dest)
+ .addReg(Reg1)
+ .addReg(VR2);
+ else
+ return false;
+ } else
+ return false;
+ return true;
+}
+
+// Create the prolog
+// v0 = #0
+// R1 = #0x80000000
+// v1.sf = vsplat(R1)
+// v2.qf32 = vmpy(v0.sf, v1.sf)
+void HexagonXQFloatGenerator::createPrologInstructions(MachineInstr &MI,
+ Register &R_mpy) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ Register VR0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vd0), VR0);
+
+ Register R_0 = MRI->createVirtualRegister(&Hexagon::IntRegsRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::A2_tfrsi), R_0).addImm(0x80000000);
+
+ Register VR_0 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_lvsplatw), VR_0).addReg(R_0);
+
+ R_mpy = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_sf), R_mpy)
+ .addReg(VR0)
+ .addReg(VR_0);
+}
+
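+// Normalize the flagged operands of qf32 = vmpy(qf32, qf32) for v81
+// strictieee: true if the flagged operands are sf type, false if qf32 type
+bool HexagonXQFloatGenerator::V81normalizeMultF32(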
+ MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
+ bool firstconvert, bool secondconvert, bool strictieee) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+ Register input_mpy1, input_mpy2;
+
+ auto Op =
+ strictieee ? Hexagon::V6_vconv_qf32_sf : Hexagon::V6_vconv_qf32_qf32;
+
+ // Normalize both input operands
+ if (firstconvert && secondconvert) {
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Op), input_mpy1).addReg(Reg1);
+ BuildMI(MBB, MI, DL, HII->get(Op), input_mpy2).addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(input_mpy2);
+ }
+ // Normalize only first operand
+ else if (firstconvert) {
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Op), input_mpy1).addReg(Reg1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(Reg2);
+ }
+ // Normalize only second operand
+ else if (secondconvert) {
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Op), input_mpy2).addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(Reg1)
+ .addReg(input_mpy2);
+ } else
+ // we do nothing if the inputs are not from adder/sub/mult unit
+ return false;
+
+ return true;
+}
+
+// Normalize qf32 = vmpy(sf, sf) instruction unconditionally
+void HexagonXQFloatGenerator::normalizeMultiplicationInputSF(
+ MachineInstr &MI, Register &Src1, Register &Src2, Register &Dest,
+ Register &R_mpy, bool &PrologCreated) {
+
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ if (HST->useHVXV81Ops()) {
+ Register input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ Register input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ // Normalize both inputs
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_qf32_sf), input_mpy1)
+ .addReg(Src1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_qf32_sf), input_mpy2)
+ .addReg(Src2);
+ // Add the new vmpy
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(input_mpy2);
+ return;
+ }
+
+ if (!PrologCreated) {
+ createPrologInstructions(MI, R_mpy);
+ PrologCreated = true;
+ }
+
+ Register input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ Register input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ // Normalize both inputs
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
+ .addReg(R_mpy)
+ .addReg(Src1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
+ .addReg(R_mpy)
+ .addReg(Src2);
+ // Add the new vmpy
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(input_mpy2);
+}
+
+// Convert and normalize qf32 = vmpy(qf32, qf32) instructions conditionally
+bool HexagonXQFloatGenerator::convertNormalizeMultOp32(
+ MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
+ Register &R_mpy, bool &PrologCreated) {
+
+ Register VR1, VR2;
+ bool firstconvert = false, secondconvert = false;
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // If the first operand is from add/subtract/multiply unit, generate IEEE
+ // conversion instruction
+ if (checkIfInputFromAdder32(Reg1) || checkIfInputFromMult32(Reg1)) {
+ VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR1).addReg(Reg1);
+ firstconvert = true;
+ }
+
+ if (checkIfInputFromAdder32(Reg2) || checkIfInputFromMult32(Reg2)) {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_sf_qf32), VR2).addReg(Reg2);
+ secondconvert = true;
+ }
+
+ if (HST->useHVXV81Ops()) {
+ if (firstconvert && secondconvert)
+ return V81normalizeMultF32(MI, VR1, VR2, Dest, true, true, true);
+ else if (firstconvert)
+ return V81normalizeMultF32(MI, VR1, Reg2, Dest, true, false, true);
+ else if (secondconvert)
+ return V81normalizeMultF32(MI, Reg1, VR2, Dest, false, true, true);
+ else
+ return false;
+ }
+
+ // create prolog if not already created
+ if (!PrologCreated && (firstconvert || secondconvert)) {
+ createPrologInstructions(MI, R_mpy);
+ PrologCreated = true;
+ }
+
+ Register input_mpy1, input_mpy2;
+
+ // Normalize both IEEE converts
+ if (firstconvert && secondconvert) {
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
+ .addReg(R_mpy)
+ .addReg(VR1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
+ .addReg(R_mpy)
+ .addReg(VR2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(input_mpy2);
+ // Normalize only first operand
+ } else if (firstconvert) {
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy1)
+ .addReg(R_mpy)
+ .addReg(VR1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(Reg2);
+ // Normalize only second operand
+ } else if (secondconvert) {
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32_mix), input_mpy2)
+ .addReg(R_mpy)
+ .addReg(VR2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(Reg1)
+ .addReg(input_mpy2);
+ } else {
+ // we do nothing if the inputs are not from adder/subtracter/multiplier unit
+ return false;
+ }
+ return true;
+}
+
+// Convert to IEEE and widen qf16 = vmpy(qf16, qf16/hf) conditionally
+// Then convert qf32 to qf16
+// twoOps: true if the second operand is qf16 type, false if hf type
+bool HexagonXQFloatGenerator::convertWidenMultOp16(MachineInstr &MI,
+ Register &Reg1,
+ Register &Reg2,
+ Register &Dest,
+ bool twoOps) {
+
+ Register VR1, VR2, output_mpy;
+ bool firstconvert = false,
+ secondconvert = false; // normalize with hf or qf16 operands
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // If the first operand is from add/sub/mul unit,
+ // generate IEEE conversion instruction
+ if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
+ VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1).addReg(Reg1);
+ firstconvert = true;
+ }
+
+ if (twoOps) {
+ if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
+ .addReg(Reg2);
+ secondconvert = true;
+ }
+ }
+
+ if (twoOps) {
+ // Both operands have been converted to IEEE
+ if (firstconvert && secondconvert) {
+ output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
+ .addReg(VR1)
+ .addReg(VR2);
+ // Only one operand has been converted to IEEE
+ } else if (firstconvert) {
+ output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), output_mpy)
+ .addReg(Reg2)
+ .addReg(VR1);
+ } else if (secondconvert) {
+ output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), output_mpy)
+ .addReg(Reg1)
+ .addReg(VR2);
+ } else {
+ // Neither operand needs to be transformed
+ return false;
+ }
+ } else {
+ if (firstconvert) {
+ output_mpy = MRI->createVirtualRegister(&Hexagon::HvxWRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), output_mpy)
+ .addReg(VR1)
+ .addReg(Reg2);
+ } else
+ return false;
+ }
+
+ // convert qf32 to qf16
+ generateQF16FromQF32(MI, Dest, output_mpy);
+
+ return true;
+}
+
+// Convert to IEEE and perform qf32 = vmpy(qf16, qf16/hf) conditionally
+// Final output is qf32 type
+bool HexagonXQFloatGenerator::convertWidenMultOp32(MachineInstr &MI,
+ Register &Reg1,
+ Register &Reg2,
+ Register &Dest,
+ bool twoOps) {
+ Register VR1, VR2;
+ bool firstconvert = false,
+ secondconvert = false; // normalize with hf or qf16 operands
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // If the first operand is from add/subtract/multiply unit, generate IEEE
+ // conversion instruction
+ if (checkIfInputFromAdder16(Reg1) || checkIfInputFromMult16(Reg1)) {
+ VR1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR1).addReg(Reg1);
+ firstconvert = true;
+ }
+
+ if (twoOps) {
+ if (checkIfInputFromAdder16(Reg2) || checkIfInputFromMult16(Reg2)) {
+ VR2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vconv_hf_qf16), VR2)
+ .addReg(Reg2);
+ secondconvert = true;
+ }
+ }
+
+ if (twoOps) {
+ // Both operands have been converted to IEEE
+ if (firstconvert && secondconvert) {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), Dest)
+ .addReg(VR1)
+ .addReg(VR2);
+ // Only one operand has been converted to IEEE
+ } else if (firstconvert) {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), Dest)
+ .addReg(Reg2)
+ .addReg(VR1);
+ } else if (secondconvert) {
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_mix_hf), Dest)
+ .addReg(Reg1)
+ .addReg(VR2);
+ } else
+ // Neither operand needs to be transformed
+ return false;
+ } else {
+ if (firstconvert)
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32_hf), Dest)
+ .addReg(VR1)
+ .addReg(Reg2);
+ else
+ return false;
+ }
+
+ return true;
+}
+
+// Normalize instructions of type qf32 = vmpy(qf32, qf32)
+bool HexagonXQFloatGenerator::normalizeMultiplicationInputF32(
+ MachineInstr &MI, Register &Reg1, Register &Reg2, Register &Dest,
+ Register &R_mpy, bool &PrologCreated) {
+ bool firstconvert = false, secondconvert = false;
+ MachineBasicBlock &MBB = *MI.getParent();
+ const DebugLoc &DL = MI.getDebugLoc();
+
+ // We normalize only that operand which comes from add/subtract unit.
+ if (checkIfInputFromAdder32(Reg1))
+ firstconvert = true;
+ if (checkIfInputFromAdder32(Reg2))
+ secondconvert = true;
+
+ // v81 normalization
+ if (HST->useHVXV81Ops())
+ return V81normalizeMultF32(MI, Reg1, Reg2, Dest, firstconvert,
+ secondconvert, false);
+
+ // create normalization operand conditionally for v79
+ if ((!PrologCreated && (firstconvert || secondconvert))) {
+ createPrologInstructions(MI, R_mpy);
+ PrologCreated = true;
+ }
+
+ Register input_mpy1, input_mpy2;
+
+ // Normalize both input operands
+ if (firstconvert && secondconvert) {
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy1)
+ .addReg(R_mpy)
+ .addReg(Reg1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy2)
+ .addReg(R_mpy)
+ .addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(input_mpy2);
+ // Normalize only first operand
+ } else if (firstconvert) {
+ input_mpy1 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy1)
+ .addReg(R_mpy)
+ .addReg(Reg1);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy1)
+ .addReg(Reg2);
+ // Normalize only second operand
+ } else if (secondconvert) {
+ input_mpy2 = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vadd_qf32), input_mpy2)
+ .addReg(R_mpy)
+ .addReg(Reg2);
+ BuildMI(MBB, MI, DL, HII->get(Hexagon::V6_vmpy_qf32), Dest)
+ .addReg(input_mpy2)
+ .addReg(Reg1);
+ } else {
+ // Do nothing if neither input comes from the add/subtract unit.
+ return false;
+ }
+
+ return true;
+}
+
+bool HexagonXQFloatGenerator::deleteList() {
+ if (OriginalMI.empty())
+ return false;
+ bool Changed = false;
+ for (MachineInstr *origMI : OriginalMI) {
+ LLVM_DEBUG(dbgs() << "deleting redundant instruction: ");
+ LLVM_DEBUG(origMI->dump());
+ origMI->eraseFromParent();
+ Changed = true;
+ }
+ OriginalMI.clear();
+ return Changed;
+}
+
+// Parent function to handle lossy subnormal transformations
+bool HexagonXQFloatGenerator::HandleLossySubnormals(MachineFunction &MF) {
+ bool Changed = false;
+ Register R_mpy;
+ for (auto &MBB : MF) {
+ bool PrologCreated = false;
+ for (auto &MI : MBB) {
+ Changed |= deleteList();
+ // Skip debug instructions and any instruction that does not have
+ // exactly three operands (one destination and two sources).
+ if (MI.getNumOperands() != 3 || MI.isDebugInstr())
+ continue;
+ auto Op1 = MI.getOperand(1);
+ if (!Op1.isReg())
+ continue;
+ auto Op2 = MI.getOperand(2);
+ if (!Op2.isReg())
+ continue;
+ auto Op0 = MI.getOperand(0);
+ if (!Op0.isReg())
+ continue;
+ Register Reg1 = Op1.getReg();
+ Register Reg2 = Op2.getReg();
+ Register Dest = Op0.getReg();
+
+ // FIXME Do not process physical registers as operands
+ if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
+ continue;
+
+ switch (MI.getOpcode()) {
+ // qf32 = vmpy(qf32, qf32)
+ // Normalize one or both input operands
+ // if from add/sub unit
+ case Hexagon::V6_vmpy_qf32:
+ if (normalizeMultiplicationInputF32(MI, Reg1, Reg2, Dest, R_mpy,
+ PrologCreated))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // qf16 = vmpy(qf16, qf16)
+ // Widening multiply to qf32 and convert back to qf16
+ // if any of the operands are from add/sub unit
+ case Hexagon::V6_vmpy_qf16:
+ if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, Rt.hf)
+ // Splat Rt to vector and then widening multiply
+ // and then convert back to qf16
+ // if first operand is from add/sub unit
+ case Hexagon::V6_vmpy_rt_qf16:
+ if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, hf)
+ // Widening multiply to qf32 and convert back to qf16
+ // if first operand is from add/sub unit
+ case Hexagon::V6_vmpy_qf16_mix_hf:
+ if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ // Check whether the result of a qf32-generating add/sub/mul
+ // instruction is used by a non-HVX instruction.
+ // If so, convert that use to IEEE.
+ case Hexagon::V6_vadd_sf:
+ case Hexagon::V6_vadd_qf32:
+ case Hexagon::V6_vadd_qf32_mix:
+ case Hexagon::V6_vsub_sf:
+ case Hexagon::V6_vsub_qf32:
+ case Hexagon::V6_vsub_qf32_mix:
+ case Hexagon::V6_vsub_sf_mix:
+ case Hexagon::V6_vmpy_qf32_qf16:
+ case Hexagon::V6_vmpy_qf32_hf:
+ case Hexagon::V6_vmpy_qf32_mix_hf:
+ case Hexagon::V6_vmpy_rt_sf:
+ case Hexagon::V6_vmpy_qf32_sf:
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // Check whether the result of a qf16-generating add/sub/mul
+ // instruction is used by a non-HVX instruction.
+ // If so, convert that use to IEEE.
+ case Hexagon::V6_vadd_hf:
+ case Hexagon::V6_vsub_hf:
+ case Hexagon::V6_vadd_qf16:
+ case Hexagon::V6_vsub_qf16:
+ case Hexagon::V6_vadd_qf16_mix:
+ case Hexagon::V6_vsub_qf16_mix:
+ case Hexagon::V6_vsub_hf_mix:
+ case Hexagon::V6_vmpy_qf16_hf:
+ case Hexagon::V6_vmpy_rt_hf:
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+ // Erase any instruction still queued for deletion, then report changes.
+ Changed |= deleteList();
+ return Changed;
+}
+
+// Parent function to handle all IEEE-754 compliant transformations
+bool HexagonXQFloatGenerator::HandleCompliantIEEE(MachineFunction &MF) {
+ bool Changed = false;
+ Register R_mpy;
+ for (auto &MBB : MF) {
+ bool PrologCreated = false;
+ for (auto &MI : MBB) {
+ Changed |= deleteList();
+ // Skip debug instructions and any instruction that does not have
+ // exactly three operands (one destination and two sources).
+ if (MI.getNumOperands() != 3 || MI.isDebugInstr())
+ continue;
+
+ auto Op1 = MI.getOperand(1);
+ if (!Op1.isReg())
+ continue;
+ auto Op2 = MI.getOperand(2);
+ if (!Op2.isReg())
+ continue;
+ auto Op0 = MI.getOperand(0);
+ if (!Op0.isReg())
+ continue;
+ Register Reg1 = Op1.getReg();
+ Register Reg2 = Op2.getReg();
+ Register Dest = Op0.getReg();
+ Register VRtSplat;
+
+ // FIXME Do not process physical registers as operands
+ if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
+ continue;
+
+ switch (MI.getOpcode()) {
+
+ // ==== Handle multiplication instructions ====
+
+ // qf32 = vmpy(sf, Rt.sf)
+ // Splat Rt to a vector
+ // Normalize both input operands unconditionally
+ case Hexagon::V6_vmpy_rt_sf:
+ VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
+ VRtSplat)
+ .addReg(Reg2);
+ normalizeMultiplicationInputSF(MI, Reg1, VRtSplat, Dest, R_mpy,
+ PrologCreated);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // qf32 = vmpy(sf, sf)
+ // Normalize both operands unconditionally
+ case Hexagon::V6_vmpy_qf32_sf:
+ normalizeMultiplicationInputSF(MI, Reg1, Reg2, Dest, R_mpy,
+ PrologCreated);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // qf32 = vmpy(qf32, qf32)
+ // Normalize one or both input operands
+ // if from add/sub unit
+ case Hexagon::V6_vmpy_qf32:
+ if (normalizeMultiplicationInputF32(MI, Reg1, Reg2, Dest, R_mpy,
+ PrologCreated))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // qf16 = vmpy(hf, rt)
+ // Splat Rt to vector and then widening multiply
+ case Hexagon::V6_vmpy_rt_hf:
+ VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
+ VRtSplat)
+ .addReg(Reg2);
+ widenMultiplyInputHF(MI, Reg1, VRtSplat, Dest);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // Widening multiply
+ // qf16 = vmpy(hf, hf)
+ case Hexagon::V6_vmpy_qf16_hf:
+ widenMultiplyInputHF(MI, Reg1, Reg2, Dest);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, qf16)
+ // Widening multiply to qf32 and convert back to qf16
+ // if any of the operands are from add/sub unit
+ case Hexagon::V6_vmpy_qf16:
+ if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, Rt.hf)
+ // Splat Rt to vector and then widening multiply
+ // and then convert back to qf16
+ // if first operand is from add/sub unit
+ case Hexagon::V6_vmpy_rt_qf16:
+ if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, hf)
+ // Widening multiply to qf32 and convert back to qf16
+ // if first operand is from add/sub unit
+ case Hexagon::V6_vmpy_qf16_mix_hf:
+ if (widenMultiplicationInputF16(MI, Reg1, Reg2, Dest, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // Check whether the result of a qf32/qf16-generating add/sub/mul
+ // instruction is used by a non-HVX instruction.
+ // If so, convert that use to IEEE.
+ case Hexagon::V6_vadd_sf:
+ case Hexagon::V6_vadd_qf32:
+ case Hexagon::V6_vadd_qf32_mix:
+ case Hexagon::V6_vsub_sf:
+ case Hexagon::V6_vsub_qf32:
+ case Hexagon::V6_vsub_qf32_mix:
+ case Hexagon::V6_vsub_sf_mix:
+ case Hexagon::V6_vmpy_qf32_qf16:
+ case Hexagon::V6_vmpy_qf32_hf:
+ case Hexagon::V6_vmpy_qf32_mix_hf:
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ case Hexagon::V6_vadd_hf:
+ case Hexagon::V6_vsub_hf:
+ case Hexagon::V6_vadd_qf16:
+ case Hexagon::V6_vsub_qf16:
+ case Hexagon::V6_vadd_qf16_mix:
+ case Hexagon::V6_vsub_qf16_mix:
+ case Hexagon::V6_vsub_hf_mix:
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+ // Erase any instruction still queued for deletion, then report changes.
+ Changed |= deleteList();
+ return Changed;
+}
+
+// Parent function to do strict IEEE transformations
+bool HexagonXQFloatGenerator::HandleStrictIEEE(MachineFunction &MF) {
+
+ bool Changed = false;
+ Register R_mpy;
+ for (auto &MBB : MF) {
+ bool PrologCreated = false;
+ for (auto &MI : MBB) {
+ Changed |= deleteList();
+ // Skip debug instructions and any instruction that does not have
+ // exactly three operands (one destination and two sources).
+ if (MI.getNumOperands() != 3 || MI.isDebugInstr())
+ continue;
+
+ auto Op1 = MI.getOperand(1);
+ if (!Op1.isReg())
+ continue;
+ auto Op2 = MI.getOperand(2);
+ if (!Op2.isReg())
+ continue;
+ auto Op0 = MI.getOperand(0);
+ if (!Op0.isReg())
+ continue;
+ Register Reg1 = Op1.getReg();
+ Register Reg2 = Op2.getReg();
+ Register Dest = Op0.getReg();
+ Register VRtSplat;
+
+ // FIXME Do not process physical registers as operands
+ if (!Reg1.isVirtual() || !Reg2.isVirtual() || !Dest.isVirtual())
+ continue;
+
+ switch (MI.getOpcode()) {
+ // ==== Handle add/subtract instructions ====
+ // Convert one or both input operands to IEEE 32-bit
+ // if from add/sub/mult unit(s)
+ // qf32 = vadd(qf32, qf32)
+ case Hexagon::V6_vadd_qf32:
+ if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, true, true, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // qf32 = vsub(qf32, qf32)
+ case Hexagon::V6_vsub_qf32:
+ if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, true, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // Convert only the first input operand to IEEE 32-bit
+ // if it is from add/sub/mult unit
+ // qf32 = vadd(qf32, sf)
+ case Hexagon::V6_vadd_qf32_mix:
+ if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, true, true, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // qf32 = vsub(qf32, sf)
+ case Hexagon::V6_vsub_qf32_mix:
+ if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, true, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // qf32 = vsub(sf, qf32)
+ case Hexagon::V6_vsub_sf_mix:
+ if (convertAddOpToIEEE32(MI, Reg1, Reg2, Dest, false, false, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // Convert one or both input operands to IEEE 16-bit
+ // if from add/sub/mult unit(s)
+ // qf16 = vadd(qf16, qf16)
+ case Hexagon::V6_vadd_qf16:
+ if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, true, true, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ // qf16 = vsub(qf16, qf16)
+ case Hexagon::V6_vsub_qf16:
+ if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, true, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ // Convert only the first input operand to IEEE 16-bit
+ // if it is from add/sub/mul unit
+ // qf16 = vadd(qf16, hf)
+ case Hexagon::V6_vadd_qf16_mix:
+ if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, true, true, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ // qf16 = vsub(qf16, hf)
+ case Hexagon::V6_vsub_qf16_mix:
+ if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, true, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ // qf16 = vsub(hf, qf16)
+ case Hexagon::V6_vsub_hf_mix:
+ if (convertAddOpToIEEE16(MI, Reg1, Reg2, Dest, false, false, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // ==== Handle multiplication instructions ====
+
+ // qf32 = vmpy(sf, Rt.sf)
+ // Splat Rt to a vector
+ // Normalize both input operands unconditionally
+ case Hexagon::V6_vmpy_rt_sf:
+ VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
+ VRtSplat)
+ .addReg(Reg2);
+ normalizeMultiplicationInputSF(MI, Reg1, VRtSplat, Dest, R_mpy,
+ PrologCreated);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // Normalize both operands unconditionally
+ // qf32 = vmpy(sf, sf)
+ case Hexagon::V6_vmpy_qf32_sf:
+ normalizeMultiplicationInputSF(MI, Reg1, Reg2, Dest, R_mpy,
+ PrologCreated);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ OriginalMI.push_back(&MI);
+ break;
+
+ // Convert one or both input operands to IEEE 32-bit
+ // if from add/sub/mult unit and normalize
+ // qf32 = vmpy(qf32, qf32)
+ case Hexagon::V6_vmpy_qf32:
+ if (convertNormalizeMultOp32(MI, Reg1, Reg2, Dest, R_mpy,
+ PrologCreated))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // Convert one or both input operands to IEEE 16-bit
+ // if from mul/add/sub unit;
+ // then widening multiply to generate qf32
+ // then convert to qf16
+ // qf16 = vmpy(qf16, qf16)
+ case Hexagon::V6_vmpy_qf16:
+ if (convertWidenMultOp16(MI, Reg1, Reg2, Dest, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // Convert one or both input operands to IEEE 16-bit
+ // if from mul/add/sub unit;
+ // then widening multiply to generate qf32
+ // qf32 = vmpy(qf16, qf16)
+ case Hexagon::V6_vmpy_qf32_qf16:
+ if (convertWidenMultOp32(MI, Reg1, Reg2, Dest, true))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+
+ // qf16 = vmpy(hf, rt)
+ // Splat Rt to vector and then widening multiply
+ case Hexagon::V6_vmpy_rt_hf:
+ VRtSplat = MRI->createVirtualRegister(&Hexagon::HvxVRRegClass);
+ BuildMI(MBB, MI, MI.getDebugLoc(), HII->get(Hexagon::V6_lvsplatw),
+ VRtSplat)
+ .addReg(Reg2);
+ widenMultiplyInputHF(MI, Reg1, VRtSplat, Dest);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // Widening multiply, then convert to IEEE
+ // qf16 = vmpy(hf, hf)
+ case Hexagon::V6_vmpy_qf16_hf:
+ widenMultiplyInputHF(MI, Reg1, Reg2, Dest);
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, Rt.hf)
+ // Splat Rt to vector and then widening multiply
+ // and then convert back to qf16
+ // if first operand is from add/sub unit
+ case Hexagon::V6_vmpy_rt_qf16:
+ if (widenMultiplicationInputF16Rt(MI, Reg1, Reg2, Dest))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf16 = vmpy(qf16, hf)
+ // Convert only the first input operand to IEEE 16-bit
+ // if from mul/add/sub unit;
+ // then widening multiply to generate qf32
+ // then convert back to qf16
+ case Hexagon::V6_vmpy_qf16_mix_hf:
+ if (convertWidenMultOp16(MI, Reg1, Reg2, Dest, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+
+ // qf32 = vmpy(qf16, hf)
+ // Convert only the first input operand to IEEE 16-bit
+ // if from mul/add/sub unit;
+ // then widening multiply to generate qf32
+ case Hexagon::V6_vmpy_qf32_mix_hf:
+ if (convertWidenMultOp32(MI, Reg1, Reg2, Dest, false))
+ OriginalMI.push_back(&MI);
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ // Check whether the results of qf32/qf16-generating add/sub/mul
+ // instructions are used as non-HVX operands.
+ // If so, convert the use to IEEE
+ case Hexagon::V6_vadd_sf:
+ case Hexagon::V6_vsub_sf:
+ Changed |= convertIfInputToNonHVX(MI, true);
+ break;
+ case Hexagon::V6_vadd_hf:
+ case Hexagon::V6_vsub_hf:
+ Changed |= convertIfInputToNonHVX(MI, false);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+ if (OriginalMI.empty() || !Changed)
+ return false;
+ return true;
+}
+
+// There are no conversions in lossy legacy mode
+bool HexagonXQFloatGenerator::HandleLossyLegacy(MachineFunction &MF) {
+ return false;
+}
+
+bool HexagonXQFloatGenerator::runOnMachineFunction(MachineFunction &MF) {
+ if (!EnableHVXXQFloat || (QFloatModeValue == QFloatMode::Legacy))
+ return false;
+
+ bool Changed = false;
+ HST = &MF.getSubtarget<HexagonSubtarget>();
+ HII = HST->getInstrInfo();
+ MRI = &MF.getRegInfo();
+
+ if (EnableConversionsRemoval &&
+ !(QFloatModeValue == QFloatMode::StrictIEEE)) {
+ VectorConvertRemove VCR(MF, MRI, HST);
+ VCR.run();
+ LLVM_DEBUG(dbgs() << "\nExtraneous conversion instructions removed for "
+ << MF.getName());
+ if (PrintDebug)
+ debug_print(MF);
+ }
+
+ switch (QFloatModeValue) {
+ case QFloatMode::StrictIEEE:
+ LLVM_DEBUG(dbgs() << "\nGenerating code for STRICT-IEEE mode.\n");
+ Changed = HandleStrictIEEE(MF);
+ break;
+ case QFloatMode::IEEE:
+ LLVM_DEBUG(dbgs() << "\nGenerating code for IEEE mode.\n");
+ Changed = HandleCompliantIEEE(MF);
+ break;
+ case QFloatMode::Lossy:
+ LLVM_DEBUG(dbgs() << "\nGenerating code for LOSSY mode.\n");
+ Changed = HandleLossySubnormals(MF);
+ break;
+ case QFloatMode::Legacy:
+ LLVM_DEBUG(dbgs() << "\nGenerating code for LEGACY mode.\n");
+ Changed = HandleLossyLegacy(MF);
+ break;
+ }
+ LLVM_DEBUG(dbgs() << "...fine");
+
+ // Delete the original instructions
+ for (MachineInstr *origMI : OriginalMI) {
+ LLVM_DEBUG(origMI->dump());
+ origMI->eraseFromParent();
+ }
+ OriginalMI.clear();
+
+ return Changed;
+}
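
Note on the deferred erase in runOnMachineFunction above: the handlers only record the replaced instructions in OriginalMI, and the erase happens after the walk over the function is finished. A minimal, self-contained sketch of that pattern follows. It does not use LLVM: Instr, rewriteBlock, eraseOriginals and the opcode strings are hypothetical stand-ins chosen for illustration, and std::list stands in for the instruction list because inserting into it leaves existing iterators valid.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Instr {
  std::string Opcode;
};

using Block = std::list<Instr>;
using InstrIter = Block::iterator;

// Walk the block once. For every "vadd_qf32" (placeholder opcode), insert a
// replacement in front of it and remember the original for later removal.
// Nothing is erased during the walk, so the loop iterator is never invalidated.
bool rewriteBlock(Block &B, std::vector<InstrIter> &Originals) {
  bool Changed = false;
  for (InstrIter It = B.begin(); It != B.end(); ++It) {
    if (It->Opcode != "vadd_qf32")
      continue;
    B.insert(It, Instr{"vadd_sf"}); // inserted before It; It remains valid
    Originals.push_back(It);
    Changed = true;
  }
  return Changed;
}

// Erase the recorded originals only after the walk has completed.
void eraseOriginals(Block &B, std::vector<InstrIter> &Originals) {
  for (InstrIter It : Originals)
    B.erase(It);
  Originals.clear();
}
```

Erasing inside the loop would require advancing the iterator before each erase; collecting first keeps the dispatch code free of that bookkeeping, at the cost of holding a list of pending deletions.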
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-add-qf.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-add-qf.ll
new file mode 100644
index 0000000000000..30707d01c0ba6
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-add-qf.ll
@@ -0,0 +1,157 @@
+; Tests XQFloat handling of inputs to 16-bit and 32-bit add/sub. Covers Strict-IEEE mode only.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee \
+; RUN: -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s --enable-var-scope
+
+; Test qf16 = vadd(hf ,hf) when no input is from adder
+define <64 x half> @add_hf(<64 x half> %a0, <64 x half> %a1) #0 {
+; CHECK-LABEL: add_hf
+; CHECK: {{v[0-9]+}}.qf16 = vadd(v0.hf,v1.hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ ret <64 x half> %v0
+}
+
+; Test qf32 = vadd(sf ,sf) when no input is from adder
+define <32 x float> @add_sf(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: add_sf
+; CHECK: {{v[0-9]+}}.qf32 = vadd(v0.sf,v1.sf)
+label1:
+ %v0 = fadd <32 x float> %a0, %a1
+ ret <32 x float> %v0
+}
+
+; Test qf16 = vadd(qf16 ,hf) when first input is from vadd instruction
+define <64 x half> @add_qf16_1(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: add_qf16_1
+; CHECK: [[V0:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK: [[V1:v[0-9]+]].hf = [[V0]].qf16
+; CHECK: qf16 = vadd([[V1]].hf,v2.hf)
+label2:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fadd <64 x half> %v0, %a2
+ ret <64 x half> %v1
+}
+
+; Test qf32 = vadd(qf32 ,sf) when first input is from vadd instruction
+define <32 x float> @add_qf32_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: add_qf32_1
+; CHECK: [[V0:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK: [[V1:v[0-9]+]].sf = [[V0]].qf32
+; CHECK: qf32 = vadd([[V1]].sf,v2.sf)
+label3:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %v0, %a2
+ ret <32 x float> %v1
+}
+
+; Test qf16 = vadd(qf16 ,qf16) when both inputs are from vadd/vsub instructions
+define <64 x half> @add_qf16_2(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: add_qf16_2
+; CHECK-DAG: [[V0:v[0-9]+]].qf16 = vsub(v0.hf,v1.hf)
+; CHECK-DAG: [[V2:v[0-9]+]].hf = [[V0]].qf16
+; CHECK-DAG: [[V1:v[0-9]+]].qf16 = vadd(v0.hf,v2.hf)
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V1]].qf16
+; CHECK: qf16 = vadd([[V2]].hf,[[V3]].hf)
+label4:
+ %v1 = fadd <64 x half> %a0, %a2
+ %v2 = fsub <64 x half> %a0, %a1
+ %v3 = fadd <64 x half> %v2, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf32 = vadd(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @add_qf32_2(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: add_qf32_2
+; CHECK-DAG: [[V0:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V1:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V2:v[0-9]+]].sf = [[V0]].qf32
+; CHECK-DAG: [[V3:v[0-9]+]].sf = [[V1]].qf32
+; CHECK: qf32 = vadd([[V3]].sf,[[V2]].sf)
+label5:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fadd <32 x float> %a0, %a1
+ %v3 = fadd <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+; Test qf16 = vsub(hf , hf) when no input is from adder
+define <64 x half> @sub_hf(<64 x half> %a0, <64 x half> %a1) #0 {
+; CHECK-LABEL: sub_hf
+; CHECK: {{v[0-9]+}}.qf16 = vsub(v0.hf,v1.hf)
+label6:
+ %v0 = fsub <64 x half> %a0, %a1
+ ret <64 x half> %v0
+}
+
+; Test qf32 = vsub(sf ,sf) when no input is from adder
+define <32 x float> @sub_sf(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: sub_sf
+; CHECK: {{v[0-9]+}}.qf32 = vsub(v0.sf,v1.sf)
+label7:
+ %v0 = fsub <32 x float> %a0, %a1
+ ret <32 x float> %v0
+}
+
+; Test qf16 = vsub(qf16 ,hf) when first input is from vsub instruction
+define <64 x half> @sub_qf16_1(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: sub_qf16_1
+; CHECK: [[V0:v[0-9]+]].qf16 = vsub(v0.hf,v1.hf)
+; CHECK: [[V1:v[0-9]+]].hf = [[V0]].qf16
+; CHECK: qf16 = vsub([[V1]].hf,v2.hf)
+label8:
+ %v0 = fsub <64 x half> %a0, %a1
+ %v1 = fsub <64 x half> %v0, %a2
+ ret <64 x half> %v1
+}
+
+; Test qf32 = vsub(qf32 ,sf) when first input is from vadd instruction
+define <32 x float> @sub_qf32_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: sub_qf32_1
+; CHECK: [[V0:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK: [[V1:v[0-9]+]].sf = [[V0]].qf32
+; CHECK: qf32 = vsub([[V1]].sf,v2.sf)
+label9:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %v0, %a2
+ ret <32 x float> %v1
+}
+
+; Test qf16 = vsub(qf16 ,qf16) when both inputs are from vadd/vsub instruction
+define <64 x half> @sub_qf16_2(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: sub_qf16_2
+; CHECK-DAG: [[V0:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V1:v[0-9]+]].qf16 = vadd(v0.hf,v2.hf)
+; CHECK-DAG: [[V2:v[0-9]+]].hf = [[V0]].qf16
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V1]].qf16
+; CHECK: qf16 = vsub([[V2]].hf,[[V3]].hf)
+label10:
+ %v1 = fadd <64 x half> %a0, %a2
+ %v2 = fadd <64 x half> %a0, %a1
+ %v3 = fsub <64 x half> %v2, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf32 = vsub(qf32 ,qf32) when first input is from vmpy and second input is from vadd instruction
+define <32 x float> @add_mul_qf32_2(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: add_mul_qf32_2
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V0:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V1:v[0-9]+]] = vxor([[V1]],[[V1]])
+; CHECK-DAG: [[V2:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V3:v[0-9]+]].sf = [[V0]].qf32
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vmpy([[V1]].sf,[[V2]].sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd([[V4]].qf32,v0.sf)
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vadd([[V4]].qf32,v1.sf)
+; CHECK: [[V7:v[0-9]+]].qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+; CHECK: [[V8:v[0-9]+]].sf = [[V7]].qf32
+; CHECK: qf32 = vsub([[V8]].sf,[[V3]].sf)
+label11:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v0 = fsub <32 x float> %v2, %v1
+ ret <32 x float> %v0
+}
+
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,+hvx-qfloat,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-assertion1.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-assertion1.ll
new file mode 100644
index 0000000000000..90d0790989388
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-assertion1.ll
@@ -0,0 +1,84 @@
+; On v79 and above, checks that Assertion `isImm() && "Wrong MachineOperand accessor"' no longer fails
+
+; RUN: llc -march=hexagon -enable-xqf-gen=true -enable-rem-conv=true \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -o /dev/null < %s
+; RUN: llc -march=hexagon -enable-xqf-gen=true -enable-rem-conv=true \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -o /dev/null < %s
+
+
+@.str.1 = private unnamed_addr constant [66 x i8] c"hvx_ieee_fp_test.c:39 0 && \22ERROR: Failed to acquire HVX unit.\\n\22\00", align 1
+@__func__.main = private unnamed_addr constant [5 x i8] c"main\00", align 1
+@.str.2 = private unnamed_addr constant [33 x i8] c"half -3 converted to vhf = %.2f\0A\00", align 1
+@.str.3 = private unnamed_addr constant [35 x i8] c"uhalf 32k converted to vhf = %.2f\0A\00", align 1
+@str = private unnamed_addr constant [35 x i8] c"ERROR: Failed to acquire HVX unit.\00", align 1
+
+; Function Attrs: nounwind
+define dso_local i32 @main(i32 noundef %argc, ptr nocapture noundef readnone %argv) local_unnamed_addr #0 {
+entry:
+ %call = tail call i32 @acquire_vector_unit(i8 noundef zeroext 0) #6
+ %tobool.not = icmp eq i32 %call, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
+ tail call void @_Assert(ptr noundef nonnull @.str.1, ptr noundef nonnull @__func__.main) #7
+ unreachable
+
+if.end: ; preds = %entry
+ tail call void @set_double_vector_mode() #6
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 -3)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 32768)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.h.128B(<32 x i32> %0)
+ %bc.i = bitcast <32 x i32> %2 to <64 x half>
+ %3 = extractelement <64 x half> %bc.i, i64 0
+ %conv = fpext half %3 to double
+ %call5 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.2, double noundef %conv) #6
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.uh.128B(<32 x i32> %1)
+ %bc.i9 = bitcast <32 x i32> %4 to <64 x half>
+ %5 = extractelement <64 x half> %bc.i9, i64 0
+ %conv7 = fpext half %5 to double
+ %call8 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.3, double noundef %conv7) #6
+ ret i32 0
+}
+
+declare dso_local i32 @acquire_vector_unit(i8 noundef zeroext) local_unnamed_addr #1
+
+; Function Attrs: nofree nounwind
+declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #2
+
+; Function Attrs: noreturn nounwind
+declare dso_local void @_Assert(ptr noundef, ptr noundef) local_unnamed_addr #3
+
+declare dso_local void @set_double_vector_mode(...) local_unnamed_addr #1
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.h.128B(<32 x i32>) #4
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.uh.128B(<32 x i32>) #4
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #4
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #5
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,-long-calls" }
+attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,-long-calls" }
+attributes #2 = { nofree nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvxv79,+v79,-long-calls" }
+attributes #3 = { noreturn nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvxv79,+v79,-long-calls" }
+attributes #4 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #5 = { nofree nounwind }
+attributes #6 = { nounwind }
+attributes #7 = { noreturn nounwind }
+
+!llvm.module.flags = !{!0, !1, !2}
+!llvm.ident = !{!6}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 7, !"frame-pointer", i32 2}
+!2 = !{i32 5, !"CG MDInfo", !3}
+!3 = !{!4, !5}
+!4 = !{!"F", !"no_filename_available", !"", !"", i1 false, !""}
+!5 = !{!"C", !"set_double_vector_mode", !"(void)", !"(...)", i1 true, !""}
+!6 = !{!"QuIC LLVM Hexagon Clang version 8.8-alpha3 Engineering Release: hexagon-clang-88"}
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-free-reg.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-free-reg.ll
new file mode 100644
index 0000000000000..3a14918a93d49
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-free-reg.ll
@@ -0,0 +1,110 @@
+; Tests that a register is found in the pool of non-live
+; registers. This register holds a vector of zeroes used by the converts.
+
+; REQUIRES: asserts
+; RUN: llc --mtriple=hexagon-- -O0 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -debug-only=handle-qfp -o /dev/null < %s 2>&1 | FileCheck %s
+
+; CHECK: Analyzing convert instruction: renamable [[VREG0:\$v[0-9]+]] = V6_vconv_hf_qf16 killed renamable [[VREG0]]
+; CHECK: Using V30 register to store a vector of zeroes
+; CHECK: Inserting new instruction: [[VREG1:\$v[0-9]+]] = V6_vd0
+; CHECK: Inserting new instruction: [[VREG0]] = V6_vadd_hf killed renamable [[VREG0]], killed [[VREG1]]
+
+@.str.1 = private unnamed_addr constant [9 x i8] c"0x%08lx \00", align 1
+@.str.3 = private unnamed_addr constant [62 x i8] c"qfloat_test.c:135 0 && \22ERROR: Failed to acquire HVX unit.\\n\22\00", align 1
+@__func__.main = private unnamed_addr constant [5 x i8] c"main\00", align 1
+@.str.4 = private unnamed_addr constant [44 x i8] c"The sum of hf %.3f and hf %.3f is %.3f\0A\00", align 1
+@str = private unnamed_addr constant [35 x i8] c"ERROR: Failed to acquire HVX unit.\00", align 1
+
+; Function Attrs: nofree nounwind optsize
+define dso_local void @print_vector_words(<32 x i32> noundef %x) local_unnamed_addr #0 {
+entry:
+ br label %for.body
+
+for.cond.cleanup: ; preds = %if.end
+ %putchar = tail call i32 @putchar(i32 10)
+ ret void
+
+for.body: ; preds = %entry, %if.end
+ %i.07 = phi i32 [ 0, %entry ], [ %inc, %if.end ]
+ %rem = and i32 %i.07, 7
+ %tobool.not = icmp eq i32 %rem, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %for.body
+ %putchar6 = tail call i32 @putchar(i32 10)
+ br label %if.end
+
+if.end: ; preds = %if.then, %for.body
+ %vecext = extractelement <32 x i32> %x, i32 %i.07
+ %call1 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1, i32 noundef %vecext) #6
+ %inc = add nuw nsw i32 %i.07, 1
+ %exitcond.not = icmp eq i32 %inc, 32
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+; Function Attrs: nofree nounwind optsize
+declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #0
+
+; Function Attrs: nounwind optsize
+define dso_local i32 @main(i32 noundef %argc, ptr nocapture noundef readnone %argv) local_unnamed_addr #1 {
+entry:
+ %call = tail call i32 @acquire_vector_unit(i8 noundef zeroext 0) #6
+ %tobool.not = icmp eq i32 %call, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
+ tail call void @_Assert(ptr noundef nonnull @.str.3, ptr noundef nonnull @__func__.main) #7
+ unreachable
+
+if.end: ; preds = %entry
+ tail call void @set_double_vector_mode() #6
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 14336)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 13312)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %bc.i = bitcast <32 x i32> %0 to <64 x half>
+ %3 = extractelement <64 x half> %bc.i, i64 0
+ %conv = fpext half %3 to double
+ %bc.i18 = bitcast <32 x i32> %1 to <64 x half>
+ %4 = extractelement <64 x half> %bc.i18, i64 0
+ %conv12 = fpext half %4 to double
+ %5 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %bc.i.i = bitcast <32 x i32> %5 to <64 x half>
+ %6 = extractelement <64 x half> %bc.i.i, i64 0
+ %conv14 = fpext half %6 to double
+ %call15 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.4, double noundef %conv, double noundef %conv12, double noundef %conv14) #6
+ ret i32 0
+}
+
+; Function Attrs: optsize
+declare dso_local i32 @acquire_vector_unit(i8 noundef zeroext) local_unnamed_addr #2
+
+; Function Attrs: noreturn nounwind optsize
+declare dso_local void @_Assert(ptr noundef, ptr noundef) local_unnamed_addr #3
+
+; Function Attrs: optsize
+declare dso_local void @set_double_vector_mode(...) local_unnamed_addr #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #4
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @putchar(i32 noundef) local_unnamed_addr #5
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #5
+
+attributes #0 = { nofree nounwind optsize "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" }
+attributes #1 = { nounwind optsize "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" }
+attributes #2 = { optsize "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" }
+attributes #3 = { noreturn nounwind optsize "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" }
+attributes #4 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #5 = { nofree nounwind }
+attributes #6 = { nounwind optsize }
+attributes #7 = { noreturn nounwind optsize }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-qf-instrs.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-qf-instrs.ll
new file mode 100644
index 0000000000000..96e497a571cb5
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-check-qf-instrs.ll
@@ -0,0 +1,73 @@
+; In non-legacy modes, the presence of qf instructions is not checked: the post-RA pass
+; always runs and the V30 register is reserved.
+; In legacy mode, the V30 register is reserved and the post-RA pass runs only if the
+; function contains qf-generating instructions.
+
+; REQUIRES: asserts
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=ieee -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=IEEE
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=ieee -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=IEEE
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=strict-ieee -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=STRICT
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=strict-ieee -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=STRICT
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=lossy -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=LOSSY
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=lossy -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=LOSSY
+
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=LEGACY
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -debug-only=handle-qfp -o /dev/null < %s \
+; RUN: 2>&1 | FileCheck %s --check-prefix=LEGACY
+
+define dso_local <32 x i32> @test1(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; IEEE: Entering Hexagon Fixup QF spills and refills pass
+; STRICT: Entering Hexagon Fixup QF spills and refills pass
+; LOSSY: Entering Hexagon Fixup QF spills and refills pass
+; LEGACY: Entering Hexagon Fixup QF spills and refills pass
+; IEEE: Handling spills
+; STRICT: Handling spills
+; LOSSY: Handling spills
+; LEGACY: Handling spills
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+define dso_local <32 x i32> @test2(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; IEEE: Entering Hexagon Fixup QF spills and refills pass
+; STRICT: Entering Hexagon Fixup QF spills and refills pass
+; LOSSY: Entering Hexagon Fixup QF spills and refills pass
+; LEGACY: Entering Hexagon Fixup QF spills and refills pass
+; IEEE: Handling spills
+; STRICT: Handling spills
+; LOSSY: Handling spills
+; LEGACY-NOT: Handling spills
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = add nsw <32 x i32> %0, %1
+ ret <32 x i32> %2
+}
+
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32>, <32 x i32>) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf16.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf16.ll
new file mode 100644
index 0000000000000..ed4aaa5ccf076
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf16.ll
@@ -0,0 +1,86 @@
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-rem-conv=true \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-rem-conv=true \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=ieee -mattr=+hvxv81,+hvx-length128B < %s | FileCheck %s
+
+; Test qf16 = vmpy(qf16 ,qf16) when both inputs are from vadd instruction
+define <64 x half> @mul_add_3(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V4]].qf16,[[V3]].qf16)
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V10]].qf32
+; CHECK-DAG: qf16 = vsub([[V6]].hf,[[V5]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fadd <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,qf16) when the inputs are from vadd and vmpy instructions
+define <64 x half> @mul_add_mul(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V32:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V32]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf16 = vsub([[V3]].hf,[[V5]].hf)
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V4]].qf16,[[V6]].qf16)
+; CHECK-DAG: [[V7:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V7]].hf,[[V5]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fmul <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(hf ,hf)
+define <64 x half> @mul_add_0(<64 x half> %a0, <64 x half> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,v1.hf)
+; CHECK-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V3]].hf,[[V2]].hf)
+label0:
+ %v3 = fmul <64 x half> %a0, %a1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,hf) when first input is from vadd instruction
+define <64 x half> @mul_add_1(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V3]].qf16,v2.hf)
+; CHECK-DAG: [[V5:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V5]].hf,[[V4]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v3 = fmul <64 x half> %v0, %a2
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,qf16) when both inputs are from vmpy instructions
+define <64 x half> @mul_add_2(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_2:
+; CHECK-DAG: [[V54:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,v2.hf)
+; CHECK-DAG: [[V29:v[0-9]+:[0-9]+]].qf32 = vmpy(v1.hf,v2.hf)
+; CHECK-DAG: [[V30:v[0-9]+]] = vxor([[V30]],[[V30]])
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V54]].qf32
+; CHECK-DAG: [[V31:v[0-9]+]].hf = [[V29]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf16 = vsub([[V3]].hf,[[V30]].hf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf16 = vsub([[V31]].hf,[[V30]].hf)
+; CHECK-DAG: [[V32:v[0-9]+:[0-9]+]].qf32 = vmpy([[V6]].qf16,[[V7]].qf16)
+; CHECK-DAG: [[V8:v[0-9]+]].hf = [[V32]].qf32
+; CHECK: qf16 = vsub([[V8]].hf,[[V30]].hf)
+label0:
+ %v1 = fmul <64 x half> %a0, %a2
+ %v2 = fmul <64 x half> %a1, %a2
+ %v3 = fmul <64 x half> %v1, %v2
+ ret <64 x half> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,+hvx-qfloat,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf32.ll
new file mode 100644
index 0000000000000..5db57fb7e6131
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-compliant-ieee-mul-qf32.ll
@@ -0,0 +1,136 @@
+; Tests compliant IEEE mode for 32-bit XQFloat multiplication
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -enable-rem-conv=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s -check-prefix=CHECK
+
+; Test qf32 = vmpy(sf, sf)
+; Normalization of inputs
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V2:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V3:v[0-9]+]] = vxor([[V3]],[[V3]])
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vmpy([[V3]].sf,[[V2]].sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd([[V4]].qf32,v0.sf)
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vadd([[V4]].qf32,v1.sf)
+; CHECK: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V5:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V5]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V6]].sf)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V5]].qf32)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V3]].qf32)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf, qf32) when only one input is from a vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V5]].sf,[[V4]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V6]].sf)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V5]].qf32)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V3]].qf32)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when one is from adder, another from multiplier
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V3:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V3]].sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vadd([[V6]].qf32,v0.sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V6]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V6]].qf32,[[V5]].qf32)
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = vmpy([[V7]].qf32,[[V8]].qf32)
+; CHECK: qf32 = vmpy([[V9]].qf32,[[V10]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+define <32 x float> @mul_mul_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+label0:
+; CHECK-LABEL: mul_mul_mul:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V3:v[0-9]+]] = vxor([[V3]],[[V3]])
+; CHECK-DAG: [[V4:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vmpy([[V3]].sf,[[V4]].sf)
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vadd([[V5]].qf32,v0.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vadd([[V5]].qf32,v2.sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V5]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vmpy([[V6]].qf32,[[V7]].qf32)
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = vmpy([[V6]].qf32,[[V8]].qf32)
+; CHECK: qf32 = vmpy([[V9]].qf32,[[V10]].qf32)
+ %v1 = fmul <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-convert-elim.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-convert-elim.ll
new file mode 100644
index 0000000000000..cd48a7008053b
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-convert-elim.ll
@@ -0,0 +1,77 @@
+; Tests if the sf/hf = qf converts have been done correctly
+
+; REQUIRES: asserts
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-rem-conv=true -enable-xqf-gen=true -hexagon-qfloat-mode=ieee -verify-machineinstrs \
+; RUN: -debug-print < %s 2>&1 -o /dev/null | FileCheck %s
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-rem-conv=true -enable-xqf-gen=true -hexagon-qfloat-mode=lossy -verify-machineinstrs \
+; RUN: -debug-print < %s 2>&1 -o /dev/null | FileCheck %s
+
+; Single use of convert reg. The convert should be deleted.
+define dso_local <32 x i32> @conv1_qf32(<32 x i32> noundef %input1, <32 x i32> noundef %input2) local_unnamed_addr #0 {
+; CHECK: bb.0.entry
+; CHECK: [[VREG2:%[0-9]+]]:hvxvr = V6_vadd_sf [[VREG0:%[0-9]+]]:hvxvr, %1:hvxvr
+; CHECK-NOT: V6_vconv_sf_qf32 killed [[VREG2]]:hvxvr
+; CHECK: V6_vadd_qf32_mix [[VREG2]]:hvxvr, [[VREG0]]:hvxvr
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %input1, <32 x i32> %input2)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %0)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %input1, <32 x i32> %1)
+ ret <32 x i32> %2
+}
+
+; Double use of convert reg. The convert should not be deleted.
+define dso_local <32 x i32> @conv2_qf32(<32 x i32> noundef %input1, <32 x i32> noundef %input2) local_unnamed_addr #0 {
+; CHECK: bb.0.entry
+; CHECK: [[VREG2:%[0-9]+]]:hvxvr = V6_vadd_sf [[VREG0:%[0-9]+]]:hvxvr, [[VREG1:%[0-9]+]]:hvxvr
+; CHECK-NEXT: V6_vconv_sf_qf32 [[VREG2]]:hvxvr
+; CHECK-NEXT: V6_vadd_qf32_mix [[VREG2]]:hvxvr, [[VREG0]]:hvxvr
+; CHECK-NEXT: V6_vadd_qf32_mix [[VREG2]]:hvxvr, [[VREG1]]:hvxvr
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %input1, <32 x i32> %input2)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %0)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %input1, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %input2, <32 x i32> %1)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %2, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; Single use of convert reg. The convert should be deleted.
+define dso_local <32 x i32> @conv1_qf16(<32 x i32> noundef %input1, <32 x i32> noundef %input2) local_unnamed_addr #0 {
+; CHECK: bb.0.entry
+; CHECK: [[VREG2:%[0-9]+]]:hvxvr = V6_vadd_hf [[VREG0:%[0-9]+]]:hvxvr, %1:hvxvr
+; CHECK-NOT: V6_vconv_hf_qf16 killed [[VREG2]]:hvxvr
+; CHECK: V6_vadd_qf16_mix [[VREG2]]:hvxvr, [[VREG0]]:hvxvr
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %input1, <32 x i32> %input2)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %0)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %input1, <32 x i32> %1)
+ ret <32 x i32> %2
+}
+
+; Double use of convert reg. The convert should not be deleted.
+define dso_local <32 x i32> @conv2_qf16(<32 x i32> noundef %input1, <32 x i32> noundef %input2) local_unnamed_addr #0 {
+; CHECK: bb.0.entry
+; CHECK: [[VREG2:%[0-9]+]]:hvxvr = V6_vadd_hf [[VREG0:%[0-9]+]]:hvxvr, [[VREG1:%[0-9]+]]:hvxvr
+; CHECK-NEXT: V6_vconv_hf_qf16 [[VREG2]]:hvxvr
+; CHECK-NEXT: V6_vadd_qf16_mix [[VREG2]]:hvxvr, [[VREG0]]:hvxvr
+; CHECK-NEXT: V6_vadd_qf16_mix [[VREG2]]:hvxvr, [[VREG1]]:hvxvr
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %input1, <32 x i32> %input2)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %0)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %input1, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %input2, <32 x i32> %1)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.128B(<32 x i32> %2, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.128B(<32 x i32>, <32 x i32>) #1
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-long-calls,-small-data" }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-corner-case1.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-corner-case1.ll
new file mode 100644
index 0000000000000..b1a525bab2ffa
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-corner-case1.ll
@@ -0,0 +1,147 @@
+; This covers the case where a qf value is generated by a widening
+; multiplication and the hi/lo half of the result is used in an add
+; instruction. A COPY instruction is generated to copy the hi/lo half to
+; another virtual register before its use in the add. In STRICT-IEEE
+; mode, the value must be converted to IEEE before that use, so a new
+; add instruction is inserted and the original one is deleted. We check
+; for the deletion here.
+; REQUIRES: asserts
+; RUN: llc --mtriple=hexagon-- -O2 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee \
+; RUN: -debug-only=hexagon-xqf-gen 2>&1 < %s | FileCheck %s --check-prefix STRICT-IEEE
+; RUN: llc --mtriple=hexagon-- -O2 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -enable-xqf-gen=true -hexagon-qfloat-mode=ieee \
+; RUN: -debug-only=hexagon-xqf-gen 2>&1 < %s | FileCheck %s --check-prefix COMPLIANT-IEEE
+; RUN: llc --mtriple=hexagon-- -O2 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee \
+; RUN: -debug-only=hexagon-xqf-gen 2>&1 < %s | FileCheck %s --check-prefix STRICT-IEEE
+; RUN: llc --mtriple=hexagon-- -O2 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -enable-xqf-gen=true -hexagon-qfloat-mode=ieee \
+; RUN: -debug-only=hexagon-xqf-gen 2>&1 < %s | FileCheck %s --check-prefix COMPLIANT-IEEE
+
+; STRICT-IEEE: Generating code for STRICT-IEEE mode
+; STRICT-IEEE-NEXT: deleting redundant instruction %{{[0-9]+}}:hvxvr = V6_vadd_qf32_mix killed %{{[0-9]+}}:hvxvr, killed %{{[0-9]+}}:hvxvr
+; STRICT-IEEE-NEXT: deleting redundant instruction %{{[0-9]+}}:hvxvr = V6_vadd_qf32_mix killed %{{[0-9]+}}:hvxvr, killed %{{[0-9]+}}:hvxvr
+
+; COMPLIANT-IEEE: Generating code for IEEE mode
+; COMPLIANT-IEEE-NOT: deleting redundant instruction
+
+define dso_local noundef i32 @main() #0 {
+entry:
+ tail call void asm sideeffect "l2fetch($0, $1)", "r,r"(ptr blockaddress(@main, %for.cond6.preheader.lr.ph.i), i32 8421392) #4
+ %vla69 = alloca [128 x half], align 128
+ %vla470 = alloca [32768 x half], align 128
+ %vla771 = alloca [256 x half], align 128
+ br label %for.body
+
+for.body: ; preds = %for.body, %entry
+ %arrayidx.phi = phi ptr [ %vla69, %entry ], [ %arrayidx.inc, %for.body ]
+ %i.077 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+ %call = tail call i32 @rand() #4
+ %rem = srem i32 %call, 10
+ %conv = sitofp i32 %rem to double
+ %mul17 = fmul double %conv, 5.000000e-01
+ %0 = fptrunc double %mul17 to half
+ store half %0, ptr %arrayidx.phi, align 2
+ %inc = add nuw nsw i32 %i.077, 1
+ %exitcond.not = icmp eq i32 %inc, 128
+ %arrayidx.inc = getelementptr half, ptr %arrayidx.phi, i32 1
+ br i1 %exitcond.not, label %for.body24, label %for.body
+
+for.body24: ; preds = %for.body24, %for.body
+ %arrayidx29.phi = phi ptr [ %arrayidx29.inc, %for.body24 ], [ %vla470, %for.body ]
+ %i18.078 = phi i32 [ %inc31, %for.body24 ], [ 0, %for.body ]
+ %call25 = tail call i32 @rand() #4
+ %rem26 = srem i32 %call25, 20
+ %conv27 = sitofp i32 %rem26 to double
+ %mul28 = fmul double %conv27, 2.500000e-01
+ %1 = fptrunc double %mul28 to half
+ store half %1, ptr %arrayidx29.phi, align 2
+ %inc31 = add nuw nsw i32 %i18.078, 1
+ %exitcond79.not = icmp eq i32 %inc31, 32768
+ %arrayidx29.inc = getelementptr half, ptr %arrayidx29.phi, i32 1
+ br i1 %exitcond79.not, label %for.cond6.preheader.lr.ph.i, label %for.body24
+
+for.cond6.preheader.lr.ph.i: ; preds = %for.body24
+ tail call void asm sideeffect "labelsym_startofkernel_${:uid}: .global labelsym_startofkernel_${:uid}", ""() #4
+ %2 = tail call <64 x i32> @llvm.hexagon.V6.vdd0.128B()
+ %3 = tail call <128 x i1> @llvm.hexagon.V6.pred.scalar2v2.128B(i32 128)
+ %4 = tail call <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32 0)
+ br label %for.body9.i
+
+for.cond.cleanup8.i: ; preds = %for.cond.cleanup12.i
+ call void asm sideeffect "labelsym_endofkernel_${:uid}: .global labelsym_endofkernel_${:uid}", ""() #4
+ ret i32 1
+
+for.body9.i: ; preds = %for.cond.cleanup12.i, %for.cond6.preheader.lr.ph.i
+ %arrayidx15.phi.i = phi ptr [ %vla771, %for.cond6.preheader.lr.ph.i ], [ %add.ptr.i.i, %for.cond.cleanup12.i ]
+ %storemerge3140.i = phi i32 [ 0, %for.cond6.preheader.lr.ph.i ], [ %add18.i, %for.cond.cleanup12.i ]
+ br label %for.body13.i
+
+for.cond.cleanup12.i: ; preds = %for.body13.i
+ %5 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %15)
+ %6 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %15)
+ %7 = call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %5, <32 x i32> %6)
+ %8 = ptrtoint ptr %arrayidx15.phi.i to i32
+ %9 = call <32 x i32> @llvm.hexagon.V6.vror.128B(<32 x i32> %7, i32 128)
+ %10 = call <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32 %8)
+ %11 = call <128 x i1> @llvm.hexagon.V6.pred.and.n.128B(<128 x i1> %3, <128 x i1> %10)
+ call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> %11, ptr %arrayidx15.phi.i, <32 x i32> %9)
+ %add.ptr.i.i = getelementptr i8, ptr %arrayidx15.phi.i, i32 128
+ call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> %4, ptr nonnull %add.ptr.i.i, <32 x i32> %9)
+ %add18.i = add nuw nsw i32 %storemerge3140.i, 64
+ %cmp7.i = icmp ult i32 %storemerge3140.i, 192
+ br i1 %cmp7.i, label %for.body9.i, label %for.cond.cleanup8.i
+
+for.body13.i: ; preds = %for.body13.i, %for.body9.i
+ %12 = phi <64 x i32> [ %15, %for.body13.i ], [ %2, %for.body9.i ]
+ %arrayidx4.i.phi.i = phi ptr [ %arrayidx4.i.inc.i, %for.body13.i ], [ %vla69, %for.body9.i ]
+ %p.038.i = phi i32 [ %inc.i, %for.body13.i ], [ 0, %for.body9.i ]
+ %mul.i.i = shl nsw i32 %p.038.i, 8
+ %add.i.i = add nuw nsw i32 %mul.i.i, %storemerge3140.i
+ %arrayidx.i.i = getelementptr inbounds i16, ptr %vla470, i32 %add.i.i
+ %13 = load <32 x i32>, ptr %arrayidx.i.i, align 128
+ %14 = load <128 x i8>, ptr %arrayidx4.i.phi.i, align 1
+ %shuffle.i.i = shufflevector <128 x i8> %14, <128 x i8> poison, <128 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
+ %.cast.i = bitcast <128 x i8> %shuffle.i.i to <32 x i32>
+ %15 = call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32> %12, <32 x i32> %.cast.i, <32 x i32> %13)
+ %inc.i = add nuw nsw i32 %p.038.i, 1
+ %exitcond.not.i = icmp eq i32 %inc.i, 128
+ %arrayidx4.i.inc.i = getelementptr i16, ptr %arrayidx4.i.phi.i, i32 1
+ br i1 %exitcond.not.i, label %for.cond.cleanup12.i, label %for.body13.i
+}
+
+; Function Attrs: nounwind
+declare dso_local i32 @rand() local_unnamed_addr #1
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <64 x i32> @llvm.hexagon.V6.vdd0.128B() #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32>, <32 x i32>, <32 x i32>) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vror.128B(<32 x i32>, i32) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2v2.128B(i32) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <128 x i1> @llvm.hexagon.V6.pred.and.n.128B(<128 x i1>, <128 x i1>) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
+declare void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #3
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #2
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #2
+
+attributes #0 = { norecurse "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,-long-calls" }
+attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,-long-calls" }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #3 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #4 = { nounwind }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-fix-invalid-opcode.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-fix-invalid-opcode.ll
new file mode 100644
index 0000000000000..69ee2b1ea2de3
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-fix-invalid-opcode.ll
@@ -0,0 +1,72 @@
+; Test that the correct vadd(qf32, sf) is generated instead of
+; vadd(sf,sf).
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s
+
+; CHECK-LABEL: main:
+; CHECK: [[VREG:v[0-9]+]].qf32 = vmpy({{.*}}.sf,{{.*}}.sf)
+; CHECK: v{{.*}}.qf32 = vadd([[VREG]].qf32,v{{.*}}.sf)
+; CHECK: v{{.*}}.qf32 = vadd([[VREG]].qf32,v{{.*}}.sf)
+
+@.str.1 = private unnamed_addr constant [9 x i8] c"0x%08lx \00", align 1
+@.str.3 = private unnamed_addr constant [99 x i8] c"/prj/qct/llvm/devops/test/users/sgundapa/del/test.c:54 0 && \22ERROR: Failed to acquire HVX unit.\\n\22\00", align 1
+@__func__.main = private unnamed_addr constant [5 x i8] c"main\00", align 1
+@.str.5 = private unnamed_addr constant [31 x i8] c"sf mpy of 0.5 and -0.25 = %f\0A\00", align 1
+@str = private unnamed_addr constant [35 x i8] c"ERROR: Failed to acquire HVX unit.\00", align 1
+@str.6 = private unnamed_addr constant [24 x i8] c"\0Amultiply instructions\0A\00", align 1
+
+; Function Attrs: nofree nounwind
+declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #0
+
+; Function Attrs: nounwind
+define dso_local i32 @main(i32 noundef %argc, ptr nocapture noundef readnone %argv) local_unnamed_addr #1 {
+entry:
+ %call = tail call i32 @acquire_vector_unit(i8 noundef zeroext 0) #6
+ %tobool.not = icmp eq i32 %call, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
+ tail call void @_Assert(ptr noundef nonnull @.str.3, ptr noundef nonnull @__func__.main) #7
+ unreachable
+
+if.end: ; preds = %entry
+ tail call void @set_double_vector_mode() #6
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1056964608)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -1098907648)
+ %puts7 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.6)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %bc.i = bitcast <32 x i32> %2 to <32 x float>
+ %3 = extractelement <32 x float> %bc.i, i64 0
+ %conv = fpext float %3 to double
+ %call6 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.5, double noundef %conv) #6
+ ret i32 0
+}
+
+declare dso_local i32 @acquire_vector_unit(i8 noundef zeroext) local_unnamed_addr #2
+
+; Function Attrs: noreturn nounwind
+declare dso_local void @_Assert(ptr noundef, ptr noundef) local_unnamed_addr #3
+
+declare dso_local void @set_double_vector_mode(...) local_unnamed_addr #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #4
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @putchar(i32 noundef) local_unnamed_addr #5
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #5
+
+attributes #0 = { nofree nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,-long-calls,-small-data" }
+attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,-long-calls,-small-data" }
+attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,-long-calls,-small-data" }
+attributes #3 = { noreturn nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,-long-calls,-small-data" }
+attributes #4 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #5 = { nofree nounwind }
+attributes #6 = { nounwind }
+attributes #7 = { noreturn nounwind }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-fixup-qfp1.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-fixup-qfp1.ll
deleted file mode 100644
index 9625a605910c2..0000000000000
--- a/llvm/test/CodeGen/Hexagon/autohvx/xqf-fixup-qfp1.ll
+++ /dev/null
@@ -1,372 +0,0 @@
-; REQUIRES: hexagon-registered-target, silver
-; This tests correct handling of register spills and fills of
-; qf operands during register allocation.
-
-; RUN: llc -mcpu=hexagonv79 -mattr=+hvx-length128b,+hvxv79,+hvx-ieee-fp,+hvx-qfloat,-long-calls -debug-only=handle-qfp %s 2>&1 -o - | FileCheck %s --check-prefixes V79-81,V79
-; RUN: llc -mcpu=hexagonv81 -mattr=+hvx-length128b,+hvxv81,+hvx-ieee-fp,+hvx-qfloat,-long-calls -debug-only=handle-qfp %s 2>&1 -o - | FileCheck %s --check-prefixes V79-81,V81
-
-; V79-81: Finding uses of: renamable $w{{[0-9]+}} = V6_vmpy_qf32_hf
-; V79-81: Inserting after conv: [[VREG0:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG0]]
-; V79-81-NEXT: Inserting after conv: [[VREG1:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG1]]
-; V79-81: Finding uses of: renamable $w{{[0-9]+}} = V6_vmpy_qf32_hf
-; V79-81: Inserting after conv: [[VREG2:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG2]]
-; V79-81-NEXT: Inserting after conv: [[VREG3:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG3]]
-; V79-81: Finding uses of: renamable $w{{[0-9]+}} = V6_vmpy_qf32_hf
-; V79-81-DAG: Inserting after conv: [[VREG4:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG4]]
-; V79-81-DAG: Inserting after conv: [[VREG5:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG5]]
-; V79-81-DAG: Inserting new instruction: $v{{[0-9]+}} = V6_vadd_sf killed renamable [[VREG2]], killed renamable [[VREG0]]
-; V79-81-DAG: Inserting new instruction: $v{{[0-9]+}} = V6_vsub_sf killed renamable $v{{[0-9]+}}, killed renamable $v{{[0-9]+}}
-;
-; V79-81: Analyzing convert instruction: renamable [[VREG6:\$v[0-9]+]] = V6_vconv_hf_qf32 killed renamable $w{{[0-9]+}}
-; V79: Inserting new instruction: [[VREG30:\$v[0-9]+]] = V6_vd0
-; V79-NEXT: Inserting new instruction: [[VREG7:\$v[0-9]+]] = V6_vadd_sf killed renamable [[VREG7]], killed [[VREG30]]
-; V79: Inserting new instruction: [[VREG30]] = V6_vd0
-; V79-NEXT: Inserting new instruction: [[VREG8:\$v[0-9]+]] = V6_vadd_sf killed renamable [[VREG8]], killed [[VREG30]]
-; V81: Inserting new instruction: [[VREG7:\$v[0-9]+]] = V6_vconv_qf32_sf killed renamable [[VREG7]]
-; V81: Inserting new instruction: [[VREG8:\$v[0-9]+]] = V6_vconv_qf32_sf killed renamable [[VREG8]]
-
-; V79-81: Analyzing convert instruction: renamable [[VREG9:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable $v{{[0-9]+}}
-; V79: Inserting new instruction: [[VREG30]] = V6_vd0
-; V79-NEXT: Inserting new instruction: [[VREG10:\$v[0-9]+]] = V6_vadd_sf killed renamable [[VREG10]], killed [[VREG30]]
-; V81: Inserting new instruction: [[VREG8:\$v[0-9]+]] = V6_vconv_qf32_sf killed renamable [[VREG8]]
-
-target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
-target triple = "hexagon"
-
-@.str.1 = private unnamed_addr constant [9 x i8] c"0x%08lx \00", align 1
-@.str.3 = private unnamed_addr constant [173 x i8] c"/prj/qct/llvm/devops/aether/hexbuild/test_trees/MASTER/test/regress/features/hexagon/arch_v68/hvx_ieee_fp/hvx_ieee_fp_test.c:126 0 && \22ERROR: Failed to acquire HVX unit.\\n\22\00", align 1
-@__func__.main = private unnamed_addr constant [5 x i8] c"main\00", align 1
-@.str.5 = private unnamed_addr constant [33 x i8] c"half -3 converted to vhf = %.2f\0A\00", align 1
-@.str.6 = private unnamed_addr constant [35 x i8] c"uhalf 32k converted to vhf = %.2f\0A\00", align 1
-@.str.7 = private unnamed_addr constant [32 x i8] c"sf 0.5 converted to vhf = %.2f\0A\00", align 1
-@.str.8 = private unnamed_addr constant [32 x i8] c"vhf 4.0 conveted to ubyte = %d\0A\00", align 1
-@.str.9 = private unnamed_addr constant [32 x i8] c"vhf 2.0 conveted to uhalf = %d\0A\00", align 1
-@.str.10 = private unnamed_addr constant [30 x i8] c"byte 4 conveted to hf = %.2f\0A\00", align 1
-@.str.11 = private unnamed_addr constant [31 x i8] c"ubyte 4 conveted to hf = %.2f\0A\00", align 1
-@.str.12 = private unnamed_addr constant [27 x i8] c"hf -3 conveted to sf = %f\0A\00", align 1
-@.str.13 = private unnamed_addr constant [31 x i8] c"vhf 4.0 conveted to byte = %d\0A\00", align 1
-@.str.14 = private unnamed_addr constant [31 x i8] c"vhf 4.0 conveted to half = %d\0A\00", align 1
-@.str.16 = private unnamed_addr constant [33 x i8] c"max of hf 2.0 and hf 4.0 = %.2f\0A\00", align 1
-@.str.17 = private unnamed_addr constant [33 x i8] c"min of hf 2.0 and hf 4.0 = %.2f\0A\00", align 1
-@.str.18 = private unnamed_addr constant [32 x i8] c"max of sf 0.5 and sf 0.25 = %f\0A\00", align 1
-@.str.19 = private unnamed_addr constant [32 x i8] c"min of sf 0.5 and sf 0.25 = %f\0A\00", align 1
-@.str.21 = private unnamed_addr constant [25 x i8] c"negate of hf 4.0 = %.2f\0A\00", align 1
-@.str.22 = private unnamed_addr constant [23 x i8] c"abs of hf -6.0 = %.2f\0A\00", align 1
-@.str.23 = private unnamed_addr constant [23 x i8] c"negate of sf 0.5 = %f\0A\00", align 1
-@.str.24 = private unnamed_addr constant [22 x i8] c"abs of sf -0.25 = %f\0A\00", align 1
-@.str.26 = private unnamed_addr constant [32 x i8] c"hf add of 4.0 and -6.0 = %.2f\0A\00", align 1
-@.str.27 = private unnamed_addr constant [32 x i8] c"hf sub of 4.0 and -6.0 = %.2f\0A\00", align 1
-@.str.28 = private unnamed_addr constant [31 x i8] c"sf add of 0.5 and -0.25 = %f\0A\00", align 1
-@.str.29 = private unnamed_addr constant [31 x i8] c"sf sub of 0.5 and -0.25 = %f\0A\00", align 1
-@.str.30 = private unnamed_addr constant [36 x i8] c"sf add of hf 4.0 and hf -6.0 = %f\0A\00", align 1
-@.str.31 = private unnamed_addr constant [36 x i8] c"sf sub of hf 4.0 and hf -6.0 = %f\0A\00", align 1
-@.str.33 = private unnamed_addr constant [32 x i8] c"hf mpy of 4.0 and -6.0 = %.2f\0A\00", align 1
-@.str.34 = private unnamed_addr constant [35 x i8] c"hf accmpy of 4.0 and -6.0 = %.2f\0A\00", align 1
-@.str.35 = private unnamed_addr constant [36 x i8] c"sf mpy of hf 4.0 and hf -6.0 = %f\0A\00", align 1
-@.str.36 = private unnamed_addr constant [39 x i8] c"sf accmpy of hf 4.0 and hf -6.0 = %f\0A\00", align 1
-@.str.37 = private unnamed_addr constant [31 x i8] c"sf mpy of 0.5 and -0.25 = %f\0A\00", align 1
-@.str.39 = private unnamed_addr constant [25 x i8] c"w copy from sf 0.5 = %f\0A\00", align 1
-@str = private unnamed_addr constant [35 x i8] c"ERROR: Failed to acquire HVX unit.\00", align 1
-@str.40 = private unnamed_addr constant [25 x i8] c"\0AConversion intructions\0A\00", align 1
-@str.41 = private unnamed_addr constant [23 x i8] c"\0AMin/Max instructions\0A\00", align 1
-@str.42 = private unnamed_addr constant [23 x i8] c"\0Aabs/neg instructions\0A\00", align 1
-@str.43 = private unnamed_addr constant [23 x i8] c"\0Aadd/sub instructions\0A\00", align 1
-@str.44 = private unnamed_addr constant [24 x i8] c"\0Amultiply instructions\0A\00", align 1
-@str.45 = private unnamed_addr constant [19 x i8] c"\0Acopy instruction\0A\00", align 1
-declare dso_local void @print_vector_words(<32 x i32> noundef %x) local_unnamed_addr #0
-
-; Function Attrs: nofree nounwind optsize
-declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #0
-
-; Function Attrs: nounwind optsize
-define dso_local i32 @main(i32 noundef %argc, ptr nocapture noundef readnone %argv) local_unnamed_addr #1 {
-entry:
- %call = tail call i32 @acquire_vector_unit(i8 noundef zeroext 0) #6
- %tobool.not = icmp eq i32 %call, 0
- br i1 %tobool.not, label %if.then, label %if.end
-
-if.then: ; preds = %entry
- %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
- tail call void @_Assert(ptr noundef nonnull @.str.3, ptr noundef nonnull @__func__.main) #7
- unreachable
-
-if.end: ; preds = %entry
- tail call void @set_double_vector_mode() #6
- %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 16384)
- %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 17408)
- %2 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 -14848)
- %3 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1056964608)
- %4 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1048576000)
- %5 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -1098907648)
- %6 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 -3)
- %7 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 32768)
- %puts147 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.40)
- %8 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.h.128B(<32 x i32> %6)
- %bc.i = bitcast <32 x i32> %8 to <64 x half>
- %9 = extractelement <64 x half> %bc.i, i64 0
- %conv = fpext half %9 to double
- %call12 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.5, double noundef %conv) #6
- %10 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.uh.128B(<32 x i32> %7)
- %bc.i153 = bitcast <32 x i32> %10 to <64 x half>
- %11 = extractelement <64 x half> %bc.i153, i64 0
- %conv14 = fpext half %11 to double
- %call15 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.6, double noundef %conv14) #6
- %12 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %3, <32 x i32> %3)
- %bc.i155 = bitcast <32 x i32> %12 to <64 x half>
- %13 = extractelement <64 x half> %bc.i155, i64 0
- %conv17 = fpext half %13 to double
- %call18 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.7, double noundef %conv17) #6
- %14 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.ub.hf.128B(<32 x i32> %1, <32 x i32> %1)
- %15 = bitcast <32 x i32> %14 to <128 x i8>
- %conv.i = extractelement <128 x i8> %15, i64 0
- %conv20 = zext i8 %conv.i to i32
- %call21 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.8, i32 noundef %conv20) #6
- %16 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.uh.hf.128B(<32 x i32> %0)
- %17 = bitcast <32 x i32> %16 to <64 x i16>
- %conv.i157 = extractelement <64 x i16> %17, i64 0
- %conv23 = sext i16 %conv.i157 to i32
- %call24 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.9, i32 noundef %conv23) #6
- %18 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.hf.b.128B(<32 x i32> %14)
- %bc.i158 = bitcast <64 x i32> %18 to <128 x half>
- %19 = extractelement <128 x half> %bc.i158, i64 0
- %conv26 = fpext half %19 to double
- %call27 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.10, double noundef %conv26) #6
- %20 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.hf.ub.128B(<32 x i32> %14)
- %bc.i159 = bitcast <64 x i32> %20 to <128 x half>
- %21 = extractelement <128 x half> %bc.i159, i64 0
- %conv29 = fpext half %21 to double
- %call30 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.11, double noundef %conv29) #6
- %22 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32> %8)
- %bc.i161 = bitcast <64 x i32> %22 to <64 x float>
- %23 = extractelement <64 x float> %bc.i161, i64 0
- %conv32 = fpext float %23 to double
- %call33 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.12, double noundef %conv32) #6
- %24 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.b.hf.128B(<32 x i32> %1, <32 x i32> %1)
- %25 = bitcast <32 x i32> %24 to <128 x i8>
- %conv.i162 = extractelement <128 x i8> %25, i64 0
- %conv35 = zext i8 %conv.i162 to i32
- %call36 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.13, i32 noundef %conv35) #6
- %26 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.h.hf.128B(<32 x i32> %1)
- %27 = bitcast <32 x i32> %26 to <64 x i16>
- %conv.i163 = extractelement <64 x i16> %27, i64 0
- %conv38 = sext i16 %conv.i163 to i32
- %call39 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.14, i32 noundef %conv38) #6
- %28 = tail call <32 x i32> @llvm.hexagon.V6.vfmax.hf.128B(<32 x i32> %0, <32 x i32> %1)
- %puts148 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.41)
- %bc.i164 = bitcast <32 x i32> %28 to <64 x half>
- %29 = extractelement <64 x half> %bc.i164, i64 0
- %conv42 = fpext half %29 to double
- %call43 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.16, double noundef %conv42) #6
- %30 = tail call <32 x i32> @llvm.hexagon.V6.vfmin.hf.128B(<32 x i32> %0, <32 x i32> %1)
- %bc.i166 = bitcast <32 x i32> %30 to <64 x half>
- %31 = extractelement <64 x half> %bc.i166, i64 0
- %conv45 = fpext half %31 to double
- %call46 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.17, double noundef %conv45) #6
- %32 = tail call <32 x i32> @llvm.hexagon.V6.vfmax.sf.128B(<32 x i32> %3, <32 x i32> %4)
- %bc.i168 = bitcast <32 x i32> %32 to <32 x float>
- %33 = extractelement <32 x float> %bc.i168, i64 0
- %conv48 = fpext float %33 to double
- %call49 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.18, double noundef %conv48) #6
- %34 = tail call <32 x i32> @llvm.hexagon.V6.vfmin.sf.128B(<32 x i32> %3, <32 x i32> %4)
- %bc.i169 = bitcast <32 x i32> %34 to <32 x float>
- %35 = extractelement <32 x float> %bc.i169, i64 0
- %conv51 = fpext float %35 to double
- %call52 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.19, double noundef %conv51) #6
- %puts149 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.42)
- %36 = tail call <32 x i32> @llvm.hexagon.V6.vfneg.hf.128B(<32 x i32> %1)
- %bc.i170 = bitcast <32 x i32> %36 to <64 x half>
- %37 = extractelement <64 x half> %bc.i170, i64 0
- %conv55 = fpext half %37 to double
- %call56 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.21, double noundef %conv55) #6
- %38 = tail call <32 x i32> @llvm.hexagon.V6.vabs.hf.128B(<32 x i32> %2)
- %bc.i172 = bitcast <32 x i32> %38 to <64 x half>
- %39 = extractelement <64 x half> %bc.i172, i64 0
- %conv58 = fpext half %39 to double
- %call59 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.22, double noundef %conv58) #6
- %40 = tail call <32 x i32> @llvm.hexagon.V6.vfneg.sf.128B(<32 x i32> %3)
- %bc.i174 = bitcast <32 x i32> %40 to <32 x float>
- %41 = extractelement <32 x float> %bc.i174, i64 0
- %conv61 = fpext float %41 to double
- %call62 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.23, double noundef %conv61) #6
- %42 = tail call <32 x i32> @llvm.hexagon.V6.vabs.sf.128B(<32 x i32> %5)
- %bc.i175 = bitcast <32 x i32> %42 to <32 x float>
- %43 = extractelement <32 x float> %bc.i175, i64 0
- %conv64 = fpext float %43 to double
- %call65 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.24, double noundef %conv64) #6
- %puts150 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.43)
- %44 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i176 = bitcast <32 x i32> %44 to <64 x half>
- %45 = extractelement <64 x half> %bc.i176, i64 0
- %conv68 = fpext half %45 to double
- %call69 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.26, double noundef %conv68) #6
- %46 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i178 = bitcast <32 x i32> %46 to <64 x half>
- %47 = extractelement <64 x half> %bc.i178, i64 0
- %conv71 = fpext half %47 to double
- %call72 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.27, double noundef %conv71) #6
- %48 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> %3, <32 x i32> %5)
- %bc.i180 = bitcast <32 x i32> %48 to <32 x float>
- %49 = extractelement <32 x float> %bc.i180, i64 0
- %conv74 = fpext float %49 to double
- %call75 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.28, double noundef %conv74) #6
- %50 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32> %3, <32 x i32> %5)
- %bc.i181 = bitcast <32 x i32> %50 to <32 x float>
- %51 = extractelement <32 x float> %bc.i181, i64 0
- %conv77 = fpext float %51 to double
- %call78 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.29, double noundef %conv77) #6
- %52 = tail call <64 x i32> @llvm.hexagon.V6.vadd.sf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i182 = bitcast <64 x i32> %52 to <64 x float>
- %53 = extractelement <64 x float> %bc.i182, i64 0
- %conv80 = fpext float %53 to double
- %call81 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.30, double noundef %conv80) #6
- %54 = tail call <64 x i32> @llvm.hexagon.V6.vsub.sf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i183 = bitcast <64 x i32> %54 to <64 x float>
- %55 = extractelement <64 x float> %bc.i183, i64 0
- %conv83 = fpext float %55 to double
- %call84 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.31, double noundef %conv83) #6
- %puts151 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.44)
- %56 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.hf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i184 = bitcast <32 x i32> %56 to <64 x half>
- %57 = extractelement <64 x half> %bc.i184, i64 0
- %conv87 = fpext half %57 to double
- %call88 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.33, double noundef %conv87) #6
- %58 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.hf.hf.acc.128B(<32 x i32> %56, <32 x i32> %1, <32 x i32> %2)
- %bc.i186 = bitcast <32 x i32> %58 to <64 x half>
- %59 = extractelement <64 x half> %bc.i186, i64 0
- %conv90 = fpext half %59 to double
- %call91 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.34, double noundef %conv90) #6
- %60 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.128B(<32 x i32> %1, <32 x i32> %2)
- %bc.i188 = bitcast <64 x i32> %60 to <64 x float>
- %61 = extractelement <64 x float> %bc.i188, i64 0
- %conv93 = fpext float %61 to double
- %call94 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.35, double noundef %conv93) #6
- %62 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32> %60, <32 x i32> %1, <32 x i32> %2)
- %bc.i189 = bitcast <64 x i32> %62 to <64 x float>
- %63 = extractelement <64 x float> %bc.i189, i64 0
- %conv96 = fpext float %63 to double
- %call97 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.36, double noundef %conv96) #6
- %64 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %3, <32 x i32> %5)
- %bc.i190 = bitcast <32 x i32> %64 to <32 x float>
- %65 = extractelement <32 x float> %bc.i190, i64 0
- %conv99 = fpext float %65 to double
- %call100 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.37, double noundef %conv99) #6
- %puts152 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.45)
- %66 = tail call <32 x i32> @llvm.hexagon.V6.vassign.fp.128B(<32 x i32> %3)
- %bc.i191 = bitcast <32 x i32> %66 to <32 x float>
- %67 = extractelement <32 x float> %bc.i191, i64 0
- %conv103 = fpext float %67 to double
- %call104 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.39, double noundef %conv103) #6
- ret i32 0
-}
-
-; Function Attrs: optsize
-declare dso_local i32 @acquire_vector_unit(i8 noundef zeroext) local_unnamed_addr #2
-
-; Function Attrs: noreturn nounwind optsize
-declare dso_local void @_Assert(ptr noundef, ptr noundef) local_unnamed_addr #3
-
-; Function Attrs: optsize
-declare dso_local void @set_double_vector_mode(...) local_unnamed_addr #2
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.h.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.uh.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.ub.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.uh.hf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vcvt.hf.b.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vcvt.hf.ub.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.b.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vcvt.h.hf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfmax.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfmin.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfmax.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfmin.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfneg.hf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vabs.hf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vfneg.sf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vabs.sf.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vadd.hf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vsub.hf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vadd.sf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vsub.sf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vmpy.hf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vmpy.hf.hf.acc.128B(<32 x i32>, <32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32>, <32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32>, <32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.vassign.fp.128B(<32 x i32>) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #4
-
-; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
-declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #4
-
-; Function Attrs: nofree nounwind
-declare noundef i32 @putchar(i32 noundef) local_unnamed_addr #5
-
-; Function Attrs: nofree nounwind
-declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #5
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-handle-conv.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-handle-conv.ll
new file mode 100644
index 0000000000000..21fbad7498081
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-handle-conv.ll
@@ -0,0 +1,180 @@
+; Tests whether the hf = qf16 convert instruction is handled correctly when
+; the live range of the qf register extends beyond the convert and the
+; register is used by a later qf instruction.
+
+; REQUIRES: asserts
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 \
+; RUN: -debug-only=handle-qfp < %s 2>&1 -o /dev/null | FileCheck %s --check-prefix=V79
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 \
+; RUN: -debug-only=handle-qfp < %s 2>&1 -o /dev/null | FileCheck %s --check-prefix=V81
+
+; V79: Analyzing convert instruction: renamable [[VREG1:\$v[0-9]+]] = V6_vconv_hf_qf16 renamable [[VREG2:\$v[0-9]+]]
+; V79: Inserting new instruction: [[VREG3:\$v[0-9]+]] = V6_vd0
+; V79: Inserting new instruction: [[VREG2]] = V6_vadd_hf killed renamable [[VREG2]], killed [[VREG3]]
+; V79: Inserting after conv: [[VREG2]] = V6_vconv_hf_qf16 killed renamable [[VREG2]]
+
+; V81: Analyzing convert instruction: renamable [[VREG1:\$v[0-9]+]] = V6_vconv_hf_qf16 renamable [[VREG2:\$v[0-9]+]]
+; V81: Inserting new instruction: [[VREG2]] = V6_vconv_qf16_hf killed renamable [[VREG2]]
+; V81: Inserting after conv: [[VREG2]] = V6_vconv_hf_qf16 killed renamable [[VREG2]]
+
+ at .str.1 = private unnamed_addr constant [9 x i8] c"0x%08lx \00", align 1
+ at .str.3 = private unnamed_addr constant [62 x i8] c"qfloat_test.c:135 0 && \22ERROR: Failed to acquire HVX unit.\\n\22\00", align 1
+ at __func__.main = private unnamed_addr constant [5 x i8] c"main\00", align 1
+ at .str.4 = private unnamed_addr constant [44 x i8] c"The sum of hf %.3f and hf %.3f is %.3f\0A\00", align 1
+ at .str.5 = private unnamed_addr constant [44 x i8] c"The sum of qf16 %.3f and qf16 %.3f is %.3f\0A\00", align 1
+ at .str.6 = private unnamed_addr constant [44 x i8] c"The sum of qf16 %.3f and hf %.3f is %.3f\0A\00", align 1
+ at .str.7 = private unnamed_addr constant [45 x i8] c"The sum of hf %.3f and hf -%.3f is %.3f\0A\00", align 1
+ at .str.8 = private unnamed_addr constant [45 x i8] c"The sum of qf16 %.3f and qf16 -%.3f is %.3f\0A\00", align 1
+ at .str.9 = private unnamed_addr constant [45 x i8] c"The sum of qf16 %.3f and hf -%.3f is %.3f\0A\00", align 1
+ at str = private unnamed_addr constant [35 x i8] c"ERROR: Failed to acquire HVX unit.\00", align 1
+
+; Function Attrs: nofree nounwind
+define dso_local void @print_vector_words(<32 x i32> noundef %x) local_unnamed_addr #0 {
+entry:
+ br label %for.body
+
+for.cond.cleanup: ; preds = %if.end
+ %putchar = tail call i32 @putchar(i32 10)
+ ret void
+
+for.body: ; preds = %entry, %if.end
+ %i.07 = phi i32 [ 0, %entry ], [ %inc, %if.end ]
+ %rem = and i32 %i.07, 7
+ %tobool.not = icmp eq i32 %rem, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %for.body
+ %putchar6 = tail call i32 @putchar(i32 10)
+ br label %if.end
+
+if.end: ; preds = %if.then, %for.body
+ %vecext = extractelement <32 x i32> %x, i32 %i.07
+ %call1 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1, i32 noundef %vecext) #6
+ %inc = add nuw nsw i32 %i.07, 1
+ %exitcond.not = icmp eq i32 %inc, 32
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+; Function Attrs: nofree nounwind
+declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #0
+
+; Function Attrs: nounwind
+define dso_local i32 @main(i32 noundef %argc, ptr nocapture noundef readnone %argv) local_unnamed_addr #1 {
+entry:
+ %call = tail call i32 @acquire_vector_unit(i8 noundef zeroext 0) #6
+ %tobool.not = icmp eq i32 %call, 0
+ br i1 %tobool.not, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
+ tail call void @_Assert(ptr noundef nonnull @.str.3, ptr noundef nonnull @__func__.main) #7
+ unreachable
+
+if.end: ; preds = %entry
+ tail call void @set_double_vector_mode() #6
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 14336)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 13312)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 0)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %2, <32 x i32> %0)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %2, <32 x i32> %1)
+ %5 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %bc.i = bitcast <32 x i32> %0 to <64 x half>
+ %6 = extractelement <64 x half> %bc.i, i64 0
+ %conv = fpext half %6 to double
+ %bc.i71 = bitcast <32 x i32> %1 to <64 x half>
+ %7 = extractelement <64 x half> %bc.i71, i64 0
+ %conv12 = fpext half %7 to double
+ %8 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %5)
+ %bc.i.i = bitcast <32 x i32> %8 to <64 x half>
+ %9 = extractelement <64 x half> %bc.i.i, i64 0
+ %conv14 = fpext half %9 to double
+ %call15 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.4, double noundef %conv, double noundef %conv12, double noundef %conv14) #6
+ %10 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.128B(<32 x i32> %3, <32 x i32> %4)
+ %11 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %3)
+ %bc.i.i73 = bitcast <32 x i32> %11 to <64 x half>
+ %12 = extractelement <64 x half> %bc.i.i73, i64 0
+ %conv17 = fpext half %12 to double
+ %13 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %4)
+ %bc.i.i75 = bitcast <32 x i32> %13 to <64 x half>
+ %14 = extractelement <64 x half> %bc.i.i75, i64 0
+ %conv19 = fpext half %14 to double
+ %15 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %10)
+ %bc.i.i77 = bitcast <32 x i32> %15 to <64 x half>
+ %16 = extractelement <64 x half> %bc.i.i77, i64 0
+ %conv21 = fpext half %16 to double
+ %call22 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.5, double noundef %conv17, double noundef %conv19, double noundef %conv21) #6
+ %17 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %3, <32 x i32> %1)
+ %18 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %17)
+ %bc.i.i83 = bitcast <32 x i32> %18 to <64 x half>
+ %19 = extractelement <64 x half> %bc.i.i83, i64 0
+ %conv28 = fpext half %19 to double
+ %call29 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.6, double noundef %conv17, double noundef %conv12, double noundef %conv28) #6
+ %20 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %21 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %20)
+ %bc.i.i89 = bitcast <32 x i32> %21 to <64 x half>
+ %22 = extractelement <64 x half> %bc.i.i89, i64 0
+ %conv35 = fpext half %22 to double
+ %call36 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.7, double noundef %conv, double noundef %conv12, double noundef %conv35) #6
+ %23 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf16.128B(<32 x i32> %3, <32 x i32> %4)
+ %24 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %23)
+ %bc.i.i95 = bitcast <32 x i32> %24 to <64 x half>
+ %25 = extractelement <64 x half> %bc.i.i95, i64 0
+ %conv42 = fpext half %25 to double
+ %call43 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.8, double noundef %conv17, double noundef %conv19, double noundef %conv42) #6
+ %26 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf16.mix.128B(<32 x i32> %3, <32 x i32> %1)
+ %27 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %26)
+ %bc.i.i101 = bitcast <32 x i32> %27 to <64 x half>
+ %28 = extractelement <64 x half> %bc.i.i101, i64 0
+ %conv49 = fpext half %28 to double
+ %call50 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.9, double noundef %conv17, double noundef %conv12, double noundef %conv49) #6
+ ret i32 0
+}
+
+declare dso_local i32 @acquire_vector_unit(i8 noundef zeroext) local_unnamed_addr #2
+
+; Function Attrs: noreturn nounwind
+declare dso_local void @_Assert(ptr noundef, ptr noundef) local_unnamed_addr #3
+
+declare dso_local void @set_double_vector_mode(...) local_unnamed_addr #2
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf16.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf16.mix.128B(<32 x i32>, <32 x i32>) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #4
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #4
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @putchar(i32 noundef) local_unnamed_addr #5
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #5
+
+attributes #0 = { nofree nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-qfloat,-long-calls,-packets" }
+attributes #1 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-qfloat,-long-calls,-packets" }
+attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-qfloat,-long-calls,-packets" }
+attributes #3 = { noreturn nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-ieee-fp,+hvx-qfloat,-long-calls,-packets" }
+attributes #4 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #5 = { nofree nounwind }
+attributes #6 = { nounwind }
+attributes #7 = { noreturn nounwind }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-input-rt.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-input-rt.ll
new file mode 100644
index 0000000000000..75779f129ed16
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-input-rt.ll
@@ -0,0 +1,63 @@
+; Tests qf operations with Rt operands along with hf/sf and qf16 operands
+; in the strict-ieee, ieee, and lossy modes.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s --check-prefix=STRICT-IEEE
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s --check-prefix=COMPLIANT-IEEE
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=lossy -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s --check-prefix=LOSSY-SUBNORMAL
+
+
+; Tests qf16 = vmpy(hf, Rt32.hf)
+define <32 x i32> @mul_hf_rt(<32 x i32> %a0, i32 %a1) {
+; STRICT-IEEE-LABEL: mul_hf_rt
+; STRICT-IEEE-DAG: [[V1:v[0-9]+]] = vsplat(r0)
+; STRICT-IEEE-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; STRICT-IEEE: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,[[V1]].hf)
+; STRICT-IEEE: [[V3:v[0-9]+]].hf = [[V10]].qf32
+; STRICT-IEEE: qf16 = vsub([[V3]].hf,[[V2]].hf)
+
+; COMPLIANT-IEEE-LABEL: mul_hf_rt
+; COMPLIANT-IEEE-DAG: [[V1:v[0-9]+]] = vsplat(r0)
+; COMPLIANT-IEEE-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; COMPLIANT-IEEE: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,[[V1]].hf)
+; COMPLIANT-IEEE: [[V3:v[0-9]+]].hf = [[V10]].qf32
+; COMPLIANT-IEEE: qf16 = vsub([[V3]].hf,[[V2]].hf)
+
+; LOSSY-SUBNORMAL: qf16 = vmpy(v0.hf,r0.hf)
+
+label0:
+ %v0 = call <32 x i32> @llvm.hexagon.V6.vmpy.rt.hf.128B(<32 x i32> %a0, i32 %a1)
+ ret <32 x i32> %v0
+}
+
+
+; Tests qf32 = vmpy(sf, Rt32.sf)
+define <32 x i32> @mul_sf_rt(<32 x i32> %a0, i32 %a1) {
+; STRICT-IEEE-LABEL: mul_sf_rt
+; STRICT-IEEE-DAG: [[V2:v[0-9]+]] = vsplat(r0)
+; STRICT-IEEE-DAG: [[R2:r[0-9]+]] = ##2147483648
+; STRICT-IEEE-DAG: [[V1:v[0-9]+]] = vxor([[V1]],[[V1]])
+; STRICT-IEEE-DAG: [[V3:v[0-9]+]] = vsplat([[R2]])
+; STRICT-IEEE: [[V4:v[0-9]+]].qf32 = vmpy([[V1]].sf,[[V3]].sf)
+; STRICT-IEEE: [[V5:v[0-9]+]].qf32 = vadd([[V4]].qf32,v0.sf)
+; STRICT-IEEE: [[V6:v[0-9]+]].qf32 = vadd([[V4]].qf32,[[V2]].sf)
+; STRICT-IEEE: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+
+; COMPLIANT-IEEE-LABEL: mul_sf_rt
+; COMPLIANT-IEEE-DAG: [[V2:v[0-9]+]] = vsplat(r0)
+; COMPLIANT-IEEE-DAG: [[R2:r[0-9]+]] = ##2147483648
+; COMPLIANT-IEEE-DAG: [[V1:v[0-9]+]] = vxor([[V1]],[[V1]])
+; COMPLIANT-IEEE-DAG: [[V3:v[0-9]+]] = vsplat([[R2]])
+; COMPLIANT-IEEE: [[V4:v[0-9]+]].qf32 = vmpy([[V1]].sf,[[V3]].sf)
+; COMPLIANT-IEEE: [[V5:v[0-9]+]].qf32 = vadd([[V4]].qf32,v0.sf)
+; COMPLIANT-IEEE: [[V6:v[0-9]+]].qf32 = vadd([[V4]].qf32,[[V2]].sf)
+; COMPLIANT-IEEE: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+
+; LOSSY-SUBNORMAL: qf32 = vmpy(v0.sf,r0.sf)
+
+label0:
+ %v0 = call <32 x i32> @llvm.hexagon.V6.vmpy.rt.sf.128B(<32 x i32> %a0, i32 %a1)
+ ret <32 x i32> %v0
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vmpy.rt.hf.128B(<32 x i32>, i32)
+declare <32 x i32> @llvm.hexagon.V6.vmpy.rt.sf.128B(<32 x i32>, i32)
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf16.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf16.ll
new file mode 100644
index 0000000000000..5e330c7166bdb
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf16.ll
@@ -0,0 +1,74 @@
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=lossy -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=lossy -mattr=+hvxv81,+hvx-length128B < %s | FileCheck %s
+
+; Test qf16 = vmpy(qf16, qf16) when both inputs are from vadd instructions
+define <64 x half> @mul_add_3(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V4]].qf16,[[V3]].qf16)
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V6]].hf,[[V5]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fadd <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf32 = vmpy(qf16, qf16) when one input is from a vadd and the other from a vmpy instruction
+define <64 x half> @mul_add_mul(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vmpy(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V4]].qf16,[[V3]].qf16)
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V6]].hf,[[V5]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fmul <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(hf, hf)
+define <64 x half> @mul_add_0(<64 x half> %a0, <64 x half> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK: qf16 = vmpy(v0.hf,v1.hf)
+label0:
+ %v3 = fmul <64 x half> %a0, %a1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,qf16) when first input is from vadd instruction
+define <64 x half> @mul_add_1(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V10:v[0-9]+:[0-9]+]].qf32 = vmpy([[V3]].qf16,v2.hf)
+; CHECK-DAG: [[V5:v[0-9]+]].hf = [[V10]].qf32
+; CHECK: qf16 = vsub([[V5]].hf,[[V4]].hf)
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v3 = fmul <64 x half> %v0, %a2
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16, qf16) when both inputs are from vmpy instructions
+define <64 x half> @mul_add_2(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_2:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vmpy(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vmpy(v1.hf,v2.hf)
+; CHECK-DAG: qf16 = vmpy([[V3]].qf16,[[V4]].qf16)
+label0:
+ %v1 = fmul <64 x half> %a0, %a2
+ %v2 = fmul <64 x half> %a1, %a2
+ %v3 = fmul <64 x half> %v1, %v2
+ ret <64 x half> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,+hvx-qfloat,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf32.ll
new file mode 100644
index 0000000000000..1d9939c5ce312
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-lossy-mul-qf32.ll
@@ -0,0 +1,109 @@
+; Tests lossy-subnormals mode for 32-bit XQFloat multiplication
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -enable-rem-conv=true -hexagon-qfloat-mode=lossy -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s
+
+; Test qf32 = vmpy(sf, sf)
+; Normalization of inputs
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK: qf32 = vmpy(v0.sf,v1.sf)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK: [[V4:v[0-9]+]].sf = [[V3]].qf32
+; CHECK: qf32 = vmpy(v1.sf,[[V4]].sf)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V5]].qf32)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V3]].qf32)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf, qf32) when only one input is from a vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK: [[V4:v[0-9]+]].sf = [[V3]].qf32
+; CHECK: qf32 = vmpy(v1.sf,[[V4]].sf)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V5]].qf32)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V3]].qf32)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when one is from adder, another from multiplier
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy(v0.sf,v1.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V5:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V5]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V6]].qf32,[[V3]].qf32)
+; CHECK-DAG: qf32 = vmpy([[V8]].qf32,[[V7]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when both are from multiplier
+define <32 x float> @mul_mul_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+label0:
+; CHECK-LABEL: mul_mul_mul
+; CHECK: [[V3:v[0-9]+]].qf32 = vmpy(v0.sf,v1.sf)
+; CHECK: [[V4:v[0-9]+]].qf32 = vmpy(v0.sf,v2.sf)
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V3]].qf32)
+ %v1 = fmul <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-mode-flags.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-mode-flags.ll
new file mode 100644
index 0000000000000..691eefd5ba8fe
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-mode-flags.ll
@@ -0,0 +1,76 @@
+; Test that qfloat mode flags invoke correct code generation.
+
+; REQUIRES: asserts
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=strict-ieee \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=STRICT-IEEE
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=strict-ieee \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=STRICT-IEEE
+; STRICT-IEEE: Generating code for STRICT-IEEE mode
+; STRICT-IEEE-NOT: Running QFPOptimzer Pass
+
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=ieee \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=IEEE
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=ieee \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=IEEE
+; IEEE: Generating code for IEEE mode
+; IEEE-NOT: Running QFPOptimzer Pass
+
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -hexagon-qfloat-mode=lossy \
+; RUN: -verify-machineinstrs -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer\
+; RUN: -enable-xqf-gen < %s 2>&1 | FileCheck %s --check-prefix=LOSSY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -hexagon-qfloat-mode=lossy \
+; RUN: -verify-machineinstrs -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer\
+; RUN: -enable-xqf-gen < %s 2>&1 | FileCheck %s --check-prefix=LOSSY
+; LOSSY: Generating code for LOSSY mode
+; LOSSY-NOT: Running QFPOptimzer Pass
+
+
+; The default mode is LEGACY.
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -verify-machineinstrs \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=LEGACY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -verify-machineinstrs \
+; RUN: -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer -enable-xqf-gen < %s\
+; RUN: 2>&1 | FileCheck %s --check-prefix=LEGACY
+
+
+; Test that QFloat mode pass is not invoked. Instead we should run
+; the QFPOptimizer pass.
+; LEGACY-NOT: Generating code for LEGACY mode
+; LEGACY: Running QFPOptimzer Pass
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -march=hexagon \
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv79 -hexagon-qfloat-mode=legacy\
+; RUN: -verify-machineinstrs -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer\
+; RUN: -enable-xqf-gen < %s 2>&1 | FileCheck %s --check-prefix=LEGACY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv75 -march=hexagon\
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv75 -hexagon-qfloat-mode=legacy\
+; RUN: -verify-machineinstrs -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer\
+; RUN: -enable-xqf-gen < %s 2>&1 | FileCheck %s --check-prefix=LEGACY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -march=hexagon\
+; RUN: -mattr=+hvx-ieee-fp,+hvx-length128b,+hvxv81 -hexagon-qfloat-mode=legacy\
+; RUN: -verify-machineinstrs -debug-only=hexagon-xqf-gen,hexagon-qfp-optimizer\
+; RUN: -enable-xqf-gen < %s 2>&1 | FileCheck %s --check-prefix=LEGACY
+
+define <64 x half> @add_qf16(<64 x half> %a0, <64 x half> %a1) #0 {
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ ret <64 x half> %v0
+}
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-multi-conv.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-multi-conv.ll
new file mode 100644
index 0000000000000..2f1a517aee1da
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-multi-conv.ll
@@ -0,0 +1,133 @@
+; Checks that there is no type mismatch, i.e. a use of a qf operand
+; as an sf/hf type or vice versa.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B \
+; RUN: -enable-postra-xqf-check 2>&1 < %s -o /dev/null | FileCheck %s
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv81,+hvx-length128B \
+; RUN: -enable-postra-xqf-check 2>&1 < %s -o /dev/null | FileCheck %s
+
+; CHECK: Checking for ABI compliance for XQF post register allocation
+; CHECK-NOT: Mismatch:
+
+define i32 @qhmath_hvx_sin_af(ptr noalias noundef %input, ptr noalias noundef %output) #0 {
+entry:
+ %0 = tail call i32 @llvm.hexagon.A2.min(i32 0, i32 64)
+ %cmp10100 = icmp sgt i32 %0, 0
+ br i1 %cmp10100, label %for.body12.lr.ph, label %for.cond.loopexit
+
+for.cond.loopexit: ; preds = %for.body12, %entry
+ ret i32 0
+
+for.body12.lr.ph: ; preds = %entry
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1065353216)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -2147483648)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1067645315)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %1, <32 x i32> zeroinitializer)
+ %5 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -1090519040)
+ %6 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %5, <32 x i32> zeroinitializer)
+ %7 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1026206373)
+ %8 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %7, <32 x i32> zeroinitializer)
+ %9 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -1162475884)
+ %10 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %9, <32 x i32> zeroinitializer)
+ %11 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %12 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 2147483647)
+ br label %for.body12
+
+for.body12: ; preds = %for.body12, %for.body12.lr.ph
+ %j.0104 = phi i32 [ 0, %for.body12.lr.ph ], [ %inc, %for.body12 ]
+ %optr.1102 = phi ptr [ %output, %for.body12.lr.ph ], [ %incdec.ptr14, %for.body12 ]
+ %sline1p.1101 = phi <32 x i32> [ zeroinitializer, %for.body12.lr.ph ], [ %13, %for.body12 ]
+ %13 = load <32 x i32>, ptr null, align 128
+ %14 = tail call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> %13, <32 x i32> %sline1p.1101, i32 0)
+ %15 = tail call <128 x i1> @llvm.hexagon.V6.vgtsf.128B(<32 x i32> %14, <32 x i32> zeroinitializer)
+ %16 = tail call <32 x i32> @llvm.hexagon.V6.vxor.128B(<32 x i32> %14, <32 x i32> %2)
+ %17 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %15, <32 x i32> %14, <32 x i32> %16)
+ %18 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> %17, <32 x i32> %3)
+ %19 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %18)
+ %20 = tail call <128 x i1> @llvm.hexagon.V6.vgtuw.128B(<32 x i32> %19, <32 x i32> %12)
+ %21 = lshr <32 x i32> %19, <i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23>
+ %and.i.i = and <32 x i32> %21, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
+ %sub.i.i = add nsw <32 x i32> %and.i.i, <i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127, i32 -127>
+ %22 = tail call <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32> %sub.i.i, <32 x i32> zeroinitializer)
+ %sub1.i.i = sub <32 x i32> <i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23, i32 23>, %22
+ %23 = tail call <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32> %sub1.i.i, <32 x i32> zeroinitializer)
+ %shl.i.i = shl nuw <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, %22
+ %24 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %shl.i.i)
+ %25 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %20, <32 x i32> zeroinitializer, <32 x i32> %24)
+ %shl5.neg.i.i = shl <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, %23
+ %and7.i.i = and <32 x i32> %shl5.neg.i.i, %19
+ %26 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %and7.i.i)
+ %27 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %26)
+ %28 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %27)
+ %29 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %28)
+ %30 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %17, <32 x i32> %29)
+ %31 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %25, <32 x i32> zeroinitializer)
+ %32 = tail call <128 x i1> @llvm.hexagon.V6.veqw.128B(<32 x i32> %31, <32 x i32> zeroinitializer)
+ %33 = tail call <128 x i1> @llvm.hexagon.V6.pred.xor.128B(<128 x i1> %15, <128 x i1> %32)
+ %34 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %30)
+ %35 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %34, <32 x i32> zeroinitializer)
+ %36 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> %35, <32 x i32> %35)
+ %37 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %11, <32 x i32> %36)
+ %38 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %10, <32 x i32> %37)
+ %39 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %38, <32 x i32> %36)
+ %40 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %8, <32 x i32> %39)
+ %41 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %40, <32 x i32> %36)
+ %42 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %6, <32 x i32> %41)
+ %43 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %42, <32 x i32> %36)
+ %44 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %4, <32 x i32> %43)
+ %45 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %44)
+ %46 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %45, <32 x i32> zeroinitializer)
+ %47 = tail call <32 x i32> @llvm.hexagon.V6.vxor.128B(<32 x i32> %46, <32 x i32> %2)
+ %48 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %33, <32 x i32> %46, <32 x i32> %47)
+ %incdec.ptr14 = getelementptr inbounds <32 x i32>, ptr %optr.1102, i32 1
+ store <32 x i32> %48, ptr %optr.1102, align 4
+ %inc = add nuw nsw i32 %j.0104, 1
+ %exitcond.not = icmp eq i32 %inc, %0
+ br i1 %exitcond.not, label %for.cond.loopexit, label %for.body12
+}
+
+declare i32 @llvm.hexagon.A2.min(i32, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32>, <32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vd0.128B() #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32) #1
+declare <128 x i1> @llvm.hexagon.V6.vgtsf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vxor.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1>, <32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.veqw.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.xor.128B(<128 x i1>, <128 x i1>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.or.128B(<128 x i1>, <128 x i1>) #1
+declare <128 x i1> @llvm.hexagon.V6.vgtuw.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.vgtw.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32>, <32 x i32>, i32) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2v2.128B(i32) #1
+declare void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #2
+declare <128 x i1> @llvm.hexagon.V6.veqb.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.or.n.128B(<128 x i1>, <128 x i1>) #1
+declare void @llvm.hexagon.V6.vS32b.nqpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #2
+
+uselistorder ptr @llvm.hexagon.V6.lvsplatw.128B, { 6, 5, 4, 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vadd.sf.128B, { 4, 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vxor.128B, { 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vmux.128B, { 7, 6, 5, 4, 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vmpy.qf32.sf.128B, { 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vconv.sf.qf32.128B, { 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vmpy.qf32.128B, { 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vadd.qf32.128B, { 3, 2, 1, 0 }
+uselistorder ptr @llvm.hexagon.V6.vmaxw.128B, { 1, 0 }
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-long-calls,-small-data" }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(write) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-multidef.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-multidef.ll
new file mode 100644
index 0000000000000..ac5a6213f559b
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-multidef.ll
@@ -0,0 +1,49 @@
+; This is a unique case where an xqf use has two definitions. One def comes from
+; an sf type, but the other comes from a qf-generating instruction.
+
+; REQUIRES: asserts
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv81,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s
+
+; CHECK: Instruction: renamable [[V14:\$v[0-9]+]] = V6_vmpy_qf32 killed renamable [[V4:\$v[0-9]+]], killed renamable [[V5:\$v[0-9]+]]
+; CHECK-NEXT: Property: 0 ,1
+; CHECK: Processing: renamable [[V14]] = V6_vmpy_qf32 killed renamable [[V4]], killed renamable [[V5]]
+; CHECK: Inserting new instruction before: [[V4]] = V6_vconv_sf_qf32 killed renamable [[V4]]
+; CHECK: Inserting new instruction: [[V14]] = V6_vmpy_qf32_sf killed renamable [[V4]], killed renamable [[V5]]
+
+target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+target triple = "hexagon"
+
+define i32 @qhmath_hvx_sin_af(ptr %input, i32 %0) {
+entry:
+ br label %for.body12
+
+for.cond.loopexit: ; preds = %for.body12
+ ret i32 0
+
+for.body12: ; preds = %for.body12, %entry
+ %j.0104 = phi i32 [ 0, %entry ], [ %inc, %for.body12 ]
+ %1 = load <32 x i32>, ptr null, align 128
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> zeroinitializer, <32 x i32> %1)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %3, <32 x i32> %2)
+ store <32 x i32> %4, ptr %input, align 4
+ %inc = add i32 %j.0104, 1
+ %exitcond.not = icmp eq i32 %j.0104, %0
+ br i1 %exitcond.not, label %for.cond.loopexit, label %for.body12
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32>, <32 x i32>) #0
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32>, <32 x i32>) #0
+
+; uselistorder directives
+uselistorder ptr @llvm.hexagon.V6.vmpy.qf32.128B, { 1, 0 }
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-normalization-assert.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-normalization-assert.ll
new file mode 100755
index 0000000000000..00586341d120b
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-normalization-assert.ll
@@ -0,0 +1,459 @@
+; Checks that llc does not crash when it generates normalization instructions
+; that do not have equivalent defs in the dominating basic block.
+
+; RUN: llc -mtriple=hexagon-unknown-elf -O2 -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true -hexagon-qfloat-mode=lossy < %s -o /dev/null
+
+ at c0_coeffs_asin_vhf = internal unnamed_addr constant [32 x float] [float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xC0AE6428A0000000, float 0xC01AEBA200000000, float 0xBFDAF02840000000, float 0xBFA6CBD5A0000000, float 0xBF75153560000000, float 0xBF3EA129C0000000, float 0xBEF513C980000000, float 0xBE418295E0000000, float 0x3E1AE29C20000000, float 0x3EF4410BA0000000, float 0x3F3D8DCE60000000, float 0x3F749B72E0000000, float 0x3FA666F3E0000000, float 0x3FDA201EC0000000, float 0x40194BFD40000000, float 0x40ABE59AC0000000], align 128
+ at c1_coeffs_asin_vhf = internal unnamed_addr constant [32 x float] [float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xC0D0766D60000000, float 0xC04129FCE0000000, float 0xBFFB611840000000, float 0x3FE4688DA0000000, float 0x3FEE395540000000, float 0x3FEFC3F260000000, float 0x3FEFFB6980000000, float 0x3FEFFFFB60000000, float 0x3FEFFFFCE0000000, float 0x3FEFFB8EE0000000, float 0x3FEFC5B520000000, float 0x3FEE421260000000, float 0x3FE4960460000000, float 0xBFFA2F48C0000000, float 0xC040284FC0000000, float 0xC0CE476740000000], align 128
+ at c2_coeffs_asin_vhf = internal unnamed_addr constant [32 x float] [float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xC0DABF49E0000000, float 0xC0517AB3E0000000, float 0xC01A8AD340000000, float 0xBFF2146300000000, float 0xBFCDA870C0000000, float 0xBFA6FE7CC0000000, float 0xBF78FBAA20000000, float 0xBF20DAF2E0000000, float 0x3F1BC13500000000, float 0x3F785EEC80000000, float 0x3FA674DBC0000000, float 0x3FCD305400000000, float 0x3FF1D70400000000, float 0x4019E26E20000000, float 0x40508B4500000000, float 0x40D8A49620000000], align 128
+ at c3_coeffs_asin_vhf = internal unnamed_addr constant [32 x float] [float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xC0D34EAC80000000, float 0xC04EE8CDC0000000, float 0xC01CD2A460000000, float 0xBFF761B920000000, float 0xBFD2806420000000, float 0x3FA1A14760000000, float 0x3FC1343380000000, float 0x3FC4FB5360000000, float 0x3FC503B6E0000000, float 0x3FC14630A0000000, float 0x3FA2CA8A40000000, float 0xBFD224E9A0000000, float 0xBFF71832E0000000, float 0xC01C2DF8C0000000, float 0xC04D5D9600000000, float 0xC0D1D22280000000], align 128
+ at c4_coeffs_asin_vhf = internal unnamed_addr constant [32 x float] [float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xC0B4E6B460000000, float 0xC034E86040000000, float 0xC009BCE840000000, float 0xBFEEE8AF00000000, float 0xBFD8FF8A00000000, float 0xBFC6F58460000000, float 0xBFB4E58DC0000000, float 0xBF98DF4700000000, float 0x3F97E3BFA0000000, float 0x3FB4B4B500000000, float 0x3FC6B9A2A0000000, float 0x3FD8CB6B20000000, float 0x3FEEA6B940000000, float 0x4009444400000000, float 0x4033F3EE00000000, float 0x40B353BBC0000000], align 128
+
+; Function Attrs: nounwind
+define i32 @qhmath_hvx_asin_ahf(ptr noalias noundef %input, ptr noalias noundef %output, i32 noundef %size) local_unnamed_addr #0 {
+entry:
+ %and = and i32 %size, 63
+ %mul = shl nuw nsw i32 %and, 1
+ %cmp = icmp eq ptr %input, null
+ %cmp1 = icmp eq ptr %output, null
+ %or.cond = or i1 %cmp, %cmp1
+ %cmp3 = icmp eq i32 %size, 0
+ %or.cond46 = or i1 %or.cond, %cmp3
+ br i1 %or.cond46, label %cleanup, label %if.end
+
+if.end: ; preds = %entry
+ %incdec.ptr = getelementptr inbounds <32 x i32>, ptr %input, i32 1
+ %0 = load <32 x i32>, ptr %input, align 128
+ %cmp4102 = icmp ugt i32 %size, 127
+ br i1 %cmp4102, label %for.body.lr.ph, label %for.cond.cleanup
+
+for.body.lr.ph: ; preds = %if.end
+ %div82 = lshr i32 %size, 6
+ %sub = add nsw i32 %div82, -1
+ %1 = ptrtoint ptr %input to i32
+ %2 = load <32 x i32>, ptr @c0_coeffs_asin_vhf, align 128
+ %3 = load <32 x i32>, ptr @c1_coeffs_asin_vhf, align 128
+ %4 = load <32 x i32>, ptr @c2_coeffs_asin_vhf, align 128
+ %5 = load <32 x i32>, ptr @c3_coeffs_asin_vhf, align 128
+ %6 = load <32 x i32>, ptr @c4_coeffs_asin_vhf, align 128
+ br label %for.body
+
+for.cond.loopexit: ; preds = %for.body12, %if.end8
+ %output_v_ptr.1.lcssa = phi ptr [ %output_v_ptr.0103, %if.end8 ], [ %incdec.ptr14, %for.body12 ]
+ %input_v_ptr.1.lcssa = phi ptr [ %input_v_ptr.0104, %if.end8 ], [ %incdec.ptr13, %for.body12 ]
+ %slinep.1.lcssa = phi <32 x i32> [ %slinep.0105, %if.end8 ], [ %37, %for.body12 ]
+ %cmp4 = icmp sgt i32 %i.0106, 64
+ br i1 %cmp4, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup: ; preds = %for.cond.loopexit, %if.end
+ %output_v_ptr.0.lcssa = phi ptr [ %output, %if.end ], [ %output_v_ptr.1.lcssa, %for.cond.loopexit ]
+ %input_v_ptr.0.lcssa = phi ptr [ %incdec.ptr, %if.end ], [ %input_v_ptr.1.lcssa, %for.cond.loopexit ]
+ %slinep.0.lcssa = phi <32 x i32> [ %0, %if.end ], [ %slinep.1.lcssa, %for.cond.loopexit ]
+ %cmp18.not = icmp ult i32 %size, 64
+ br i1 %cmp18.not, label %if.end25, label %if.then19
+
+for.body: ; preds = %for.cond.loopexit, %for.body.lr.ph
+ %i.0106 = phi i32 [ %sub, %for.body.lr.ph ], [ %sub5, %for.cond.loopexit ]
+ %slinep.0105 = phi <32 x i32> [ %0, %for.body.lr.ph ], [ %slinep.1.lcssa, %for.cond.loopexit ]
+ %input_v_ptr.0104 = phi ptr [ %incdec.ptr, %for.body.lr.ph ], [ %input_v_ptr.1.lcssa, %for.cond.loopexit ]
+ %output_v_ptr.0103 = phi ptr [ %output, %for.body.lr.ph ], [ %output_v_ptr.1.lcssa, %for.cond.loopexit ]
+ %7 = tail call i32 @llvm.hexagon.A2.min(i32 %i.0106, i32 64)
+ %sub5 = add nsw i32 %i.0106, -64
+ %8 = tail call i32 @llvm.hexagon.A2.min(i32 %sub5, i32 64)
+ %cmp6 = icmp sgt i32 %8, 0
+ br i1 %cmp6, label %if.then7, label %if.end8
+
+if.then7: ; preds = %for.body
+ %add.ptr = getelementptr inbounds <32 x i32>, ptr %input_v_ptr.0104, i32 64
+ %conv.mask.i = and i32 %8, 65535
+ %_HEXAGON_V64_internal_union.sroa.0.0.insert.ext.i = zext i32 %conv.mask.i to i64
+ %_HEXAGON_V64_internal_union.sroa.0.0.insert.insert.i = or i64 %_HEXAGON_V64_internal_union.sroa.0.0.insert.ext.i, 549764202496
+ tail call void asm sideeffect " l2fetch($0,$1) ", "r,r"(ptr nonnull %add.ptr, i64 %_HEXAGON_V64_internal_union.sroa.0.0.insert.insert.i) #3
+ br label %if.end8
+
+if.end8: ; preds = %if.then7, %for.body
+ %cmp1095 = icmp sgt i32 %7, 0
+ br i1 %cmp1095, label %for.body12.lr.ph, label %for.cond.loopexit
+
+for.body12.lr.ph: ; preds = %if.end8
+ %9 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 18430)
+ %10 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15360)
+ %11 = tail call <32 x i32> @llvm.hexagon.V6.vd0.128B()
+ %12 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15)
+ %13 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 4112)
+ %14 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 19456)
+ %15 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 48128)
+ %16 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %9, <32 x i32> %11)
+ %17 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %2, <32 x i32> %11)
+ %18 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %3, <32 x i32> %11)
+ %19 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %4, <32 x i32> %11)
+ %20 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %5, <32 x i32> %11)
+ %21 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %6, <32 x i32> %11)
+ %22 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %17)
+ %23 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %18)
+ %24 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %19)
+ %25 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %20)
+ %26 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %21)
+ %27 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %22)
+ %28 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %22)
+ %29 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %23)
+ %30 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %23)
+ %31 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %24)
+ %32 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %24)
+ %33 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %25)
+ %34 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %25)
+ %35 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %26)
+ %36 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %26)
+ br label %for.body12
+
+for.body12: ; preds = %for.body12, %for.body12.lr.ph
+ %j.099 = phi i32 [ 0, %for.body12.lr.ph ], [ %inc, %for.body12 ]
+ %slinep.198 = phi <32 x i32> [ %slinep.0105, %for.body12.lr.ph ], [ %37, %for.body12 ]
+ %input_v_ptr.197 = phi ptr [ %input_v_ptr.0104, %for.body12.lr.ph ], [ %incdec.ptr13, %for.body12 ]
+ %output_v_ptr.196 = phi ptr [ %output_v_ptr.0103, %for.body12.lr.ph ], [ %incdec.ptr14, %for.body12 ]
+ %incdec.ptr13 = getelementptr inbounds <32 x i32>, ptr %input_v_ptr.197, i32 1
+ %37 = load <32 x i32>, ptr %input_v_ptr.197, align 128
+ %38 = tail call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> %37, <32 x i32> %slinep.198, i32 %1)
+ %39 = tail call <32 x i32> @llvm.hexagon.V6.vdealh.128B(<32 x i32> %38)
+ %40 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %39, <32 x i32> %15)
+ %41 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32> %40, <32 x i32> %16)
+ %42 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %41, <32 x i32> %14)
+ %43 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %42)
+ %44 = tail call <32 x i32> @llvm.hexagon.V6.vlsrh.128B(<32 x i32> %43, i32 6)
+ %45 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %44, <32 x i32> %12)
+ %46 = tail call <32 x i32> @llvm.hexagon.V6.vshuffb.128B(<32 x i32> %45)
+ %47 = tail call <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32> %46, <32 x i32> %13)
+ %48 = tail call <32 x i32> @llvm.hexagon.V6.vaslw.128B(<32 x i32> %47, i32 16)
+ %49 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %47, <32 x i32> %27, i32 1)
+ %50 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %49, <32 x i32> %48, <32 x i32> %28, i32 1)
+ %51 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %47, <32 x i32> %29, i32 1)
+ %52 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %51, <32 x i32> %48, <32 x i32> %30, i32 1)
+ %53 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %47, <32 x i32> %31, i32 1)
+ %54 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %53, <32 x i32> %48, <32 x i32> %32, i32 1)
+ %55 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %47, <32 x i32> %33, i32 1)
+ %56 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %55, <32 x i32> %48, <32 x i32> %34, i32 1)
+ %57 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %47, <32 x i32> %35, i32 1)
+ %58 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %57, <32 x i32> %48, <32 x i32> %36, i32 1)
+ %59 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32> %38, <32 x i32> %10)
+ %60 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %58)
+ %61 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %59)
+ %62 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %60, <32 x i32> %61)
+ %63 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %56)
+ %64 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %62, <32 x i32> %63)
+ %65 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %64, <32 x i32> %61)
+ %66 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %54)
+ %67 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %65, <32 x i32> %66)
+ %68 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %67, <32 x i32> %61)
+ %69 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %52)
+ %70 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %68, <32 x i32> %69)
+ %71 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %70, <32 x i32> %61)
+ %72 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %50)
+ %73 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %71, <32 x i32> %72)
+ %output_dv.sroa.0.0.vecblend84.i.i = shufflevector <32 x i32> %73, <32 x i32> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+ %74 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %58)
+ %75 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %59)
+ %76 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %74, <32 x i32> %75)
+ %77 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %56)
+ %78 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %76, <32 x i32> %77)
+ %79 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %78, <32 x i32> %75)
+ %80 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %54)
+ %81 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %79, <32 x i32> %80)
+ %82 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %81, <32 x i32> %75)
+ %83 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %52)
+ %84 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %82, <32 x i32> %83)
+ %85 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %84, <32 x i32> %75)
+ %86 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %50)
+ %87 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %85, <32 x i32> %86)
+ %output_dv.sroa.0.128.vec.expand117.i.i = shufflevector <32 x i32> %87, <32 x i32> poison, <64 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+ %output_dv.sroa.0.128.vecblend118.i.i = shufflevector <64 x i32> %output_dv.sroa.0.0.vecblend84.i.i, <64 x i32> %output_dv.sroa.0.128.vec.expand117.i.i, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 96, i32 97, i32 98, i32 99, i32 100, i32 101, i32 102, i32 103, i32 104, i32 105, i32 106, i32 107, i32 108, i32 109, i32 110, i32 111, i32 112, i32 113, i32 114, i32 115, i32 116, i32 117, i32 118, i32 119, i32 120, i32 121, i32 122, i32 123, i32 124, i32 125, i32 126, i32 127>
+ %88 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf32.128B(<64 x i32> %output_dv.sroa.0.128.vecblend118.i.i)
+ %incdec.ptr14 = getelementptr inbounds <32 x i32>, ptr %output_v_ptr.196, i32 1
+ store <32 x i32> %88, ptr %output_v_ptr.196, align 4
+ %inc = add nuw nsw i32 %j.099, 1
+ %exitcond.not = icmp eq i32 %inc, %7
+ br i1 %exitcond.not, label %for.cond.loopexit, label %for.body12
+
+if.then19: ; preds = %for.cond.cleanup
+ %89 = ptrtoint ptr %input_v_ptr.0.lcssa to i32
+ %and.i = and i32 %89, 127
+ %90 = or i32 %and.i, %and
+ %or.cond47 = icmp eq i32 %90, 0
+ br i1 %or.cond47, label %cond.end, label %cond.false
+
+cond.false: ; preds = %if.then19
+ %incdec.ptr22 = getelementptr inbounds <32 x i32>, ptr %input_v_ptr.0.lcssa, i32 1
+ %91 = load <32 x i32>, ptr %input_v_ptr.0.lcssa, align 128
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %if.then19
+ %input_v_ptr.2 = phi ptr [ %incdec.ptr22, %cond.false ], [ %input_v_ptr.0.lcssa, %if.then19 ]
+ %cond = phi <32 x i32> [ %91, %cond.false ], [ %slinep.0.lcssa, %if.then19 ]
+ %92 = ptrtoint ptr %input to i32
+ %93 = tail call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> %cond, <32 x i32> %slinep.0.lcssa, i32 %92)
+ %94 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 18430)
+ %95 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15360)
+ %96 = tail call <32 x i32> @llvm.hexagon.V6.vd0.128B()
+ %97 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15)
+ %98 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 4112)
+ %99 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 19456)
+ %100 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 48128)
+ %101 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %94, <32 x i32> %96)
+ %102 = load <32 x i32>, ptr @c0_coeffs_asin_vhf, align 128
+ %103 = load <32 x i32>, ptr @c1_coeffs_asin_vhf, align 128
+ %104 = load <32 x i32>, ptr @c2_coeffs_asin_vhf, align 128
+ %105 = load <32 x i32>, ptr @c3_coeffs_asin_vhf, align 128
+ %106 = load <32 x i32>, ptr @c4_coeffs_asin_vhf, align 128
+ %107 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %102, <32 x i32> %96)
+ %108 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %103, <32 x i32> %96)
+ %109 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %104, <32 x i32> %96)
+ %110 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %105, <32 x i32> %96)
+ %111 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %106, <32 x i32> %96)
+ %112 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %107)
+ %113 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %108)
+ %114 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %109)
+ %115 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %110)
+ %116 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %111)
+ %117 = tail call <32 x i32> @llvm.hexagon.V6.vdealh.128B(<32 x i32> %93)
+ %118 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %117, <32 x i32> %100)
+ %119 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32> %118, <32 x i32> %101)
+ %120 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %119, <32 x i32> %99)
+ %121 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %120)
+ %122 = tail call <32 x i32> @llvm.hexagon.V6.vlsrh.128B(<32 x i32> %121, i32 6)
+ %123 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %122, <32 x i32> %97)
+ %124 = tail call <32 x i32> @llvm.hexagon.V6.vshuffb.128B(<32 x i32> %123)
+ %125 = tail call <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32> %124, <32 x i32> %98)
+ %126 = tail call <32 x i32> @llvm.hexagon.V6.vaslw.128B(<32 x i32> %125, i32 16)
+ %127 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %112)
+ %128 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %125, <32 x i32> %127, i32 1)
+ %129 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %112)
+ %130 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %128, <32 x i32> %126, <32 x i32> %129, i32 1)
+ %131 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %113)
+ %132 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %125, <32 x i32> %131, i32 1)
+ %133 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %113)
+ %134 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %132, <32 x i32> %126, <32 x i32> %133, i32 1)
+ %135 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %114)
+ %136 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %125, <32 x i32> %135, i32 1)
+ %137 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %114)
+ %138 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %136, <32 x i32> %126, <32 x i32> %137, i32 1)
+ %139 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %115)
+ %140 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %125, <32 x i32> %139, i32 1)
+ %141 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %115)
+ %142 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %140, <32 x i32> %126, <32 x i32> %141, i32 1)
+ %143 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %116)
+ %144 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %125, <32 x i32> %143, i32 1)
+ %145 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %116)
+ %146 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %144, <32 x i32> %126, <32 x i32> %145, i32 1)
+ %147 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32> %93, <32 x i32> %95)
+ %148 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %146)
+ %149 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %147)
+ %150 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %148, <32 x i32> %149)
+ %151 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %142)
+ %152 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %150, <32 x i32> %151)
+ %153 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %152, <32 x i32> %149)
+ %154 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %138)
+ %155 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %153, <32 x i32> %154)
+ %156 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %155, <32 x i32> %149)
+ %157 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %134)
+ %158 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %156, <32 x i32> %157)
+ %159 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %158, <32 x i32> %149)
+ %160 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %130)
+ %161 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %159, <32 x i32> %160)
+ %output_dv.sroa.0.0.vecblend84.i.i83 = shufflevector <32 x i32> %161, <32 x i32> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+ %162 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %146)
+ %163 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %147)
+ %164 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %162, <32 x i32> %163)
+ %165 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %142)
+ %166 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %164, <32 x i32> %165)
+ %167 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %166, <32 x i32> %163)
+ %168 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %138)
+ %169 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %167, <32 x i32> %168)
+ %170 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %169, <32 x i32> %163)
+ %171 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %134)
+ %172 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %170, <32 x i32> %171)
+ %173 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %172, <32 x i32> %163)
+ %174 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %130)
+ %175 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %173, <32 x i32> %174)
+ %output_dv.sroa.0.128.vec.expand117.i.i84 = shufflevector <32 x i32> %175, <32 x i32> poison, <64 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+ %output_dv.sroa.0.128.vecblend118.i.i85 = shufflevector <64 x i32> %output_dv.sroa.0.0.vecblend84.i.i83, <64 x i32> %output_dv.sroa.0.128.vec.expand117.i.i84, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 96, i32 97, i32 98, i32 99, i32 100, i32 101, i32 102, i32 103, i32 104, i32 105, i32 106, i32 107, i32 108, i32 109, i32 110, i32 111, i32 112, i32 113, i32 114, i32 115, i32 116, i32 117, i32 118, i32 119, i32 120, i32 121, i32 122, i32 123, i32 124, i32 125, i32 126, i32 127>
+ %176 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf32.128B(<64 x i32> %output_dv.sroa.0.128.vecblend118.i.i85)
+ %incdec.ptr24 = getelementptr inbounds <32 x i32>, ptr %output_v_ptr.0.lcssa, i32 1
+ store <32 x i32> %176, ptr %output_v_ptr.0.lcssa, align 4
+ br label %if.end25
+
+if.end25: ; preds = %cond.end, %for.cond.cleanup
+ %output_v_ptr.2 = phi ptr [ %incdec.ptr24, %cond.end ], [ %output_v_ptr.0.lcssa, %for.cond.cleanup ]
+ %input_v_ptr.3 = phi ptr [ %input_v_ptr.2, %cond.end ], [ %input_v_ptr.0.lcssa, %for.cond.cleanup ]
+ %slinep.2 = phi <32 x i32> [ %cond, %cond.end ], [ %slinep.0.lcssa, %for.cond.cleanup ]
+ %cmp26.not = icmp eq i32 %and, 0
+ br i1 %cmp26.not, label %cleanup, label %if.then27
+
+if.then27: ; preds = %if.end25
+ %177 = ptrtoint ptr %input_v_ptr.3 to i32
+ %and.i86 = and i32 %177, 127
+ %add.i = add nuw nsw i32 %and.i86, %mul
+ %cmp.i87 = icmp ugt i32 %add.i, 128
+ br i1 %cmp.i87, label %cond.false31, label %cond.end33
+
+cond.false31: ; preds = %if.then27
+ %178 = load <32 x i32>, ptr %input_v_ptr.3, align 128
+ br label %cond.end33
+
+cond.end33: ; preds = %cond.false31, %if.then27
+ %cond34 = phi <32 x i32> [ %178, %cond.false31 ], [ %slinep.2, %if.then27 ]
+ %179 = ptrtoint ptr %input to i32
+ %180 = tail call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> %cond34, <32 x i32> %slinep.2, i32 %179)
+ %181 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 18430)
+ %182 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15360)
+ %183 = tail call <32 x i32> @llvm.hexagon.V6.vd0.128B()
+ %184 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15)
+ %185 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 4112)
+ %186 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 19456)
+ %187 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 48128)
+ %188 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %181, <32 x i32> %183)
+ %189 = load <32 x i32>, ptr @c0_coeffs_asin_vhf, align 128
+ %190 = load <32 x i32>, ptr @c1_coeffs_asin_vhf, align 128
+ %191 = load <32 x i32>, ptr @c2_coeffs_asin_vhf, align 128
+ %192 = load <32 x i32>, ptr @c3_coeffs_asin_vhf, align 128
+ %193 = load <32 x i32>, ptr @c4_coeffs_asin_vhf, align 128
+ %194 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %189, <32 x i32> %183)
+ %195 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %190, <32 x i32> %183)
+ %196 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %191, <32 x i32> %183)
+ %197 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %192, <32 x i32> %183)
+ %198 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %193, <32 x i32> %183)
+ %199 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %194)
+ %200 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %195)
+ %201 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %196)
+ %202 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %197)
+ %203 = tail call <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32> %198)
+ %204 = tail call <32 x i32> @llvm.hexagon.V6.vdealh.128B(<32 x i32> %180)
+ %205 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %204, <32 x i32> %187)
+ %206 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32> %205, <32 x i32> %188)
+ %207 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %206, <32 x i32> %186)
+ %208 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %207)
+ %209 = tail call <32 x i32> @llvm.hexagon.V6.vlsrh.128B(<32 x i32> %208, i32 6)
+ %210 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %209, <32 x i32> %184)
+ %211 = tail call <32 x i32> @llvm.hexagon.V6.vshuffb.128B(<32 x i32> %210)
+ %212 = tail call <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32> %211, <32 x i32> %185)
+ %213 = tail call <32 x i32> @llvm.hexagon.V6.vaslw.128B(<32 x i32> %212, i32 16)
+ %214 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %199)
+ %215 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %212, <32 x i32> %214, i32 1)
+ %216 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %199)
+ %217 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %215, <32 x i32> %213, <32 x i32> %216, i32 1)
+ %218 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %200)
+ %219 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %212, <32 x i32> %218, i32 1)
+ %220 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %200)
+ %221 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %219, <32 x i32> %213, <32 x i32> %220, i32 1)
+ %222 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %201)
+ %223 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %212, <32 x i32> %222, i32 1)
+ %224 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %201)
+ %225 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %223, <32 x i32> %213, <32 x i32> %224, i32 1)
+ %226 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %202)
+ %227 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %212, <32 x i32> %226, i32 1)
+ %228 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %202)
+ %229 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %227, <32 x i32> %213, <32 x i32> %228, i32 1)
+ %230 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %203)
+ %231 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32> %212, <32 x i32> %230, i32 1)
+ %232 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %203)
+ %233 = tail call <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32> %231, <32 x i32> %213, <32 x i32> %232, i32 1)
+ %234 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32> %180, <32 x i32> %182)
+ %235 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %233)
+ %236 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %234)
+ %237 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %235, <32 x i32> %236)
+ %238 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %229)
+ %239 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %237, <32 x i32> %238)
+ %240 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %239, <32 x i32> %236)
+ %241 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %225)
+ %242 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %240, <32 x i32> %241)
+ %243 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %242, <32 x i32> %236)
+ %244 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %221)
+ %245 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %243, <32 x i32> %244)
+ %246 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %245, <32 x i32> %236)
+ %247 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %217)
+ %248 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %246, <32 x i32> %247)
+ %output_dv.sroa.0.0.vecblend84.i.i89 = shufflevector <32 x i32> %248, <32 x i32> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+ %249 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %233)
+ %250 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %234)
+ %251 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %249, <32 x i32> %250)
+ %252 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %229)
+ %253 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %251, <32 x i32> %252)
+ %254 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %253, <32 x i32> %250)
+ %255 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %225)
+ %256 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %254, <32 x i32> %255)
+ %257 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %256, <32 x i32> %250)
+ %258 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %221)
+ %259 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %257, <32 x i32> %258)
+ %260 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %259, <32 x i32> %250)
+ %261 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %217)
+ %262 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32> %260, <32 x i32> %261)
+ %output_dv.sroa.0.128.vec.expand117.i.i90 = shufflevector <32 x i32> %262, <32 x i32> poison, <64 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+ %output_dv.sroa.0.128.vecblend118.i.i91 = shufflevector <64 x i32> %output_dv.sroa.0.0.vecblend84.i.i89, <64 x i32> %output_dv.sroa.0.128.vec.expand117.i.i90, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 96, i32 97, i32 98, i32 99, i32 100, i32 101, i32 102, i32 103, i32 104, i32 105, i32 106, i32 107, i32 108, i32 109, i32 110, i32 111, i32 112, i32 113, i32 114, i32 115, i32 116, i32 117, i32 118, i32 119, i32 120, i32 121, i32 122, i32 123, i32 124, i32 125, i32 126, i32 127>
+ %263 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf32.128B(<64 x i32> %output_dv.sroa.0.128.vecblend118.i.i91)
+ %264 = ptrtoint ptr %output_v_ptr.2 to i32
+ %265 = tail call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> %263, <32 x i32> %263, i32 %264)
+ %and.i92 = and i32 %264, 127
+ %add.i93 = add nuw nsw i32 %and.i92, %mul
+ %266 = tail call <128 x i1> @llvm.hexagon.V6.pred.scalar2v2.128B(i32 %add.i93)
+ %267 = tail call <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1> %266, i32 -1)
+ %cmp.i94 = icmp ugt i32 %add.i93, 128
+ br i1 %cmp.i94, label %if.then.i, label %vstu_variable.exit
+
+if.then.i: ; preds = %cond.end33
+ %add.ptr.i = getelementptr inbounds <32 x i32>, ptr %output_v_ptr.2, i32 1
+ tail call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> %266, ptr nonnull %add.ptr.i, <32 x i32> %265)
+ %268 = tail call <128 x i1> @llvm.hexagon.V6.veqb.128B(<32 x i32> %265, <32 x i32> %265)
+ %269 = tail call <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1> %268, i32 -1)
+ br label %vstu_variable.exit
+
+vstu_variable.exit: ; preds = %if.then.i, %cond.end33
+ %qr.0.i = phi <32 x i32> [ %269, %if.then.i ], [ %267, %cond.end33 ]
+ %270 = tail call <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32 %264)
+ %271 = tail call <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32> %qr.0.i, i32 -1)
+ %272 = tail call <128 x i1> @llvm.hexagon.V6.pred.or.n.128B(<128 x i1> %270, <128 x i1> %271)
+ tail call void @llvm.hexagon.V6.vS32b.nqpred.ai.128B(<128 x i1> %272, ptr %output_v_ptr.2, <32 x i32> %265)
+ br label %cleanup
+
+cleanup: ; preds = %vstu_variable.exit, %if.end25, %entry
+ %retval.0 = phi i32 [ -1, %entry ], [ 0, %vstu_variable.exit ], [ 0, %if.end25 ]
+ ret i32 %retval.0
+}
+
+declare i32 @llvm.hexagon.A2.min(i32, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32>, <32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vd0.128B() #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vzh.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vdealh.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vlsrh.128B(<32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vshuffb.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vaslw.128B(<32 x i32>, i32) #1
+declare <64 x i32> @llvm.hexagon.V6.vlutvwh.128B(<32 x i32>, <32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vlutvwh.oracc.128B(<64 x i32>, <32 x i32>, <32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf32.128B(<64 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32>, <32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2.128B(i32) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.scalar2v2.128B(i32) #1
+declare void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #2
+declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32) #1
+declare <128 x i1> @llvm.hexagon.V6.veqb.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.or.n.128B(<128 x i1>, <128 x i1>) #1
+declare void @llvm.hexagon.V6.vS32b.nqpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #2
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvxv79,+v79,-long-calls,-small-data" }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind willreturn memory(write) }
+attributes #3 = { nounwind }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double.mir b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double.mir
new file mode 100644
index 0000000000000..a4e427f2407ed
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double.mir
@@ -0,0 +1,120 @@
+# This test checks that converts are inserted based on the qf type of
+# the register defined by the qf-producing reaching def of a vector
+# COPY instruction.
+
+# REQUIRES: asserts
+# RUN: llc -march=hexagon -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128B \
+# RUN: -run-pass=handle-qfp-spills-refills -verify-machineinstrs \
+# RUN: -debug-only=handle-qfp %s -o /dev/null 2>&1 | FileCheck %s
+# RUN: llc -march=hexagon -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128B \
+# RUN: -run-pass=handle-qfp-spills-refills -verify-machineinstrs \
+# RUN: -debug-only=handle-qfp %s -o /dev/null 2>&1 | FileCheck %s
+
+#CHECK: Handling COPY: renamable $w{{[0-9]+}} = COPY killed renamable $w{{[0-9]+}}
+#CHECK: Inserting convert instruction: [[VREG:\$v[0-9]+]] = V6_vconv_sf_qf32 killed renamable [[VREG]]
+#CHECK: after instruction: renamable [[VREG]] = V6_vmpy_qf32_sf killed renamable $v{{[0-9]+}}, killed renamable $v{{[0-9]+}}
+
+--- |
+ target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+
+ ; Function Attrs: norecurse
+ define dso_local noundef i32 @foo(i32 %in0, i32 %in1, ptr noundef %out) #0 {
+ entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %in0)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %in1)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ store <32 x i32> %2, ptr %out, align 1
+ ret i32 0
+ }
+
+ ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+ declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+
+ ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+ declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32>, <32 x i32>) #1
+
+ attributes #0 = { norecurse "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-long-calls,+hvx-ieee-fp" }
+ attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
+
+...
+---
+name: foo
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHContTarget: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: true
+registers: []
+liveins:
+ - { reg: '$r0', virtual-reg: '' }
+ - { reg: '$r1', virtual-reg: '' }
+ - { reg: '$r2', virtual-reg: '' }
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 1
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 0
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ localFrameSize: 0
+ savePoint: []
+ restorePoint: []
+fixedStack: []
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo: {}
+body: |
+ bb.0.entry:
+ successors: %bb.1
+ liveins: $r0, $r1, $r2
+
+ renamable $v0 = V6_lvsplatw killed renamable $r0
+ renamable $v1 = V6_lvsplatw renamable $r1
+ renamable $v3 = V6_lvsplatw killed renamable $r1
+ $r0 = A2_tfrsi 0
+
+ bb.1:
+ successors: %bb.2
+ liveins: $r0, $r2, $v0, $v1, $v3
+
+ renamable $v2 = V6_vmpy_qf32_sf killed renamable $v0, killed renamable $v1
+
+ bb.2:
+ successors: %bb.3
+ liveins: $r0, $r2, $v2, $v3
+
+ renamable $w8 = COPY killed renamable $w1
+
+ bb.3:
+ liveins: $r0, $r2, $w8
+
+ V6_vS32Ub_ai killed renamable $r2, 0, killed renamable $v17 :: (store (s1024) into %ir.out, align 1)
+ PS_jmpret killed $r31, implicit-def $pc, implicit killed $r0
+
+...
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double2.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double2.ll
new file mode 100644
index 0000000000000..beb4d421d5e4d
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-conv-double2.ll
@@ -0,0 +1,28 @@
+; Checks that a conversion is inserted only for the spilled register,
+; rather than two conversions for the W register.
+; XFAIL: *
+; NOTE: XFAIL until Hexagon HVX IEEE→QFloat isel translation is upstreamed; remove XFAIL when that lands.
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv81,+hvx-length128B \
+; RUN: -enable-postra-xqf-check -debug-only=handle-qfp 2>&1 < %s -o /dev/null | FileCheck %s
+
+; CHECK: Finding uses of: renamable $v1 = PS_vloadrv_ai %stack.0
+; CHECK: Collecting convert instruction with type Hi Op : renamable $v{{[0-9]+}} = V6_vconv_hf_qf32 killed renamable $w0
+; CHECK: Inserting new instruction: $v1 = V6_vconv_qf32_sf killed renamable $v1
+
+define void @foo(ptr %0) {
+entry:
+ br label %.preheader78.i.i
+
+.preheader78.i.i:
+ %1 = load ptr, ptr %0, align 16
+ tail call void (i32, i32, ptr, ...) %1(i32 0, i32 0, ptr null, ptr null, i32 0, ptr null, ptr null)
+ %2 = load <32 x i32>, ptr %0, align 1
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %2, <32 x i32> zeroinitializer)
+ %bc.i8.i.i = bitcast <32 x i32> %3 to <64 x i16>
+ %4 = extractelement <64 x i16> %bc.i8.i.i, i64 0
+ store i16 %4, ptr %0, align 2
+ br label %.preheader78.i.i
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>)
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-copy3.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-copy3.ll
new file mode 100644
index 0000000000000..5388c3b1d1573
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-copy3.ll
@@ -0,0 +1,20 @@
+; Tests whether the first operand of the vasr instruction is converted to qf.
+; It should NOT be.
+
+; RUN: llc -mtriple=hexagon -O0 -mv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=lossy < %s -o - | FileCheck %s
+
+; CHECK-LABEL: main:
+; CHECK: v[[SH:[0-9]+]] = vxor(v[[SH]],v[[SH]])
+; CHECK-NOT: qf16
+; CHECK-NOT: qf32
+; CHECK: v{{[0-9]+}}.ub = vasr(v{{[0-9]+:[0-9]+}}.uh,v[[SH]].ub):rnd:sat
+
+define i32 @main() {
+entry:
+ %0 = call <32 x i32> @llvm.hexagon.V6.vasrvuhubrndsat.128B(<64 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <32 x i32> zeroinitializer)
+ store <32 x i32> %0, ptr null, align 128
+ ret i32 0
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vasrvuhubrndsat.128B(<64 x i32>, <32 x i32>)
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-fakereg.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-fakereg.ll
new file mode 100644
index 0000000000000..486b2b6c07b29
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-fakereg.ll
@@ -0,0 +1,130 @@
+; There should not be any mismatch for xqf with this testcase.
+
+; RUN: llc -O3 -mv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat,+hvx-ieee-fp -enable-xqf-gen=true \
+; RUN: -mtriple=hexagon -hexagon-qfloat-mode=lossy -enable-postra-xqf-check < %s -o - | FileCheck %s
+
+; CHECK-NOT: Mismatch
+
+declare void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1>, ptr, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1>, <32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32>, <32 x i32>) #1
+
+define tailcc void @widget(ptr %arg) {
+bb:
+ %load = load i32, ptr %arg, align 4
+ %getelementptr = getelementptr i8, ptr null, i32 %load
+ br label %bb1
+
+bb1: ; preds = %bb67, %bb
+ %phi = phi i32 [ 0, %bb ], [ 1, %bb67 ]
+ %call = tail call <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
+ br label %bb50
+
+bb2: ; preds = %bb50
+ %call3 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1056964608)
+ %call4 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1060439283)
+ %call5 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -2139095041)
+ %call6 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1065353216)
+ %call7 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 255)
+ %call8 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 -2147483648)
+ %call9 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 2147483647)
+ %call10 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 0)
+ %call11 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 2139095040)
+ %call12 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %call59, <32 x i32> %call9)
+ %call13 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %call10, <32 x i32> zeroinitializer)
+ %call14 = tail call <128 x i1> @llvm.hexagon.V6.veqw.128B(<32 x i32> %call12, <32 x i32> %call10)
+ %call15 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <32 x i32> zeroinitializer)
+ %call16 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %call14, <32 x i32> %call11, <32 x i32> %call15)
+ %call17 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %call6, <32 x i32> %call16)
+ %call18 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
+ %call19 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %call17, <32 x i32> %call18)
+ %call20 = tail call <128 x i1> @llvm.hexagon.V6.pred.or.128B(<128 x i1> zeroinitializer, <128 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
+ %call21 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %call58, <32 x i32> %call5)
+ %call22 = tail call <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32> %call21, <32 x i32> %call3)
+ %call23 = tail call <128 x i1> @llvm.hexagon.V6.vgtsf.128B(<32 x i32> %call4, <32 x i32> %call22)
+ %call24 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %call23, <32 x i32> zeroinitializer, <32 x i32> %call7)
+ %call25 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call26 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %call24, <32 x i32> %call8)
+ %call27 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> %call25, <32 x i32> %call26)
+ %call28 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %call27, <32 x i32> %call13)
+ %call29 = tail call <32 x i32> @llvm.hexagon.V6.vasrw.128B(<32 x i32> %call28, i32 31)
+ %call30 = tail call <128 x i1> @llvm.hexagon.V6.veqw.128B(<32 x i32> %call29, <32 x i32> zeroinitializer)
+ %call31 = tail call <32 x i32> @llvm.hexagon.V6.vasrw.128B(<32 x i32> zeroinitializer, i32 0)
+ %call32 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> %call31, <32 x i32> zeroinitializer)
+ %call33 = tail call <32 x i32> @llvm.hexagon.V6.vsubw.128B(<32 x i32> %call32, <32 x i32> zeroinitializer)
+ %call34 = tail call <32 x i32> @llvm.hexagon.V6.vsubw.128B(<32 x i32> zeroinitializer, <32 x i32> %call33)
+ %call35 = tail call <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32> %call34, <32 x i32> zeroinitializer)
+ %call36 = tail call <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call37 = tail call <32 x i32> @llvm.hexagon.V6.vasrwv.128B(<32 x i32> %call36, <32 x i32> %call35)
+ %call38 = tail call <32 x i32> @llvm.hexagon.V6.vaslwv.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call39 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %call38)
+ %call40 = tail call <32 x i32> @llvm.hexagon.V6.vaddw.128B(<32 x i32> %call39, <32 x i32> %call37)
+ %call41 = tail call <32 x i32> @llvm.hexagon.V6.vsubw.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call42 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %call30, <32 x i32> %call40, <32 x i32> %call41)
+ %call43 = tail call <128 x i1> @llvm.hexagon.V6.vgtsf.128B(<32 x i32> %call58, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
+ %call44 = tail call <32 x i32> @llvm.hexagon.V6.vaddwq.128B(<128 x i1> %call43, <32 x i32> %call42, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
+ %call45 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call46 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %call20, <32 x i32> %call45, <32 x i32> %call44)
+ %call47 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %call19, <32 x i32> %call46)
+ %icmp = icmp sgt i32 0, 0
+ br i1 %icmp, label %bb48, label %bb67
+
+bb48: ; preds = %bb2
+ %load49 = load i32, ptr null, align 4
+ br label %bb67
+
+bb50: ; preds = %bb50, %bb1
+ %phi51 = phi <64 x i32> [ %call, %bb1 ], [ %call63, %bb50 ]
+ %phi52 = phi i32 [ 0, %bb1 ], [ %add64, %bb50 ]
+ %load53 = load i32, ptr %arg, align 4
+ %add = add i32 %phi, %phi52
+ %mul = mul i32 %add, %load53
+ %getelementptr54 = getelementptr i16, ptr %getelementptr, i32 %mul
+ %load55 = load <32 x i32>, ptr %getelementptr54, align 1
+ %call56 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.128B(<32 x i32> %load55, <32 x i32> zeroinitializer)
+ %call57 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %phi51)
+ %call58 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call56)
+ %call59 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> %call57, <32 x i32> %call58)
+ %call60 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %phi51)
+ %call61 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %call56)
+ %call62 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> %call60, <32 x i32> %call61)
+ %call63 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %call62, <32 x i32> %call59)
+ %add64 = add i32 %phi52, 1
+ %load65 = load i32, ptr %arg, align 4
+ %icmp66 = icmp slt i32 %phi52, %load65
+ br i1 %icmp66, label %bb50, label %bb2
+
+bb67: ; preds = %bb48, %bb2
+ tail call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> zeroinitializer, ptr null, <32 x i32> %call47)
+ %call68 = tail call <128 x i1> @llvm.hexagon.V6.pred.and.128B(<128 x i1> zeroinitializer, <128 x i1> zeroinitializer)
+ %call69 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> %call68, <32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call70 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %call69)
+ %call71 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %call70)
+ %call72 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %call71)
+ %call73 = tail call <32 x i32> @llvm.hexagon.V6.vmux.128B(<128 x i1> zeroinitializer, <32 x i32> %call72, <32 x i32> zeroinitializer)
+ %call74 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call73)
+ tail call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> zeroinitializer, ptr null, <32 x i32> %call74)
+ br label %bb1
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.vgtsf.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsubw.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.veqw.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vaddw.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vaslwv.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vand.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vor.128B(<32 x i32>, <32 x i32>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.or.128B(<128 x i1>, <128 x i1>) #1
+declare <128 x i1> @llvm.hexagon.V6.pred.and.128B(<128 x i1>, <128 x i1>) #1
+declare <32 x i32> @llvm.hexagon.V6.vasrw.128B(<32 x i32>, i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vaddwq.128B(<128 x i1>, <32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vasrwv.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32>, <32 x i32>) #1
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash.ll
new file mode 100644
index 0000000000000..3bd86db22aa68
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash.ll
@@ -0,0 +1,23 @@
+; Tests that the PostRA XQF handle pass does not crash on basic qf16 ops.
+;
+; RUN: llc -mtriple=hexagon-unknown-elf -mcpu=hexagonv79 \
+; RUN: -mattr=+hvx-length128b,+hvxv79,+hvx-ieee-fp,+hvx-qfloat \
+; RUN: < %s -o /dev/null
+
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #0
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #0
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32>, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32>, <32 x i32>) #0
+
+define i32 @main() {
+entry:
+ %zero = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 0)
+ %a = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 14336)
+ %b = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 13312)
+ %q1 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %zero, <32 x i32> %a)
+ %q2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %zero, <32 x i32> %b)
+ %q3 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.128B(<32 x i32> %q1, <32 x i32> %q2)
+ ret i32 0
+}
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash2.mir b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash2.mir
new file mode 100644
index 0000000000000..56ddae8c3392e
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-crash2.mir
@@ -0,0 +1,86 @@
+# RUN: llc -mtriple=hexagon -run-pass handle-qfp-spills-refills -verify-machineinstrs %s -o - | FileCheck %s
+
+# CHECK: renamable $v{{[0-9]+}} = V6_vd0
+# CHECK-NEXT: renamable $v{{[0-9]+}} = V6_vconv_qf32_sf killed renamable $v{{[0-9]+}}
+# CHECK-NEXT: $v{{[0-9]+}} = V6_vconv_sf_qf32 killed renamable $v{{[0-9]+}}
+# CHECK-NEXT: renamable $v{{[0-9]+}} = COPY renamable $v{{[0-9]+}}
+# CHECK-NEXT: $v[[VREG0:[0-9]+]] = V6_vconv_qf32_sf killed renamable $v[[VREG0]]
+# CHECK-NEXT: $v[[VREG1:[0-9]+]] = V6_vconv_qf32_sf killed renamable $v[[VREG1]]
+# CHECK-NEXT: renamable $v[[VREG0]] = V6_vconv_hf_qf32 renamable $w[[VREG0]]
+# CHECK-NEXT: $v[[VREG1]] = V6_vconv_sf_qf32 killed renamable $v[[VREG1]]
+
+--- |
+ declare void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1>, ptr, <32 x i32>)
+ declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>)
+
+ define i32 @foo(i1 %cmp14106.not, ptr %gep114.us.us) #1 {
+ entry:
+ br i1 %cmp14106.not, label %for.body10.lr.ph.split.us.us, label %for.body10.us6
+
+ for.body10.us6: ; preds = %entry, %for.body10.us6
+ br label %for.body10.us6
+
+ for.cond.cleanup39.us: ; preds = %for.body10.us.us
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ tail call void @llvm.hexagon.V6.vS32b.qpred.ai.128B(<128 x i1> zeroinitializer, ptr null, <32 x i32> %0)
+ ret i32 0
+
+ for.body10.lr.ph.split.us.us: ; preds = %entry
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ br label %for.body10.us.us
+
+ for.body10.us.us: ; preds = %for.body10.us.us, %for.body10.lr.ph.split.us.us
+ store <32 x i32> %1, ptr %gep114.us.us, align 1
+ br i1 %cmp14106.not, label %for.body10.us.us, label %for.cond.cleanup39.us
+ }
+
+ attributes #1 = { "target-cpu"="hexagonv81" "target-features"="+hvxv81,+hvx-length128b" }
+
+...
+---
+name: foo
+body: |
+ bb.0.entry:
+ successors: %bb.4(0x40000000), %bb.1(0x40000000)
+ liveins: $r0, $r1
+
+ renamable $p0 = S2_tstbit_i killed renamable $r0, 0
+ J2_jumpt renamable $p0, %bb.4, implicit-def $pc
+
+ bb.1:
+ successors: %bb.2(0x80000000)
+
+ bb.2.for.body10.us6:
+ successors: %bb.2(0x80000000)
+
+ J2_jump %bb.2, implicit-def dead $pc
+
+ bb.3.for.cond.cleanup39.us:
+ liveins: $w0:0x0000000000000010
+
+ renamable $v0 = COPY renamable $v1
+ $r0 = A2_tfrsi 0
+ renamable $q0 = PS_qfalse
+ renamable $r2 = A2_tfrsi 0
+ renamable $v1 = V6_vconv_hf_qf32 killed renamable $w0
+ V6_vS32b_qpred_ai killed renamable $q0, killed renamable $r2, 0, killed renamable $v1
+ PS_jmpret $r31, implicit-def dead $pc, implicit $r0
+
+ bb.4.for.body10.lr.ph.split.us.us:
+ successors: %bb.5(0x80000000)
+ liveins: $p0, $r1
+
+ renamable $v2 = V6_vd0
+ renamable $v1 = V6_vconv_qf32_sf killed renamable $v2
+ renamable $v0 = COPY renamable $v1
+ renamable $v0 = V6_vconv_hf_qf32 renamable $w0
+
+ bb.5.for.body10.us.us:
+ successors: %bb.5(0x7c000000), %bb.3(0x04000000)
+ liveins: $p0, $r1, $v0, $w0:0x0000000000000010
+
+ V6_vS32Ub_ai renamable $r1, 0, renamable $v0 :: (store (s1024) into %ir.gep114.us.us, align 1)
+ J2_jumpt renamable $p0, %bb.5, implicit-def dead $pc
+ J2_jump %bb.3, implicit-def dead $pc
+
+...
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-qf32-mul.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-qf32-mul.ll
new file mode 100644
index 0000000000000..0749f4b5025c5
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-handle-qf32-mul.ll
@@ -0,0 +1,69 @@
+; Tests the case where, after spills/fills, one argument of vmpy(qf,qf)
+; is an IEEE-754 type and the other is a qf type. The qf operand is
+; converted to IEEE and the opcode is changed to one that takes two IEEE
+; operands. The converted IEEE values are converted back to qf if they
+; are used after the instruction.
+; XFAIL: *
+; NOTE: XFAIL until Hexagon HVX IEEE→QFloat isel translation is upstreamed; remove XFAIL when that lands.
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp -o /dev/null < %s 2>&1 | FileCheck %s
+
+; CHECK: Instruction: renamable [[V5:\$v[0-9]+]] = V6_vmpy_qf32 killed renamable [[V14:\$v[0-9]+]], killed renamable [[V4:\$v[0-9]+]]
+; CHECK: Property: 1 ,0
+; CHECK: Instruction: renamable [[V9:\$v[0-9]+]] = V6_vmpy_qf32 killed renamable [[V13:\$v[0-9]+]], killed renamable [[V8:\$v[0-9]+]]
+; CHECK: Property: 1 ,0
+; CHECK: Inserting new instruction before: [[V4]] = V6_vconv_sf_qf32 killed renamable [[V4]]
+; CHECK: Inserting new instruction: [[V5]] = V6_vmpy_qf32_sf killed renamable [[V14]], killed renamable [[V4]]
+; CHECK: Inserting new instruction before: [[V8]] = V6_vconv_sf_qf32 killed renamable [[V8]]
+; CHECK: Inserting new instruction: [[V9]] = V6_vmpy_qf32_sf killed renamable [[V13]], killed renamable [[V8]]
+
+
+@.str = private unnamed_addr constant [16 x i8] c"Vector[%d]= %x\0A\00", align 1
+@VectorResult = common dso_local global <32 x i32> zeroinitializer, align 128
+@ptr = common dso_local local_unnamed_addr global [32768 x i8] zeroinitializer, align 8
+@str = private unnamed_addr constant [65 x i8] c"HVX_Vector : Q6_Vqf32_vmpy_VsfRsf(Q6_V_vsplat_R(0+1),INT32_MIN)\00", align 1
+@str.3 = private unnamed_addr constant [58 x i8] c"HVX_Vector : Q6_Vqf32_vmpy_VsfRsf(Q6_V_vsplat_R(0+1),-1)\00", align 1
+
+declare dso_local void @print_vector(i32 noundef, ptr nocapture noundef readonly) local_unnamed_addr #0
+
+declare dso_local noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #0
+
+define dso_local i32 @main() local_unnamed_addr #0 {
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.rt.sf.128B(<32 x i32> %0, i32 -2147483648)
+ store <32 x i32> %1, ptr @VectorResult, align 128
+ %puts = tail call i32 @puts(ptr nonnull dereferenceable(1) @str)
+ br label %for.body.i
+
+for.body.i: ; preds = %for.body.i, %entry
+ %counter.06.i = phi i32 [ %inc.i, %for.body.i ], [ 0, %entry ]
+ %pointer.05.i = phi ptr [ %incdec.ptr.i, %for.body.i ], [ @VectorResult, %entry ]
+ %incdec.ptr.i = getelementptr inbounds i16, ptr %pointer.05.i, i32 1
+ %2 = load i16, ptr %pointer.05.i, align 2
+ %conv.i = sext i16 %2 to i32
+ %call.i = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %counter.06.i, i32 noundef %conv.i) #3
+ %inc.i = add nuw nsw i32 %counter.06.i, 1
+ %exitcond.not.i = icmp eq i32 %inc.i, 64
+ br i1 %exitcond.not.i, label %print_vector.exit, label %for.body.i
+
+print_vector.exit: ; preds = %for.body.i
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.rt.sf.128B(<32 x i32> %0, i32 -1)
+ store <32 x i32> %3, ptr @VectorResult, align 128
+ %puts2 = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.3)
+ tail call void @print_vector(i32 noundef 128, ptr noundef nonnull @VectorResult)
+ ret i32 0
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vmpy.rt.sf.128B(<32 x i32>, i32) #1
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+
+; Function Attrs: nofree nounwind
+declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #2
+
+attributes #0 = { nofree nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-ieee-fp,+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls,-small-data" }
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { nofree nounwind }
+attributes #3 = { nounwind }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-legacy-mode.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-legacy-mode.ll
new file mode 100644
index 0000000000000..83f145bf73803
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-legacy-mode.ll
@@ -0,0 +1,30 @@
+; Check that the post-RA pass handles qf types in all xqf modes.
+; Since legacy is the default mode, also check that the pass
+; runs without the hexagon-qfloat-mode flag explicitly set.
+
+; REQUIRES: asserts
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=legacy -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s --check-prefix LEGACY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s --check-prefix LEGACY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=lossy -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s --check-prefix LOSSY
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=ieee -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s --check-prefix IEEE
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B -debug-only=handle-qfp \
+; RUN: 2>&1 < %s -o /dev/null | FileCheck %s --check-prefix STRICT-IEEE
+
+; LEGACY: Mode: Legacy
+; LOSSY: Mode: Lossy
+; IEEE: Mode: IEEE
+; STRICT-IEEE: Mode: Strict IEEE
+
+define i32 @foo() {
+entry:
+ ret i32 0
+}
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg2.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg2.ll
new file mode 100644
index 0000000000000..d5704fb6e6956
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg2.ll
@@ -0,0 +1,99 @@
+; Test passes if we don't generate conversion for both
+; of the subregisters since only one is live at the use.
+;
+; UNSUPPORTED: asserts
+
+; REQUIRES: asserts
+; RUN: llc -O2 -mtriple=hexagon -mattr=+hvxv81,+hvx-length128B \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=lossy \
+; RUN: -debug-only=handle-qfp -enable-postra-xqf-check < %s 2>&1 -o - | FileCheck %s
+
+; CHECK: [HandleConvertToQfCopies] Processing Copy: renamable [[V0:\$v[0-9]+]] = COPY
+; CHECK: Inserting new instruction: [[V0]] = V6_vconv_qf32_sf killed renamable [[V0]]
+; CHECK: [HandleConvertToQfCopies] Processing Copy: renamable [[V1:\$v[0-9]+]] = COPY
+; CHECK: Inserting new instruction: [[V1]] = V6_vconv_qf32_sf killed renamable [[V1]]
+; CHECK: [HandleConvertToQfCopies] Processing Copy: renamable [[V2:\$v[0-9]+]] = COPY
+; CHECK: Inserting new instruction: [[V2]] = V6_vconv_qf32_sf killed renamable [[V2]]
+; CHECK: [HandleConvertToQfCopies] Processing Copy: renamable [[V3:\$v[0-9]+]] = COPY
+; CHECK: Inserting new instruction: [[V3]] = V6_vconv_qf32_sf killed renamable [[V3]]
+
+
+declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #0
+declare <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32>, <32 x i32>, i32) #0
+declare <32 x i32> @llvm.hexagon.V6.vshuffh.128B(<32 x i32>) #0
+declare <64 x i32> @llvm.hexagon.V6.vshufoeh.128B(<32 x i32>, <32 x i32>) #0
+
+define tailcc void @hoge(ptr %arg, ptr %arg1, i1 %arg2, <32 x i32> %arg3, <32 x i32> %arg4, <32 x i32> %arg5) {
+bb:
+ br label %bb6
+
+bb6: ; preds = %bb51, %bb
+ br i1 %arg2, label %bb7, label %bb14
+
+bb7: ; preds = %bb6
+ %call = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <32 x i32> zeroinitializer)
+ %call8 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %call, <32 x i32> zeroinitializer)
+ %call9 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call8)
+ %call10 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %call9, <32 x i32> zeroinitializer)
+ %call11 = tail call <32 x i32> @llvm.hexagon.V6.vshuffh.128B(<32 x i32> %call10)
+ %call12 = tail call <64 x i32> @llvm.hexagon.V6.vshufoeh.128B(<32 x i32> %call11, <32 x i32> zeroinitializer)
+ %call13 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call12)
+ br label %bb51
+
+bb14: ; preds = %bb6
+ %load = load ptr, ptr %arg, align 16
+ tail call void (i32, i32, ptr, ...) %load(i32 0, i32 0, ptr null, ptr null, i32 0, ptr null, ptr null)
+ %call15 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
+ %call16 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call15)
+ %call17 = tail call <32 x i32> @llvm.hexagon.V6.vshuffh.128B(<32 x i32> %call16)
+ %call18 = tail call <64 x i32> @llvm.hexagon.V6.vshufoeh.128B(<32 x i32> %call17, <32 x i32> zeroinitializer)
+ %call19 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call18)
+ store <32 x i32> %call19, ptr %arg1, align 128
+ %call20 = tail call <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32> %call19, <32 x i32> zeroinitializer, i32 0)
+ %call21 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call20)
+ %load22 = load <32 x i32>, ptr %arg, align 128
+ %call23 = tail call <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32> %load22, <32 x i32> zeroinitializer, i32 0)
+ %call24 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call23)
+ %call25 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32> %call21)
+ %call26 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call25)
+ %call27 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32> %call24)
+ %call28 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call27)
+ %call29 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32> %call28, <32 x i32> %call26)
+ %call30 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %call29, <32 x i32> zeroinitializer)
+ %call31 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call30)
+ %call32 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32> %call31, <32 x i32> zeroinitializer)
+ %call33 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %call32, <32 x i32> zeroinitializer)
+ %call34 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call33)
+ %call35 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> %call34, <32 x i32> zeroinitializer)
+ %call36 = tail call <32 x i32> @llvm.hexagon.V6.vshuffh.128B(<32 x i32> %call35)
+ %call37 = tail call <64 x i32> @llvm.hexagon.V6.vshufoeh.128B(<32 x i32> %call36, <32 x i32> %arg5)
+ %call38 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call37)
+ store <32 x i32> zeroinitializer, ptr %arg1, align 128
+ %call39 = tail call <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32> zeroinitializer, <32 x i32> %call38, i32 0)
+ %call40 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call39)
+ %call41 = tail call <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32> %call40)
+ %call42 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %call41)
+ %call43 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> %call42, <32 x i32> zeroinitializer)
+ %call44 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32> %call43, <32 x i32> zeroinitializer)
+ %call45 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32> %call44, <32 x i32> zeroinitializer)
+ %call46 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call45)
+ %call47 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call46)
+ %call48 = tail call <32 x i32> @llvm.hexagon.V6.vshuffh.128B(<32 x i32> %call47)
+ %call49 = tail call <64 x i32> @llvm.hexagon.V6.vshufoeh.128B(<32 x i32> %call48, <32 x i32> zeroinitializer)
+ %call50 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %call49)
+ br label %bb51
+
+bb51: ; preds = %bb14, %bb7
+ %phi = phi <32 x i32> [ %call50, %bb14 ], [ %call13, %bb7 ]
+ store <32 x i32> %phi, ptr %arg, align 128
+ br label %bb6
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.sf.128B(<32 x i32>, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.vmpy.sf.sf.128B(<32 x i32>, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.vsub.sf.sf.128B(<32 x i32>, <32 x i32>) #0
+declare <64 x i32> @llvm.hexagon.V6.vcvt.sf.hf.128B(<32 x i32>) #0
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg3.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg3.ll
new file mode 100644
index 0000000000000..ca17d6c01d0a1
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-subreg3.ll
@@ -0,0 +1,45 @@
+; Test passes if there is no mismatch on a convert instruction
+;
+; UNSUPPORTED: asserts
+
+; REQUIRES: asserts
+; RUN: llc -O2 -mtriple=hexagon -mattr=+hvxv81,+hvx-length128B \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=lossy \
+; RUN: -debug-only=handle-qfp -enable-postra-xqf-check < %s 2>&1 -o - | FileCheck %s
+
+; CHECK: Mismatch: qf32 type used as sf at operand
+; CHECK-NOT: Def: renamable $v{{[0-9]+}} = V6_vconv_qf32_sf renamable
+
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #0
+
+define tailcc void @widget(ptr %arg, ptr %arg1, i1 %arg2, i1 %arg3, <32 x i32> %arg4) {
+bb:
+ br label %bb5
+
+bb5: ; preds = %bb7, %bb
+ br i1 %arg2, label %bb6, label %bb7
+
+bb6: ; preds = %bb5
+ %load = load <32 x i32>, ptr %arg, align 128
+ br label %bb7
+
+bb7: ; preds = %bb6, %bb5
+ %phi = phi <32 x i32> [ %load, %bb6 ], [ zeroinitializer, %bb5 ]
+ %call = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32> zeroinitializer, <32 x i32> %phi, <32 x i32> zeroinitializer)
+ tail call void (i32, i32, ptr, ...) %arg(i32 0, i32 0, ptr null, ptr null, i32 0, ptr null, ptr null)
+ %call8 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32> %call, <32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %call9 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32> %call8, <32 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <32 x i32> zeroinitializer)
+ %call10 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %call9)
+ %call11 = tail call <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32> zeroinitializer, <32 x i32> %call10)
+ store <32 x i32> %call11, ptr %arg1, align 128
+ br i1 %arg3, label %bb5, label %bb12
+
+bb12: ; preds = %bb12, %bb7
+ store <32 x i32> %arg4, ptr %arg, align 128
+ br label %bb12
+}
+
+declare <64 x i32> @llvm.hexagon.V6.vmpy.sf.hf.acc.128B(<64 x i32>, <32 x i32>, <32 x i32>) #0
+declare <32 x i32> @llvm.hexagon.V6.vcvt.hf.sf.128B(<32 x i32>, <32 x i32>) #0
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-warnings.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-warnings.ll
new file mode 100644
index 0000000000000..16e6c2a9dc75d
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-postra-warnings.ll
@@ -0,0 +1,60 @@
+; Tests for emitted warnings when IEEE type is used as qf and vice-versa
+; post register allocation.
+
+; REQUIRES: asserts
+; RUN: llc --mtriple=hexagon-- -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-xqf-gen=false -enable-postra-xqf-check=true -hexagon-qfloat-mode=ieee \
+; RUN: -verify-machineinstrs 2>&1 < %s -o /dev/null | FileCheck %s
+; RUN: llc --mtriple=hexagon-- -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-xqf-gen=false -enable-postra-xqf-check=true -hexagon-qfloat-mode=ieee \
+; RUN: -verify-machineinstrs 2>&1 < %s -o /dev/null | FileCheck %s
+
+define dso_local inreg <64 x i32> @foo(<32 x i32> noundef %vina, <32 x i32> noundef %vinb) local_unnamed_addr #0{
+;CHECK: Mismatch: hf type used as qf16 at operand 1
+;CHECK-NEXT: Def: renamable [[VREG2:\$v[0-9]+]] = V6_lvsplath
+;CHECK-NEXT: Use: renamable $v{{[0-9]+}} = V6_vadd_qf16_mix killed renamable [[VREG2]]
+;CHECK-NEXT: Mismatch: sf type used as qf32 at operand 1
+;CHECK-NEXT: Def: renamable [[VREG4:\$v[0-9]+]] = V6_lvsplatw
+;CHECK-NEXT: Use: renamable $v{{[0-9]+}} = V6_vadd_qf32_mix killed renamable [[VREG4]]
+;CHECK-NEXT: Mismatch: qf16 type used as hf at operand 2
+;CHECK-NEXT: Def: renamable [[VREG6:\$v[0-9]+]] = V6_vadd_qf16_mix
+;CHECK-NEXT: Use: renamable $w{{[0-9]+}} = V6_vmpy_qf32_hf killed renamable $v{{[0-9]+}}, killed renamable [[VREG6]]
+;CHECK-NEXT: Mismatch: qf32 type used as sf at operand 2
+;CHECK-NEXT: Def: renamable [[VREG7:\$v[0-9]+]] = V6_vadd_qf32_mix
+;CHECK-NEXT: Use: renamable $v{{[0-9]+}} = V6_vsub_qf32_mix killed renamable $v{{[0-9]+}}, renamable [[VREG7]]
+;CHECK: Mismatch: qf32 type used as sf at operand 2
+;CHECK: Def: renamable $v[[R0:[0-9]+]] = V6_vsub_qf32_mix
+;CHECK: Mismatch: qf32 type used as sf at operand 2
+;CHECK: Def: renamable $v[[R1:[0-9]+]] = V6_vsub_qf32
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15360)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 48128)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 44032)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 56320)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %0, <32 x i32> %1)
+ %5 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.mix.128B(<32 x i32> %2, <32 x i32> %3)
+ %6 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.qf16.128B(<32 x i32> %vina, <32 x i32> %4)
+ %7 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32> %vinb, <32 x i32> %4)
+ %8 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6)
+ %9 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf32.mix.128B(<32 x i32> %8, <32 x i32> %5)
+ %10 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %7)
+ %11 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf32.128B(<32 x i32> %10, <32 x i32> %5)
+ %12 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %11, <32 x i32> %9)
+ ret <64 x i32> %12
+}
+
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf32.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.qf32.qf16.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf32.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32>, <32 x i32>) #1
+declare void @llvm.dbg.value(metadata, metadata, metadata) #2
+
+attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(none) "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-long-calls" "unsafe-fp-math"="true" }
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-strict-ieee-mul-qf16.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-strict-ieee-mul-qf16.ll
new file mode 100644
index 0000000000000..60e9897ec489b
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-strict-ieee-mul-qf16.ll
@@ -0,0 +1,91 @@
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv81,+hvx-length128B < %s | FileCheck %s
+
+; Test qf16 = vmpy(qf16 ,qf16) when both inputs are from vadd instruction
+define <64 x half> @mul_add_3(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V6:v[0-9]+]] = vxor([[V6]],[[V6]])
+; CHECK-DAG: [[V5:v[0-9]+]].hf = [[V3]].qf16
+; CHECK-DAG: [[V7:v[0-9]+]].hf = [[V4]].qf16
+; CHECK-DAG: [[V31:v[0-9]+:[0-9]+]].qf32 = vmpy([[V7]].hf,[[V5]].hf)
+; CHECK-DAG: [[V8:v[0-9]+]].hf = [[V31]].qf32
+; CHECK-DAG: [[V9:v[0-9]+]].qf16 = vsub([[V8]].hf,[[V6]].hf)
+; CHECK: hf = [[V9]].qf16
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fadd <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf32 = vmpy(qf16 ,qf16) when both inputs are from vadd and vmul instruction
+define <64 x half> @mul_add_mul(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V31:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,v2.hf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V31]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V4]].qf16
+; CHECK-DAG: [[V7:v[0-9]+]].qf16 = vsub([[V3]].hf,[[V5]].hf)
+; CHECK-DAG: [[V8:v[0-9]+]].hf = [[V7]].qf16
+; CHECK-DAG: [[V32:v[0-9]+:[0-9]+]].qf32 = vmpy([[V6]].hf,[[V8]].hf)
+; CHECK-DAG: [[V9:v[0-9]+]].hf = [[V32]].qf32
+; CHECK-DAG: [[V10:v[0-9]+]].qf16 = vsub([[V9]].hf,[[V5]].hf)
+; CHECK: hf = [[V10]].qf16
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v1 = fmul <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %v0, %v1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(hf ,hf)
+define <64 x half> @mul_add_0(<64 x half> %a0, <64 x half> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK-DAG: [[V31:v[0-9]+:[0-9]+]].qf32 = vmpy(v0.hf,v1.hf)
+; CHECK-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; CHECK-DAG: [[V3:v[0-9]+]].hf = [[V31]].qf32
+; CHECK-DAG: [[V4:v[0-9]+]].qf16 = vsub([[V3]].hf,[[V2]].hf)
+; CHECK: hf = [[V4]].qf16
+label0:
+ %v3 = fmul <64 x half> %a0, %a1
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,qf16) when first input is from vadd instruction
+define <64 x half> @mul_add_1(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vadd(v0.hf,v1.hf)
+; CHECK-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; CHECK-DAG: [[V4:v[0-9]+]].hf = [[V3]].qf16
+; CHECK-DAG: [[V31:v[0-9]+:[0-9]+]].qf32 = vmpy({{.*}}.hf,{{.*}}.hf)
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V31]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf16 = vsub([[V6]].hf,[[V2]].hf)
+; CHECK: hf = [[V7]].qf16
+label0:
+ %v0 = fadd <64 x half> %a0, %a1
+ %v3 = fmul <64 x half> %v0, %a2
+ ret <64 x half> %v3
+}
+
+; Test qf16 = vmpy(qf16 ,qf16) when second input is from vsub instruction
+define <64 x half> @mul_add_2(<64 x half> %a0, <64 x half> %a1, <64 x half> %a2) #0 {
+; CHECK-LABEL: mul_add_2:
+; CHECK-DAG: [[V3:v[0-9]+]].qf16 = vsub(v0.hf,v2.hf)
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V4:v[0-9]+]].hf = [[V3]].qf16
+; CHECK-DAG: [[V31:v[0-9]+:[0-9]+]].qf32 = vmpy(v1.hf,[[V4]].hf)
+; CHECK-DAG: [[V6:v[0-9]+]].hf = [[V31]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf16 = vsub([[V6]].hf,[[V5]].hf)
+; CHECK-DAG: v0.hf = [[V7]].qf16
+label0:
+ %v1 = fsub <64 x half> %a0, %a2
+ %v3 = fmul <64 x half> %a1, %v1
+ ret <64 x half> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-length128b,+hvx-qfloat,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-strictieee-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-strictieee-mul-qf32.ll
new file mode 100644
index 0000000000000..e65b4d0ac8f8f
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-strictieee-mul-qf32.ll
@@ -0,0 +1,123 @@
+; Tests strict-ieee mode for XQFloat 32-bit multiplication
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv79 -force-hvx-float -enable-xqf-gen=true -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv79,+hvx-length128B < %s | FileCheck %s -check-prefix=CHECK
+
+; Test qf32 = vmpy(sf, sf)
+; Normalization of inputs
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; CHECK: [[R0:r[0-9]+]] = ##2147483648
+; CHECK: [[V3:v[0-9]+]] = vsplat([[R0]])
+; CHECK: [[V4:v[0-9]+]].qf32 = vmpy([[V2]].sf,[[V3]].sf)
+; CHECK: [[V5:v[0-9]+]].qf32 = vadd([[V4]].qf32,v0.sf)
+; CHECK: [[V6:v[0-9]+]].qf32 = vadd([[V4]].qf32,v1.sf)
+; CHECK: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V4:v[0-9]+]] = vxor([[V4]],[[V4]])
+; CHECK-DAG: [[V5:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].sf,[[V5]].sf)
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V6]].sf)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vmpy([[V5]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].sf = [[V4]].qf32
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = vadd([[V9]].qf32,[[V8]].sf)
+; CHECK-DAG: [[V11:v[0-9]+]].qf32 = vadd([[V9]].qf32,[[V7]].sf)
+; CHECK: qf32 = vmpy([[V10]].qf32,[[V11]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when only the second input is from vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V2:v[0-9]+]] = vxor([[V2]],[[V2]])
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V2]].sf,[[V4]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V7]].qf32,v1.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V7]].qf32,[[V6]].sf)
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V5:v[0-9]+]] = vxor([[V5]],[[V5]])
+; CHECK-DAG: [[V6:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V7:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vmpy([[V5]].sf,[[V6]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].sf = [[V4]].qf32
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = vadd([[V9]].qf32,[[V8]].sf)
+; CHECK-DAG: [[V11:v[0-9]+]].qf32 = vadd([[V9]].qf32,[[V7]].sf)
+; CHECK: qf32 = vmpy([[V10]].qf32,[[V11]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when inputs are from vadd and vmul respectively
+; The inputs to both multiplications are converted to IEEE and normalized.
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[R0:r[0-9]+]] = ##2147483648
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V3:v[0-9]+]] = vxor([[V3]],[[V3]])
+; CHECK-DAG: [[V4:v[0-9]+]] = vsplat([[R0]])
+; CHECK-DAG: [[V7:v[0-9]+]].sf = [[V5]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vmpy([[V3]].sf,[[V4]].sf)
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = vadd([[V6]].qf32,v0.sf)
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = vadd([[V6]].qf32,v2.sf)
+; CHECK-DAG: [[V12:v[0-9]+]].qf32 = vadd([[V6]].qf32,[[V7]].sf)
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = vmpy([[V8]].qf32,[[V9]].qf32)
+; CHECK-DAG: [[V11:v[0-9]+]].sf = [[V10]].qf32
+; CHECK-DAG: [[V13:v[0-9]+]].qf32 = vadd([[V6]].qf32,[[V11]].sf)
+; CHECK: vmpy([[V12]].qf32,[[V13]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fmul <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-unary-crash.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-unary-crash.ll
new file mode 100644
index 0000000000000..938f63e906e10
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-unary-crash.ll
@@ -0,0 +1,25 @@
+; Tests that unary qf instruction handling in postRA xqf handler
+; does not cause a crash
+; REQUIRES: asserts
+
+; RUN: llc -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat \
+; RUN: -enable-xqf-gen=true -hexagon-qfloat-mode=ieee %s -o /dev/null
+
+target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+target triple = "hexagon"
+
+declare i32 @printf(...)
+
+define i32 @main(ptr %0) {
+entry:
+ store <32 x i32> zeroinitializer, ptr %0, align 128
+ %call3 = call i32 (...) @printf()
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vilog2.qf16.128B(<32 x i32> zeroinitializer)
+ store <32 x i32> %1, ptr %0, align 128
+ ret i32 0
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
+declare <32 x i32> @llvm.hexagon.V6.vilog2.qf16.128B(<32 x i32>) #0
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-compliant-ieee-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-compliant-ieee-mul-qf32.ll
new file mode 100644
index 0000000000000..ecf4ffdfc83fa
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-compliant-ieee-mul-qf32.ll
@@ -0,0 +1,109 @@
+; Tests compliant IEEE mode for XQFloat multiplication 32-bit for v81.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=ieee -mattr=+hvxv81,+hvx-length128B \
+; RUN: < %s | FileCheck %s -check-prefix=CHECK
+
+; Test qf32 = vmpy(sf, sf)
+; Normalization of inputs
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = v0.sf
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v1.sf
+; CHECK: qf32 = vmpy([[V3]].qf32,[[V4]].qf32)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V6]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V4]].qf32
+; CHECK: qf32 = vmpy([[V6]].qf32,[[V5]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when only one input is from vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V6]].qf32)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V4]].qf32
+; CHECK: qf32 = vmpy([[V6]].qf32,[[V5]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when one is from adder, another from multiplier
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v0.sf
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V3]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V4]].qf32,[[V5]].qf32)
+; CHECK: qf32 = vmpy([[V6]].qf32,[[V7]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+define <32 x float> @mul_mul_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+label0:
+; CHECK-LABEL: mul_mul_mul
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = v0.sf
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v2.sf
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vmpy([[V3]].qf32,[[V4]].qf32)
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = vmpy([[V3]].qf32,[[V5]].qf32)
+; CHECK: qf32 = vmpy([[V6]].qf32,[[V7]].qf32)
+ %v1 = fmul <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv81" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv81,+v81,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-lossy-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-lossy-mul-qf32.ll
new file mode 100644
index 0000000000000..1fb3e9ee426ff
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-lossy-mul-qf32.ll
@@ -0,0 +1,98 @@
+; Tests lossy-subnormals mode for XQFloat multiplication 32-bit for v81.
+; The normalization sequence differs from that of v79.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true -enable-rem-conv=true \
+; RUN: -hexagon-qfloat-mode=lossy -mattr=+hvxv81,+hvx-length128B < %s | FileCheck %s
+
+; Test qf32 = vmpy(sf, sf)
+; Inputs are passed to vmpy directly, without normalization
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0:
+; CHECK: qf32 = vmpy(v0.sf,v1.sf)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK: [[V4:v[0-9]+]].sf = [[V3]].qf32
+; CHECK: qf32 = vmpy(v1.sf,[[V4]].sf)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = [[V4]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V3]].qf32
+; CHECK: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when only one input is from vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK: [[V4:v[0-9]+]].sf = [[V3]].qf32
+; CHECK: qf32 = vmpy(v1.sf,[[V4]].sf)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = [[V4]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V3]].qf32
+; CHECK: qf32 = vmpy([[V5]].qf32,[[V6]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when one is from adder, another from multiplier
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vmpy(v0.sf,v1.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = [[V4]].qf32
+; CHECK-DAG: qf32 = vmpy([[V5]].qf32,[[V3]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when both are from multiplier
+define <32 x float> @mul_mul_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+label0:
+; CHECK-LABEL: mul_mul_mul
+; CHECK: [[V3:v[0-9]+]].qf32 = vmpy(v0.sf,v1.sf)
+; CHECK: [[V4:v[0-9]+]].qf32 = vmpy(v0.sf,v2.sf)
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V3]].qf32)
+ %v1 = fmul <32 x float> %a0, %a2
+ %v2 = fmul <32 x float> %a0, %a1
+ %v3 = fmul <32 x float> %v1, %v2
+ ret <32 x float> %v3
+}
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv81" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv81,+v81,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-strict-mul-qf32.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-strict-mul-qf32.ll
new file mode 100644
index 0000000000000..c8ccfc075e537
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-strict-mul-qf32.ll
@@ -0,0 +1,119 @@
+; Tests strict-ieee mode for XQFloat multiplication 32-bit for v81.
+
+; RUN: llc -O2 -march=hexagon -mcpu=hexagonv81 -force-hvx-float -enable-xqf-gen=true \
+; RUN: -hexagon-qfloat-mode=strict-ieee -mattr=+hvxv81,+hvx-length128B < %s | FileCheck %s -check-prefix=CHECK
+
+; Test qf32 = vmpy(sf, sf)
+; Normalization of inputs
+define <32 x float> @mul_add_0(<32 x float> %a0, <32 x float> %a1) #0 {
+; CHECK-LABEL: mul_add_0
+; CHECK-DAG: [[V2:v[0-9]+]].qf32 = v0.sf
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = v1.sf
+; CHECK: qf32 = vmpy([[V2]].qf32,[[V3]].qf32)
+label0:
+ %v3 = fmul <32 x float> %a0, %a1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(sf ,qf32) when only one input is from vadd instruction
+define <32 x float> @mul_add_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V6]].qf32)
+label0:
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vadd instruction
+define <32 x float> @mul_add_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vadd(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V4]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = [[V6]].sf
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V7]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fadd <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32, qf32) when only one input is from vsub instruction
+define <32 x float> @mul_sub_1(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_1:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v1.sf
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK: qf32 = vmpy([[V4]].qf32,[[V6]].qf32)
+label0:
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %a1, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when both inputs are from vsub instruction
+define <32 x float> @mul_sub_3(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_sub_3:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = vsub(v0.sf,v2.sf)
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = vsub(v0.sf,v1.sf)
+; CHECK-DAG: [[V5:v[0-9]+]].sf = [[V3]].qf32
+; CHECK-DAG: [[V6:v[0-9]+]].sf = [[V4]].qf32
+; CHECK-DAG: [[V7:v[0-9]+]].qf32 = [[V5]].sf
+; CHECK-DAG: [[V8:v[0-9]+]].qf32 = [[V6]].sf
+; CHECK: qf32 = vmpy([[V8]].qf32,[[V7]].qf32)
+label0:
+ %v0 = fsub <32 x float> %a0, %a1
+ %v1 = fsub <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Test qf32 = vmpy(qf32 ,qf32) when inputs are from vadd and vmul respectively
+; The inputs to both multiplications are converted to IEEE and normalized.
+define <32 x float> @mul_add_mul(<32 x float> %a0, <32 x float> %a1, <32 x float> %a2) #0 {
+; CHECK-LABEL: mul_add_mul:
+; CHECK-DAG: [[V3:v[0-9]+]].qf32 = v2.sf
+; CHECK-DAG: [[V4:v[0-9]+]].qf32 = v0.sf
+; CHECK-DAG: [[V5:v[0-9]+]].qf32 = vadd(v0.sf,v1.sf)
+; CHECK-DAG: [[V6:v[0-9]+]].qf32 = vmpy([[V4]].qf32,[[V3]].qf32)
+; CHECK-DAG: [[V7:v[0-9]+]].sf = [[V5]].qf32
+; CHECK-DAG: [[V8:v[0-9]+]].sf = [[V6]].qf32
+; CHECK-DAG: [[V9:v[0-9]+]].qf32 = [[V7]].sf
+; CHECK-DAG: [[V10:v[0-9]+]].qf32 = [[V8]].sf
+; CHECK: qf32 = vmpy([[V9]].qf32,[[V10]].qf32)
+label0:
+ %v0 = fadd <32 x float> %a0, %a1
+ %v1 = fmul <32 x float> %a0, %a2
+ %v3 = fmul <32 x float> %v0, %v1
+ ret <32 x float> %v3
+}
+
+; Tests when the inputs to vmul are a qf32 mul result and an sf type, and the output is stored to memory
+define i32 @mul_intrinsic(ptr %output) {
+; CHECK-LABEL: mul_intrinsic:
+; CHECK: [[V1:v[0-9]+]].qf32 = vmpy(v0.qf32,v0.qf32)
+; CHECK: [[V2:v[0-9]+]].sf = [[V1]].qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = [[V2]].sf
+; CHECK: [[V4:v[0-9]+]].qf32 = vmpy([[V3]].qf32,v0.qf32)
+; CHECK: [[V5:v[0-9]+]].sf = [[V4]].qf32
+; CHECK: vmemu{{.*}} = [[V5]]
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32> %0, <32 x i32> zeroinitializer)
+ store <32 x i32> %1, ptr %output, align 4
+ ret i32 0
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vmpy.qf32.128B(<32 x i32>, <32 x i32>) #0
+uselistorder ptr @llvm.hexagon.V6.vmpy.qf32.128B, { 1, 0 }
+
+attributes #0 = { nofree nosync nounwind "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+hvx-qfloat,-long-calls" "unsafe-fp-math"="true" }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-vsub.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-vsub.ll
new file mode 100644
index 0000000000000..d18ed85713272
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-v81/xqf-v81-vsub.ll
@@ -0,0 +1,164 @@
+; For v81, tests if correct vsub instructions are generated under different conditions
+
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=ieee < %s | FileCheck %s
+
+; The convert instruction before vsub should be removed and vsub opcode changed to take in qf32 as op2
+define dso_local <32 x i32> @sub1_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub1_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vsub([[V1]].sf,[[V3]].qf32)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vsub can be removed and the vsub opcode changed to take in qf32 type as op1.
+define dso_local <32 x i32> @sub2_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub2_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vsub([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instruction before vsub should be removed and vsub opcode changed to take in qf16 as op2
+define dso_local <32 x i32> @sub1_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub1_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vsub([[V1]].hf,[[V3]].qf16)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vsub can be removed and the vsub opcode changed to take in qf16 type as op1.
+define dso_local <32 x i32> @sub2_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub2_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vsub([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed.
+define dso_local <32 x i32> @add1_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add1_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vadd([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed and ops to last vadd can be interchanged
+define dso_local <32 x i32> @add2_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add2_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vadd([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed.
+define dso_local <32 x i32> @add1_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add1_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vadd([[V3]].qf16,[[V1]].hf)
+entry: %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2) %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2) %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4 }
+
+; The convert instr before vadd can be removed and ops to last vadd can be interchanged
+define dso_local <32 x i32> @add2_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add2_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vadd([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vmul can be removed and ops to last vmpy can be interchanged
+define dso_local <32 x i32> @mpy2_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: mpy2_qf32
+; CHECK: qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vmpy(v{{[0-9]+}}.qf32,v{{[0-9]+}}.qf32)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf32.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vmul can be removed.
+define dso_local <32 x i32> @mpy1_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: mpy1_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf32 = vmpy([[V3]].qf16,[[V1]].hf)
+entry: %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2) %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2) %4 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.hf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4 }
+
+; The convert instr before vmul can be removed and ops to last vmpy can be interchanged
+define dso_local <32 x i32> @mpy2_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: mpy2_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf32 = vmpy([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vmpy.qf16.hf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+
+
+declare <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #1
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-long-calls,-small-data" }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-vsub.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-vsub.ll
new file mode 100644
index 0000000000000..c10ff15791b23
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-vsub.ll
@@ -0,0 +1,130 @@
+; Tests if correct vsub instructions are generated under different conditions
+
+; RUN: llc -mtriple=hexagon-unknown-elf -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -enable-rem-conv=true -hexagon-qfloat-mode=ieee < %s | FileCheck %s
+
+; The convert instruction before vsub should remain as it is.
+define dso_local <32 x i32> @sub1_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub1_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: [[V4:v[0-9]+]].sf = [[V3]].qf32
+; CHECK: qf32 = vsub([[V1]].sf,[[V4]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vsub can be removed and the vsub opcode changed to take in qf32 type as op1.
+define dso_local <32 x i32> @sub2_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub2_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vsub([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instruction before vsub should remain as it is.
+define dso_local <32 x i32> @sub1_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub1_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: [[V4:v[0-9]+]].hf = [[V3]].qf16
+; CHECK: qf16 = vsub([[V1]].hf,[[V4]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vsub can be removed and the vsub opcode changed to take in qf16 type as op1.
+define dso_local <32 x i32> @sub2_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: sub2_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vsub([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed.
+define dso_local <32 x i32> @add1_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add1_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vadd([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed and ops to last vadd can be interchanged
+define dso_local <32 x i32> @add2_qf32(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add2_qf32
+; CHECK: [[V3:v[0-9]+]].qf32 = vadd([[V1:v[0-9]+]].sf
+; CHECK: qf32 = vadd([[V3]].qf32,[[V1]].sf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed.
+define dso_local <32 x i32> @add1_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add1_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vadd([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %3, <32 x i32> %0)
+ ret <32 x i32> %4
+}
+
+; The convert instr before vadd can be removed and ops to last vadd can be interchanged
+define dso_local <32 x i32> @add2_qf16(i32 noundef %input1, i32 noundef %input2, i32 noundef %size) local_unnamed_addr #0 {
+; CHECK-LABEL: add2_qf16
+; CHECK: [[V3:v[0-9]+]].qf16 = vadd([[V1:v[0-9]+]].hf
+; CHECK: qf16 = vadd([[V3]].qf16,[[V1]].hf)
+entry:
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input1)
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 %input2)
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %1)
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32> %2)
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32> %0, <32 x i32> %3)
+ ret <32 x i32> %4
+}
+
+declare <32 x i32> @llvm.hexagon.V6.vsub.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vconv.hf.qf16.128B(<32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.sf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.hf.128B(<32 x i32>, <32 x i32>) #1
+
+attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvxv79,+v79,-long-calls,-small-data" }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/xqf-warnings.ll b/llvm/test/CodeGen/Hexagon/autohvx/xqf-warnings.ll
new file mode 100644
index 0000000000000..f4a2a02ffacb5
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/xqf-warnings.ll
@@ -0,0 +1,143 @@
+; XFAIL: hexagon-registered-target
+; Tests for emitted warnings when IEEE type is used as qf and vice-versa
+; Test program source code with line numbers:
+
+;1 #include <hexagon_types.h>
+;2 HEXAGON_Vect2048 foo(HEXAGON_Vect1024 vina, HEXAGON_Vect1024 vinb) {
+;3 const HEXAGON_Vect1024 ishf1 = __builtin_HEXAGON_V6_lvsplath_128B(0x3C00);
+;4 const HEXAGON_Vect1024 ishf2 = __builtin_HEXAGON_V6_lvsplath_128B(0xBC00);
+;5 const HEXAGON_Vect1024 issf1 = __builtin_HEXAGON_V6_lvsplatw_128B(0xAC00);
+;6 const HEXAGON_Vect1024 issf2 = __builtin_HEXAGON_V6_lvsplatw_128B(0xDC00);
+;7 const HEXAGON_Vect1024 isqf16 = __builtin_HEXAGON_V6_vadd_qf16_mix_128B(ishf1, ishf2);
+;8 const HEXAGON_Vect1024 isqf32 = __builtin_HEXAGON_V6_vadd_qf32_mix_128B(issf1, issf2);
+;9 HEXAGON_Vect2048 isqf32_1 = __builtin_HEXAGON_V6_vmpy_qf32_qf16_128B(vina,isqf16);
+;10 HEXAGON_Vect2048 isqf32_2 = __builtin_HEXAGON_V6_vmpy_qf32_hf_128B(vinb,isqf16);
+;11 HEXAGON_Vect1024 add1 = __builtin_HEXAGON_V6_vsub_qf32_mix_128B(__builtin_HEXAGON_V6_hi_128B(isqf32_1),isqf32);
+;12 HEXAGON_Vect1024 add2 = __builtin_HEXAGON_V6_vsub_qf32_128B(__builtin_HEXAGON_V6_hi_128B(isqf32_2),isqf32);
+;13 return __builtin_HEXAGON_V6_vcombine_128B(add2,add1);
+;14 }
+
+; RUN: llc --mtriple=hexagon-- -mhvx -mcpu=hexagonv79 -mattr=+hvxv79,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=ieee 2>&1 < %s -o /dev/null | FileCheck %s
+; RUN: llc --mtriple=hexagon-- -mhvx -mcpu=hexagonv81 -mattr=+hvxv81,+hvx-length128b,+hvx-qfloat -enable-xqf-gen=true \
+; RUN: -verify-machineinstrs \
+; RUN: -hexagon-qfloat-mode=ieee 2>&1 < %s -o /dev/null | FileCheck %s
+
+define dso_local inreg <64 x i32> @foo(<32 x i32> noundef %vina, <32 x i32> noundef %vinb) local_unnamed_addr #0 !dbg !8 {
+; CHECK-NOT: warning: test.c:3:
+; CHECK-NOT: warning: test.c:4:
+; CHECK-NOT: warning: test.c:5:
+; CHECK-NOT: warning: test.c:6:
+; CHECK: warning: test.c:7:35: in function foo <64 x i32> (<32 x i32>, <32 x i32>): hf type used as qf16 at operand 1
+; CHECK: warning: test.c:8:35: in function foo <64 x i32> (<32 x i32>, <32 x i32>): sf type used as qf32 at operand 1
+; CHECK: warning: test.c:9:31: in function foo <64 x i32> (<32 x i32>, <32 x i32>): hf type used as qf16 at operand 1
+; CHECK: warning: test.c:10:31: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf16 type used as hf at operand 2
+; CHECK: warning: test.c:11:67: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf32 type used as sf at operand 1
+; CHECK: warning: test.c:11:27: in function foo <64 x i32> (<32 x i32>, <32 x i32>): sf type used as qf32 at operand 1
+; CHECK: warning: test.c:11:27: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf32 type used as sf at operand 2
+; CHECK: warning: test.c:12:63: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf32 type used as sf at operand 1
+; CHECK: warning: test.c:12:27: in function foo <64 x i32> (<32 x i32>, <32 x i32>): sf type used as qf32 at operand 1
+; CHECK-NOT: warning: test.c:12: {{.*}}: qf32 type used as sf at operand 2
+; CHECK-NOT: warning: test.c:12: {{.*}}: sf type used as qf32 at operand 2
+; CHECK: warning: test.c:13:10: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf32 type used as sf at operand 1
+; CHECK: warning: test.c:13:10: in function foo <64 x i32> (<32 x i32>, <32 x i32>): qf32 type used as sf at operand 2
+entry:
+ call void @llvm.dbg.value(metadata <32 x i32> %vina, metadata !22, metadata !DIExpression()), !dbg !35
+ call void @llvm.dbg.value(metadata <32 x i32> %vinb, metadata !23, metadata !DIExpression()), !dbg !35
+ %0 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 15360), !dbg !36
+ call void @llvm.dbg.value(metadata <32 x i32> %0, metadata !24, metadata !DIExpression()), !dbg !35
+ %1 = tail call <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32 48128), !dbg !37
+ call void @llvm.dbg.value(metadata <32 x i32> %1, metadata !26, metadata !DIExpression()), !dbg !35
+ %2 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 44032), !dbg !38
+ call void @llvm.dbg.value(metadata <32 x i32> %2, metadata !27, metadata !DIExpression()), !dbg !35
+ %3 = tail call <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32 56320), !dbg !39
+ call void @llvm.dbg.value(metadata <32 x i32> %3, metadata !28, metadata !DIExpression()), !dbg !35
+ %4 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32> %0, <32 x i32> %1), !dbg !40
+ call void @llvm.dbg.value(metadata <32 x i32> %4, metadata !29, metadata !DIExpression()), !dbg !35
+ %5 = tail call <32 x i32> @llvm.hexagon.V6.vadd.qf32.mix.128B(<32 x i32> %2, <32 x i32> %3), !dbg !41
+ call void @llvm.dbg.value(metadata <32 x i32> %5, metadata !30, metadata !DIExpression()), !dbg !35
+ %6 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.qf16.128B(<32 x i32> %vina, <32 x i32> %4), !dbg !42
+ call void @llvm.dbg.value(metadata <64 x i32> %6, metadata !31, metadata !DIExpression()), !dbg !35
+ %7 = tail call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32> %vinb, <32 x i32> %4), !dbg !43
+ call void @llvm.dbg.value(metadata <64 x i32> %7, metadata !32, metadata !DIExpression()), !dbg !35
+ %8 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6), !dbg !44
+ %9 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf32.mix.128B(<32 x i32> %8, <32 x i32> %5), !dbg !45
+ call void @llvm.dbg.value(metadata <32 x i32> %9, metadata !33, metadata !DIExpression()), !dbg !35
+ %10 = tail call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %7), !dbg !46
+ %11 = tail call <32 x i32> @llvm.hexagon.V6.vsub.qf32.128B(<32 x i32> %10, <32 x i32> %5), !dbg !47
+ call void @llvm.dbg.value(metadata <32 x i32> %11, metadata !34, metadata !DIExpression()), !dbg !35
+ %12 = tail call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %11, <32 x i32> %9), !dbg !48
+ ret <64 x i32> %12, !dbg !49
+}
+
+declare <32 x i32> @llvm.hexagon.V6.lvsplath.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.lvsplatw.128B(i32) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf16.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vadd.qf32.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.qf32.qf16.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf32.mix.128B(<32 x i32>, <32 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #1
+declare <32 x i32> @llvm.hexagon.V6.vsub.qf32.128B(<32 x i32>, <32 x i32>) #1
+declare <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32>, <32 x i32>) #1
+declare void @llvm.dbg.value(metadata, metadata, metadata) #2
+
+attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(none) "approx-func-fp-math"="true" "frame-pointer"="all" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="hexagonv79" "target-features"="+hvx-length128b,+hvx-qfloat,+hvxv79,+v79,-long-calls" "unsafe-fp-math"="true" }
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }
+attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6}
+!llvm.ident = !{!7}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "QuIC LLVM Hexagon Clang version 8.8-alpha2 Engineering Release: hexagon-clang-88-5172", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
+!1 = !DIFile(filename: "test.c", directory: "/local/mnt/workspace/santdas/src/8_8/build")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 7, !"frame-pointer", i32 2}
+!6 = !{i32 7, !"debug-info-assignment-tracking", i1 true}
+!7 = !{!"QuIC LLVM Hexagon Clang version 8.8-alpha2 Engineering Release: hexagon-clang-88-5172"}
+!8 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 2, type: !9, scopeLine: 2, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !21)
+!9 = !DISubroutineType(types: !10)
+!10 = !{!11, !17, !17}
+!11 = !DIDerivedType(tag: DW_TAG_typedef, name: "HEXAGON_Vect2048", file: !12, line: 1223, baseType: !13, align: 2048)
+!12 = !DIFile(filename: "./install/Tools/bin/../target/hexagon/include/hexagon_types.h", directory: "/local/mnt/workspace/santdas/src/8_8/build")
+!13 = !DICompositeType(tag: DW_TAG_array_type, baseType: !14, size: 2048, flags: DIFlagVector, elements: !15)
+!14 = !DIBasicType(name: "long", size: 32, encoding: DW_ATE_signed)
+!15 = !{!16}
+!16 = !DISubrange(count: 64)
+!17 = !DIDerivedType(tag: DW_TAG_typedef, name: "HEXAGON_Vect1024", file: !12, line: 1220, baseType: !18, align: 1024)
+!18 = !DICompositeType(tag: DW_TAG_array_type, baseType: !14, size: 1024, flags: DIFlagVector, elements: !19)
+!19 = !{!20}
+!20 = !DISubrange(count: 32)
+!21 = !{!22, !23, !24, !26, !27, !28, !29, !30, !31, !32, !33, !34}
+!22 = !DILocalVariable(name: "vina", arg: 1, scope: !8, file: !1, line: 2, type: !17)
+!23 = !DILocalVariable(name: "vinb", arg: 2, scope: !8, file: !1, line: 2, type: !17)
+!24 = !DILocalVariable(name: "ishf1", scope: !8, file: !1, line: 3, type: !25)
+!25 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !17)
+!26 = !DILocalVariable(name: "ishf2", scope: !8, file: !1, line: 4, type: !25)
+!27 = !DILocalVariable(name: "issf1", scope: !8, file: !1, line: 5, type: !25)
+!28 = !DILocalVariable(name: "issf2", scope: !8, file: !1, line: 6, type: !25)
+!29 = !DILocalVariable(name: "isqf16", scope: !8, file: !1, line: 7, type: !25)
+!30 = !DILocalVariable(name: "isqf32", scope: !8, file: !1, line: 8, type: !25)
+!31 = !DILocalVariable(name: "isqf32_1", scope: !8, file: !1, line: 9, type: !11)
+!32 = !DILocalVariable(name: "isqf32_2", scope: !8, file: !1, line: 10, type: !11)
+!33 = !DILocalVariable(name: "add1", scope: !8, file: !1, line: 11, type: !17)
+!34 = !DILocalVariable(name: "add2", scope: !8, file: !1, line: 12, type: !17)
+!35 = !DILocation(line: 0, scope: !8)
+!36 = !DILocation(line: 3, column: 34, scope: !8)
+!37 = !DILocation(line: 4, column: 34, scope: !8)
+!38 = !DILocation(line: 5, column: 34, scope: !8)
+!39 = !DILocation(line: 6, column: 34, scope: !8)
+!40 = !DILocation(line: 7, column: 35, scope: !8)
+!41 = !DILocation(line: 8, column: 35, scope: !8)
+!42 = !DILocation(line: 9, column: 31, scope: !8)
+!43 = !DILocation(line: 10, column: 31, scope: !8)
+!44 = !DILocation(line: 11, column: 67, scope: !8)
+!45 = !DILocation(line: 11, column: 27, scope: !8)
+!46 = !DILocation(line: 12, column: 63, scope: !8)
+!47 = !DILocation(line: 12, column: 27, scope: !8)
+!48 = !DILocation(line: 13, column: 10, scope: !8)
+!49 = !DILocation(line: 13, column: 3, scope: !8)
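
The C source quoted in xqf-warnings.ll passes hexadecimal splat immediates (0x3C00, 0xBC00, 0xAC00, 0xDC00), while the IR body carries them as decimal i32 values (15360, 48128, 44032, 56320). The following is a minimal sketch, not part of the patch, that checks this correspondence and decodes the two lvsplath immediates as IEEE-754 half-precision values. The helper name `half_bits_to_float` and the `SPLAT_IMMEDIATES` table are illustrative assumptions only.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE-754 binary16 (hf) value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# Hypothetical lookup table: immediate as written in the C test source
# mapped to the decimal i32 that appears in the IR body of @foo.
SPLAT_IMMEDIATES = {
    0x3C00: 15360,  # ishf1, lvsplath
    0xBC00: 48128,  # ishf2, lvsplath
    0xAC00: 44032,  # issf1, lvsplatw
    0xDC00: 56320,  # issf2, lvsplatw
}

# The hex and decimal spellings must denote the same integer.
for c_value, ir_value in SPLAT_IMMEDIATES.items():
    assert c_value == ir_value

# Decode only the two halfword splats as hf values.
hf_one = half_bits_to_float(0x3C00)
hf_minus_one = half_bits_to_float(0xBC00)
print(hf_one, hf_minus_one)
```

Under this reading, ishf1 and ishf2 are plain hf splats of 1.0 and -1.0, which is consistent with the test expecting "hf type used as qf16" warnings where they feed the qf16 mix intrinsic.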