[llvm] [AMDGPU] Implement IR expansion for frem instruction (PR #130988)
Frederik Harwath via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 31 09:13:44 PDT 2025
================
@@ -37,6 +48,352 @@ static cl::opt<unsigned>
cl::desc("fp convert instructions on integers with "
"more than <N> bits are expanded."));
+namespace {
+/// This class implements a precise expansion of the frem instruction.
+/// The generated code is based on the fmod implementation in the AMD device
+/// libs.
+class FRemExpander {
+ /// The IRBuilder to use for the expansion.
+ IRBuilder<> &B;
+
+ /// Floating point type of the return value and the arguments of the FRem
+ /// instructions that should be expanded.
+ Type *FremTy;
+
+ /// Floating point type to use for the computation. This may be
+ /// wider than \p FremTy.
+ Type *ComputeFpTy;
+
+ /// Integer type that can hold floating point values of type \p FremTy.
+ Type *IntTy;
+
+ /// Integer type used to hold the exponents returned by frexp.
+ Type *ExTy;
+
+ /// How many bits of the quotient to compute per iteration of the
+ /// algorithm, stored as a value of type \p ExTy.
+ Value *Bits;
+
+ /// Constant 1 of type \p ExTy.
+ Value *One;
+
+ /// The sign bit for floating point values of type \p FremTy.
+ const unsigned long long Signbit;
+
+public:
+ static std::optional<FRemExpander> create(IRBuilder<> &B, Type *Ty) {
+ if (Ty->is16bitFPTy())
+ return FRemExpander{B, Ty, 11, 0x8000, B.getFloatTy(), B.getInt16Ty()};
----------------
frederik-h wrote:
For `float`, this is half the precision, which means that the main compute loop will iterate at most twice. For the 16-bit type, this says that we want to compute all bits in one iteration, which is possible because we use the extended-precision type for the computation. I changed this code to use the `fltSemantics`.
https://github.com/llvm/llvm-project/pull/130988