[PATCH] D126158: [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.

Mon May 30 00:03:26 PDT 2022

csigg added inline comments.

================
Comment at: mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp:158
+// by the same divisor.
+struct ExpandDivF16 : public ConvertOpToLLVMPattern<LLVM::FDivOp> {
+  using ConvertOpToLLVMPattern<LLVM::FDivOp>::ConvertOpToLLVMPattern;
----------------
herhut wrote:
> This pattern is a bit misplaced here, as `LLVM::FDivOp` is not really a GPU dialect operation. Instead, should this be a special lowering of the arith dialect to NVVM (which we do not have yet) or a rewrite at the LLVM dialect level?
> 
> When lowering to LLVM, we already typically configure a different lowering for math dialect, so configuring the lowering of arith dialect differently seems like an OK option. That would mean a specialized pattern for `arith.divf` with higher priority. That would also give users a choice.
Yes, I agree it's a bit misplaced. I considered it the best of all questionable options.

Adding it to ArithToLLVM doesn't really work, because we don't want it to depend on the NVVM dialect.

How about adding it as a separate pass to `mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126158/new/

https://reviews.llvm.org/D126158