[PATCH] D54696: Implement fptoui.sat and fptosi.sat intrinsics

Mon Nov 19 03:59:32 PST 2018

nikic created this revision.
Herald added subscribers: llvm-commits, kristof.beyls, javed.absar.

Note: This change is not intended to be reviewed in this form. This is the full state of my saturating float-to-int implementation. It needs to be split up into smaller parts for review (see end).

This patch adds support for the fptoui.sat and fptosi.sat intrinsics. They provide essentially the same functionality as the existing fptoui and fptosi instructions, but saturate (or return 0 for NaN) on values that are unrepresentable in the target type, instead of returning poison.

The intrinsics have overloaded source and result types and support vector operands:

  i32 @llvm.fptoui.sat.f32.i32(float %f)
  i100 @llvm.fptoui.sat.f64.i100(double %f)
  <4 x i32> @llvm.fptoui.sat.v4f16.v4i32(<4 x half> %f)
  // etc
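
As a rough reference model of the intended semantics, the scalar f32 to i32 case can be sketched in C++ as follows. The helper names fptosi_sat_i32 and fptoui_sat_i32 are made up for illustration and are not part of the patch; the sketch only mirrors the description above (saturate on out-of-range inputs, return 0 for NaN).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference model of fptosi.sat for float -> i32.
int32_t fptosi_sat_i32(float f) {
    if (std::isnan(f)) return 0;  // NaN maps to 0
    // 2^31 is exactly representable as a float; INT32_MAX itself is not.
    if (f >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
    if (f <= -2147483648.0f) return std::numeric_limits<int32_t>::min();
    return static_cast<int32_t>(f);  // in range: truncates toward zero
}

// Hypothetical reference model of fptoui.sat for float -> i32.
uint32_t fptoui_sat_i32(float f) {
    if (std::isnan(f)) return 0;  // NaN maps to 0, which is also MIN
    if (f >= 4294967296.0f) return std::numeric_limits<uint32_t>::max();
    if (f <= 0.0f) return 0;      // negative inputs saturate to 0
    return static_cast<uint32_t>(f);
}
```

Every branch returns before the cast unless the input is in range, so the sketch never performs an out-of-range float-to-int conversion, which would be undefined behavior in C++.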

On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is a value type operand specifying the saturation width. The idea here is that initially the second operand and the result type are the same, but they may change during type legalization. For example:

  i19 @llvm.fptosi.sat.f32.i19(float %f)
  // builds
  i19 fp_to_sint_sat f, VT:i19
  // type legalizes
  i32 fp_to_sint_sat f, VT:i19

I went for this approach because saturating conversions do not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width.
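
To illustrate the separate saturation width, here is a minimal C++ sketch of a node like "i32 fp_to_sint_sat f, VT:i19": the result lives in an i32, but the clamping bounds come from the width operand. The function name and signature are assumptions for illustration only and do not correspond to code in the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model: saturate to a signed integer of `width` bits,
// returning the value in a wider i32 container.
int32_t fp_to_sint_sat(float f, unsigned width) {
    assert(width >= 1 && width <= 32);
    if (std::isnan(f)) return 0;
    const int64_t max = (int64_t{1} << (width - 1)) - 1;
    const int64_t min = -(int64_t{1} << (width - 1));
    // 2^(width-1) is a power of two and therefore exact in float.
    const float limit = std::ldexp(1.0f, static_cast<int>(width) - 1);
    if (f >= limit) return static_cast<int32_t>(max);
    if (f <= -limit) return static_cast<int32_t>(min);
    return static_cast<int32_t>(f);  // in range for `width`, hence for i32
}
```

With width 19, an input of 1e6 yields 262143 directly. Saturating to i32 first would produce 1000000, and truncating that to 19 bits would wrap rather than saturate, which is why a second clamp would otherwise be needed.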

There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and appropriate fmin/fmax are legal, we can expand to something like:

  f = fmax f, FP(MIN)
  f = fmin f, FP(MAX)
  i = fptoxi f
  i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
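
The clamp-then-convert expansion can be sketched in C++ for float to i16, where both bounds (-32768 and 32767) are exactly representable in float. The function name is hypothetical, and std::fmax/std::fmin stand in for whatever fmin/fmax nodes the target provides; they return the non-NaN operand when one input is NaN, so the clamped value is always in range.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch of the first expansion for float -> i16.
int16_t expand_clamp_first(float f) {
    const float fp_min = -32768.0f;  // FP(MIN), exact
    const float fp_max = 32767.0f;   // FP(MAX), exact
    const bool is_nan = std::isnan(f);   // "f uo f" on the original input
    float c = std::fmax(f, fp_min);      // a NaN input becomes fp_min here
    c = std::fmin(c, fp_max);
    const int16_t i = static_cast<int16_t>(c);  // always in range
    return is_nan ? int16_t{0} : i;      // select f uo f, 0, i
}
```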

If the bounds cannot be exactly represented, we expand to something like this instead:

  i = fptoxi f
  i = select f ult FP(MIN), MIN, i
  i = select f ogt FP(MAX), MAX, i
  i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
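
A C++ sketch of the convert-then-select expansion for float to i32 follows. Two assumptions are made that the text above does not spell out: raw_fptosi is a made-up stand-in for a non-trapping fptosi that returns an arbitrary value (here INT32_MIN) for out-of-range inputs, and FP(MAX) is taken to be INT32_MAX rounded toward zero, i.e. 2147483520.0f, the largest float below 2^31.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical model of a non-trapping fptosi: in-range inputs are
// truncated, anything else yields an arbitrary (but defined) value.
int32_t raw_fptosi(float f) {
    if (f >= -2147483648.0f && f < 2147483648.0f)
        return static_cast<int32_t>(f);
    return INT32_MIN;  // arbitrary placeholder for out-of-range and NaN
}

// Hypothetical sketch of the second expansion for float -> i32.
int32_t expand_convert_first(float f) {
    const float fp_min = -2147483648.0f;  // FP(MIN), exact
    const float fp_max = 2147483520.0f;   // FP(MAX), rounded toward zero (assumption)
    int32_t i = raw_fptosi(f);
    i = !(f >= fp_min) ? INT32_MIN : i;   // select f ult FP(MIN), MIN, i
    i = (f > fp_max) ? INT32_MAX : i;     // select f ogt FP(MAX), MAX, i
    i = std::isnan(f) ? 0 : i;            // select f uo f, 0, i
    return i;
}
```

The unordered "ult" compare also fires for NaN, but the final select overrides that result with 0, so the order of the selects matters.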

It should be noted that this expansion assumes a non-trapping fptoxi. Targets where this is not the case should not use it.

Target-specific code is also implemented for AArch64 and X86. On AArch64 the native fcvtz instructions already have saturation behavior, so we just use them if possible. If the saturation width does not line up, it may be necessary to use (slightly simplified versions of) the above expansions. For X86 we essentially follow the generic lowering, but can save a check here and there based on INDVAL. Furthermore, X86 has peculiar fmin/fmax implementations that have to be handled specially.

If software floats are used (or the float type is not supported, or the int type is too wide) we emit rtlib calls. There are 24 new libcalls:

  // f32, f64, f128, f80 to i32, i64, i128
  __fix[sdtx]f[sdt]i_sat
  __fixuns[sdtx]f[sdt]i_sat

Like the SelectionDAG node, each function accepts two arguments, the floating point number and the saturation width in bits.
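
For reference, the name pattern above can be expanded mechanically. The following C++ sketch only enumerates the names implied by the two patterns (two prefixes, four float codes, three int codes); the helper name sat_libcall_names is invented for illustration, and nothing here implements the libcalls themselves.

```cpp
#include <string>
#include <vector>

// Expands __fix[sdtx]f[sdt]i_sat and __fixuns[sdtx]f[sdt]i_sat.
std::vector<std::string> sat_libcall_names() {
    const char *prefixes[] = {"__fix", "__fixuns"};
    const char fp_codes[] = {'s', 'd', 't', 'x'};  // f32, f64, f128, f80
    const char int_codes[] = {'s', 'd', 't'};      // i32, i64, i128
    std::vector<std::string> names;
    for (const char *prefix : prefixes)
        for (char fp : fp_codes)
            for (char in : int_codes)
                names.push_back(std::string(prefix) + fp + 'f' + in + "i_sat");
    return names;
}
```

This yields 2 * 4 * 3 = 24 names, starting with __fixsfsi_sat and ending with __fixunsxfti_sat.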

I have not actually implemented those libcalls, and this is probably the part I dislike most about this endeavor -- this is an obscene amount of new builtins.

---

My further plan is to split up this change into smaller parts more amenable to review:

1. Add intrinsics and ISD opcodes and the fallback expansion. Also add scaffolding for all the legalizations (int, float, vector) but without implementations (they will just assert).
2. Add scalar type legalization not requiring libcalls. This is int result promotion and float operand promotion.
3. Add scalar type legalization requiring libcalls. This is int result expansion and float operand softening.
4. Add vector type legalization. This is vector result scalarization, vector result splitting, vector operand splitting, vector result widening, vector operand widening.
5. Add AArch64 custom lowering.
6. Add X86 custom lowering.

Repository:
  rL LLVM

https://reviews.llvm.org/D54696

Files:
  docs/LangRef.rst
  include/llvm/CodeGen/ISDOpcodes.h
  include/llvm/CodeGen/RuntimeLibcalls.h
  include/llvm/CodeGen/TargetLowering.h
  include/llvm/IR/Intrinsics.td
  include/llvm/IR/RuntimeLibcalls.def
  include/llvm/Target/TargetSelectionDAG.td
  lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
  lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
  lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  lib/CodeGen/SelectionDAG/LegalizeTypes.h
  lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
  lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
  lib/CodeGen/SelectionDAG/TargetLowering.cpp
  lib/CodeGen/TargetLoweringBase.cpp
  lib/Target/AArch64/AArch64ISelLowering.cpp
  lib/Target/AArch64/AArch64ISelLowering.h
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86ISelLowering.h
  test/CodeGen/AArch64/fptoi-sat-scalar.ll
  test/CodeGen/AArch64/fptoi-sat-vector.ll
  test/CodeGen/X86/fptoi-sat-scalar.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D54696.174580.patch
Type: text/x-patch
Size: 210823 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181119/eb1e6976/attachment-0001.bin>