[PATCH] D113107: Support of expression granularity for _Float16.

Mon Aug 22 19:15:30 PDT 2022

rjmccall added inline comments.

================
Comment at: clang/test/CodeGen/X86/Float16-arithmetic.c:207
+// CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
+// CHECK-NEXT:    store float [[EXT]], ptr [[RETVAL]], align 2
+// CHECK-NEXT:    [[TMP1:%.*]] = load half, ptr [[RETVAL]], align 2
----------------
pengfei wrote:
> rjmccall wrote:
> > zahiraam wrote:
> > > zahiraam wrote:
> > > > pengfei wrote:
> > > > > Not sure if we need a fptrunc and store the half value. The following tests have the same problem.
> > > > I think that's what we want?
> > > > // CHECK-LABEL: @RealOp(
> > > > // CHECK-NEXT:  entry:
> > > > // CHECK-NEXT:    [[A_ADDR:%.*]] = alloca half, align 2
> > > > // CHECK-NEXT:    store half [[A:%.*]], ptr [[A_ADDR]], align 2
> > > > // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[A_ADDR]], align 2
> > > > // CHECK-NEXT:    [[EXT:%.*]] = fpext half [[TMP0]] to float
> > > > // CHECK-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[EXT]] to half
> > > > // CHECK-NEXT:    ret half [[UNPROMOTION]]
> > > > 
> > > > Do you agree? If this is correct, I will make the change the other operators.
> > > But I feel like we should be returning a float no?  In which case it will be more tricky (need to calculate the Address with the promoted elementype)? @rmjccall?
> > The function is declared as returning `_Float16`, not `float`.  This is therefore a question about when we're allowed to return a value in greater precision than the declared return type, which raises three sub-questions: one about ABI, one about language semantics, and one about our handling in the implementation.
> > 
> > The first is about ABI.  This mode is not supposed to be ABI-breaking, so whenever the ABI is in doubt, and the target makes a `_Float16` return incompatible with the ABI of a `float` return, we must use the former.  That means, at the very least, returning from a function with unknown call sites or calling a function with an unknown implementation.  We could potentially eliminate extra truncations here when we fully understand a call; for example, we could change the return type to `float` when the function is internal to a TU and not address-taken, or we could eliminate a trunc+extend pair after inlining.  It is fair to ask whether that's a good idea, however.
> > 
> > Anyway, concretely we're talking about two ABIs here:
> > - On x86_64, `_Float16` and `float` are not returned compatibly: they're both returned in `xmm0`, but the bit patterns are different, and the caller and callee must agree in order to preserve the value.
> > - On i386, `_Float16` and `float` *are* returned compatibly: they're both returned in `%st0`, promoted to the 80-bit format.
> > 
> > Let's assume for a second that we're interested in avoiding truncations in situations where the ABI doesn't limit us.  Then we have a question of language semantics, which is principally about this: does C's authorization of excess precision in intermediate results allows return values to propagate the excess precision?  The answer that appears to be yes, it does, per the explicit footnote at the end of the standard's description of the `return` statement:
> > 
> > > The return statement is not an assignment. The overlap restriction of 6.5.16.1 does not apply to the case of function return. The representation of floating-point values can have wider range or precision than implied by the type; a cast can be used to remove this extra range and precision.
> > 
> > Okay.  So the third question is about implementation: how should we take advantage of this flexibility, assuming we actually want to?  A major part of the reason we're doing explicit promoted emission in the frontend in this patch is that only the frontend has the required knowledge of when to force truncation; and a big part of *that* is that explicit casts and assignments both force truncation, and neither has a persistent semantic representation in LLVM IR.  We cannot distinguish between a truncation that was only done to satisfy the ABI and a truncation that was required by the language semantics.  Once we have a value as an `fp16` in IR, it's entirely possible that it underwent an operation that required excess precision to be discarded, one which no longer has any representation in IR.  I think the only reasonable way to avoid this would be to allow functions to directly return `float` (and receive `float`s as returns) with some sort of annotation that it actually has to be returned as an `fp16` to satisfy the ABI.  And within Clang, we would handle that by making the call/return emission interact with the promoted emitters we're adding in this patch.
> > 
> > We can put off all of that until later, though.  For now, we should just continue to return `_Float16`.
> > Do you agree? If this is correct, I will make the change the other operators.
> 
> I think it is correct, though it is a bit silly to do fpext + fptrunc for a simple move.
> 
> > On i386, `_Float16` and `float` *are* returned compatibly
> 
> They are not. According to i386 [[ https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI | psABI ]] both `_Float16` and `_Complex _Float16` are passed/returned from XMM0, while `float`/`double` etc. are passed/returned from ST0.
> 
> Other target may have similar problem, e.g., passing/returning `float` from FPR but `_Float16` from GPR.
> 
> IIUC, the current way, i.e., always promote + unpromote each expression, is just for easy implementation. I expected we can distinguish unary and binary operations from multi operations. Only multi operations needs promote + unpromote.
> They are not. According to i386 psABI both _Float16 and _Complex _Float16 are passed/returned from XMM0, while float/double etc. are passed/returned from ST0.

Ah, I see that now in the register-use table.  On the other hand, the section on returning values says "[f]unctions that return scalar floating-point values in registers return them on the top of the x87 register stack, that is, %st0" without making exceptions.  The table should presumably take precedence, but still, someone really needs to update that document properly.

Anyway.  So that puts i386 on the same level as x86_64 in terms of having ABI constraints which do not allow these functions to simply return a `float` without an ABI break, and thus cannot avoid truncations for return values without IPO.

> Other target may have similar problem, e.g., passing/returning float from FPR but _Float16 from GPR.

Right, an ABI mismatch should be the default assumption for all targets.  The i386 thing where some types are promoted for returns is a terrible legacy that everybody else is well warned off of, and as you point out, even there we don't have to worry about it for `_Float16`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107