[PATCH] D113107: Support of expression granularity for _Float16.

Tue Nov 16 19:37:36 PST 2021

pengfei added inline comments.

================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:1315
+  if ((SrcType->isHalfType() || iSFloat16Allowed) &&
+      !CGF.getContext().getLangOpts().NativeHalfType) {
     // Cast to FP using the intrinsic if the half type itself isn't supported.
----------------
zahiraam wrote:
> zahiraam wrote:
> > pengfei wrote:
> > > rjmccall wrote:
> > > > pengfei wrote:
> > > > > rjmccall wrote:
> > > > > > pengfei wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > Okay, this condition is pretty ridiculous to be repeating in three different places across the compiler.  Especially since you're going to change it when you implement the new option, right?
> > > > > > > > 
> > > > > > > > Can we state this condition more generally?  I'm not sure why this is so narrowly restricted, and the variable name isn't telling me anything, since `_Float16` must by definition be "allowed" if we have an expression of `_Float16` type.
> > > > > > > > since _Float16 must by definition be "allowed" if we have an expression of _Float16 type.
> > > > > > > 
> > > > > > > _Float16 is allowed only on a few targets. https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
> > > > > > > By the way, we should update for X86 since it's not limited to avx512fp16 now.
> > > > > > > _Float16 is allowed only on a few targets.
> > > > > > 
> > > > > > Yes, I know that.  But if `SrcType->isFloat16Type()` is true, we must be on one of those targets, because the type doesn't otherwise exist.
> > > > > I see your point now. The problem here is we want to allow the `_Float16` to be used more broadly. But the target doesn't really support it sometime. Currently full arithmatic operations are supported only on target with AVX512FP16.
> > > > > We should cast for those targets without AVX512FP16 while avoid to do on AVX512FP16.
> > > > I agree that many targets don't natively support arithmetic on this format, but x86 is not the first one that does.  Unless I'm misunderstanding, we already track this property in Clang's TargetInfo as `hasLegalHalfType()`.  `+avx512fp16` presumably ought to set this.
> > > > 
> > > > I'm not sure what the interaction with the `NativeHalfType` LangOpt is supposed to be here.  My understanding is that that option is just supposed to affect `__fp16`, basically turning it into a proper arithmetic type, i.e. essentially `_Float16`.  Whatever effect you want to apply to `_Float16` should presumably happen even if that option not set.
> > > > 
> > > > More broadly, I don't think your approach in this patch is correct.  The type of operations on `_Float16` should not change based on whether the target natively supports `_Float16`.  If we need to emulate those operations on targets that don't provide them natively, we should do that at a lower level than the type system.
> > > > 
> > > > The most appropriate place to do that is going to depend on the exact semantics we want.
> > > > 
> > > > If we want to preserve `half` semantics exactly regardless of target, we should have Clang's IR generation actually emit `half` operations.  Targets that don't support those operations natively will have to lower at least some of those operations into compiler-rt calls, but that's not at all unprecedented.
> > > > 
> > > > If we're okay with playing loose for performance reasons, we can promote to `float` immediately around individual arithmetic operations.  IR generation is probably the most appropriate place to do that.  But I'm quite concerned about that making `_Float16` feel like an unpredictable/unportable type; it seems to me that software emulation is much better.
> > > > 
> > > > If you're proposing the latter, I think you need to raise that more widely than a code review; please make a post on llvm-dev.
> > > > we already track this property in Clang's TargetInfo as `hasLegalHalfType()`
> > > 
> > > That sounds a good approch. Thank you.
> > > 
> > > > The type of operations on `_Float16` should not change based on whether the target natively supports `_Float16`. If we need to emulate those operations on targets that don't provide them natively, we should do that at a lower level than the type system.
> > > 
> > > Unfortunately, we can't do it at low level. The reason is (I'm not expert in frontend, just recalled from last disscussion with GCC folks) we have to do expresssion emulation to respect C/C++ semantics. GCC has option `-fexcess-precision=16` to match the same result with native instructions, but the default is `-fexcess-precision=fast` according to language semantics.
> > > 
> > > > The most appropriate place to do that is going to depend on the exact semantics we want...
> > > 
> > > Note, we are not simply doing emulation in the frontend. It's backend's responsibility to emulate a single `half` operation. But it's frontend's responsibility to choose whether to emit several `half` operations or emit promote + several `float` operations + truncate. As described in the title, this patch is doing for the latter.
> > > Okay, this condition is pretty ridiculous to be repeating in three different places across the compiler.  Especially since you're going to change it when you implement the new option, right?
> > > 
> > > Can we state this condition more generally?  I'm not sure why this is so narrowly restricted, and the variable name isn't telling me anything, since `_Float16` must by definition be "allowed" if we have an expression of `_Float16` type.
> > 
> > I agree  :)  I wasn't aware of this flag. I am still in the process of figuring out how float16 work and what is exactly required for it.
> > Yes may be having 2 separate patches is the right way to go. One that turns Float16 on for x86 without the use of +avx512fp16 feature and another one that does the emulation ( half to float -> float arithmetic-> truncation to half). @pengfei ?
> > > we already track this property in Clang's TargetInfo as `hasLegalHalfType()`
> > 
> > That sounds a good approch. Thank you.
> > 
> > > The type of operations on `_Float16` should not change based on whether the target natively supports `_Float16`. If we need to emulate those operations on targets that don't provide them natively, we should do that at a lower level than the type system.
> > 
> > Unfortunately, we can't do it at low level. The reason is (I'm not expert in frontend, just recalled from last disscussion with GCC folks) we have to do expresssion emulation to respect C/C++ semantics. GCC has option `-fexcess-precision=16` to match the same result with native instructions, but the default is `-fexcess-precision=fast` according to language semantics.
> > 
> > > The most appropriate place to do that is going to depend on the exact semantics we want...
> > 
> > Note, we are not simply doing emulation in the frontend. It's backend's responsibility to emulate a single `half` operation. But it's frontend's responsibility to choose whether to emit several `half` operations or emit promote + several `float` operations + truncate. As described in the title, this patch is doing for the latter.
> 
> The _Float16 type is supported for both C and C++, on x86 systems with SSE2 enabled. 
> From https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html:
> "On x86 targets with SSE2 enabled, **without -mavx512fp16, all operations will be emulated by software emulation** and the float instructions. The default behavior for FLT_EVAL_METHOD is to keep the intermediate result of the operation as 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using -fexcess-precision=16 will force round back after each operation.
> 
> Using -mavx512fp16 will generate AVX512-FP16 instructions instead of software emulation. The default behavior of FLT_EVAL_METHOD is to round after each operation. The same is true with -fexcess-precision=standard and -mfpmath=sse. If there is no -mfpmath=sse, -fexcess-precision=standard alone does the same thing as before, It is useful for code that does not have _Float16 and runs on the x87 FPU."
> This is what we are trying to reach out with this patch.
> 
I planed to enable Float16 and emulation in one patch, i.e., D107082. It's almost backend work. The only thing I can't do in that patch is emulating in C/C++ expression granularity, which has to leverage FE facilities. That's all what I expected for this patch.
I'm ashamed I haven't finished it yet and leaving confusion here :(

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107