[PATCH] D113107: Support of expression granularity for _Float16.

Tue Nov 16 01:17:00 PST 2021

pengfei added inline comments.

================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:1315
+  if ((SrcType->isHalfType() || iSFloat16Allowed) &&
+      !CGF.getContext().getLangOpts().NativeHalfType) {
     // Cast to FP using the intrinsic if the half type itself isn't supported.
----------------
rjmccall wrote:
> pengfei wrote:
> > rjmccall wrote:
> > > pengfei wrote:
> > > > rjmccall wrote:
> > > > > Okay, this condition is pretty ridiculous to be repeating in three different places across the compiler.  Especially since you're going to change it when you implement the new option, right?
> > > > > 
> > > > > Can we state this condition more generally?  I'm not sure why this is so narrowly restricted, and the variable name isn't telling me anything, since `_Float16` must by definition be "allowed" if we have an expression of `_Float16` type.
> > > > > since _Float16 must by definition be "allowed" if we have an expression of _Float16 type.
> > > > 
> > > > _Float16 is allowed only on a few targets. https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
> > > > By the way, we should update for X86 since it's not limited to avx512fp16 now.
> > > > _Float16 is allowed only on a few targets.
> > > 
> > > Yes, I know that.  But if `SrcType->isFloat16Type()` is true, we must be on one of those targets, because the type doesn't otherwise exist.
> > I see your point now. The problem here is we want to allow the `_Float16` to be used more broadly. But the target doesn't really support it sometime. Currently full arithmatic operations are supported only on target with AVX512FP16.
> > We should cast for those targets without AVX512FP16 while avoid to do on AVX512FP16.
> I agree that many targets don't natively support arithmetic on this format, but x86 is not the first one that does.  Unless I'm misunderstanding, we already track this property in Clang's TargetInfo as `hasLegalHalfType()`.  `+avx512fp16` presumably ought to set this.
> 
> I'm not sure what the interaction with the `NativeHalfType` LangOpt is supposed to be here.  My understanding is that that option is just supposed to affect `__fp16`, basically turning it into a proper arithmetic type, i.e. essentially `_Float16`.  Whatever effect you want to apply to `_Float16` should presumably happen even if that option not set.
> 
> More broadly, I don't think your approach in this patch is correct.  The type of operations on `_Float16` should not change based on whether the target natively supports `_Float16`.  If we need to emulate those operations on targets that don't provide them natively, we should do that at a lower level than the type system.
> 
> The most appropriate place to do that is going to depend on the exact semantics we want.
> 
> If we want to preserve `half` semantics exactly regardless of target, we should have Clang's IR generation actually emit `half` operations.  Targets that don't support those operations natively will have to lower at least some of those operations into compiler-rt calls, but that's not at all unprecedented.
> 
> If we're okay with playing loose for performance reasons, we can promote to `float` immediately around individual arithmetic operations.  IR generation is probably the most appropriate place to do that.  But I'm quite concerned about that making `_Float16` feel like an unpredictable/unportable type; it seems to me that software emulation is much better.
> 
> If you're proposing the latter, I think you need to raise that more widely than a code review; please make a post on llvm-dev.
> we already track this property in Clang's TargetInfo as `hasLegalHalfType()`

That sounds a good approch. Thank you.

> The type of operations on `_Float16` should not change based on whether the target natively supports `_Float16`. If we need to emulate those operations on targets that don't provide them natively, we should do that at a lower level than the type system.

Unfortunately, we can't do it at low level. The reason is (I'm not expert in frontend, just recalled from last disscussion with GCC folks) we have to do expresssion emulation to respect C/C++ semantics. GCC has option `-fexcess-precision=16` to match the same result with native instructions, but the default is `-fexcess-precision=fast` according to language semantics.

> The most appropriate place to do that is going to depend on the exact semantics we want...

Note, we are not simply doing emulation in the frontend. It's backend's responsibility to emulate a single `half` operation. But it's frontend's responsibility to choose whether to emit several `half` operations or emit promote + several `float` operations + truncate. As described in the title, this patch is doing for the latter.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107