[PATCH] D113107: Support of expression granularity for _Float16.

Wed Jun 22 07:38:10 PDT 2022

zahiraam added inline comments.

================
Comment at: clang/lib/CodeGen/CGExprComplex.cpp:896
+
+ComplexPairTy ComplexExprEmitter::EmitPromoted(const Expr *E) {
+  if (auto *BinOp = dyn_cast<BinaryOperator>(E->IgnoreParens())) {
----------------
pengfei wrote:
> zahiraam wrote:
> > pengfei wrote:
> > > rjmccall wrote:
> > > > pengfei wrote:
> > > > > rjmccall wrote:
> > > > > > zahiraam wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > `EmitPromoted` should take the promotion type.
> > > > > > > > 
> > > > > > > > You are missing the logic in this file which handles *unpromoted* emission (where you have a binary operator but the context wants an unpromoted value) by recognizing that you need to do a promoted operation and then truncate.
> > > > > > > Sorry but not sure what you mean here. A case where we don't want any promotion would be:
> > > > > > > 
> > > > > > > float _Complex add(float _Complex a, float _Complex b) {
> > > > > > >   return a + b;
> > > > > > > }
> > > > > > > 
> > > > > > > In this case, getPromotionType would return the null type and EmitBinOps would just go through the  "older" control path where there is no promotion.
> > > > > > > Unless I misunderstood your comment?
> > > > > > > 
> > > > > > > 
> > > > > > I'm talking about the unpromoted emission path for an operation that we want to *internally* promote.  So like your example but using `_Float16` — we want to emit the binary `+` as a `float` operation, but the context needs an unpromoted value because it's going to be returned.
> > > > > > 
> > > > > > The promoted emission path (the various `EmitPromoted` methods) represents a request by the caller to produce a result that doesn't match the formal type of the expression.  The normal emission path `Visit` etc.) represents a request by the caller to produce a result normally, i.e. one that matches the formal type.  In general, we always start in the latter because arbitrary contexts always expect a value of the formal type; it's only these recursive calls within promoted emitters that contextually want a promoted value.
> > > > > > 
> > > > > > In your current patch, you're entering the promoted path by special-casing one context that frequently terminates promoted emission: `EmitComplexExprIntoLValue`, which is used for things like assignment.  That's not the right way to handle it, though.  Instead, you should do like I asked you to do with the scalar emitter, which is to recognize in the normal emission path for a promotable operation (like binary `+`) that you need to promote the operation and then unpromote the result.  Then things like `EmitComplexExprIntoLValue` will continue to simply enter the normal path because that's the kind of value they need, and the promotion logic will do the right thing from there to ensure we don't emit a `_Float16` operation in LLVM IR.
> > > > > > 
> > > > > > Incidentally, I thought of another context that you ought to do promotion for: when we're converting a promotable value to a larger floating-point type, we should presumably convert the promoted value directly rather than truncating it to the unpromoted type before the conversion.
> > > > > I'm not familiar with the FE details, but I understand the background is just to promote expressions that have more than one operation. So I agree with @zahiraam, and
> > > > > > to ensure we don't emit a _Float16 operation in LLVM IR
> > > > > is not what we expected here.
> > > > > 
> > > > > `_Float16` is a ABI type for which a target supported has to lower any `_Float16` operations in LLVM IR. See what we have done on X86 backend by D107082.
> > > > > 
> > > > > That said, a single `a + b` can be and should be emitted as `%c = fadd half %a, %b`.
> > > > Well, you can provide operation-by-operation lowering support if you want to, but what's going to come out of Clang IRGen in this mode is not going to rely on it.  It's far easier to just emit these operations differently, always using `float`, than to try to recognize the special cases where all the inputs are naturally `half` and so you might as well emit a `half` operation.
> > > > 
> > > > And I think the postcondition that `half` operations don't come out of IRGen is a generally useful property even if you've already written this lowering code.  Among other things, it makes it easy to test that you haven't missed a case where you should be doing promoted emission.
> > > We define `_Float16` as ABI type and require targets to support the ABI lowering. See https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point Promoting in FE doesn't mean it works on any targets by default, so I think it's not much useful.
> > > We need to support `-fexcess-precision=16` the same as GCC does, it's easier to do nothing than promote and then truncate each of the operations in FE.
> > > I think in general, lowering operations for a type is a business of backend. We still need backend to do so for other FEs even we do it in Clang. A recent example is BF16, which people chose to lower in backend rather than mlir, see D126444.
> > > 
> > > > Among other things, it makes it easy to test that you haven't missed a case where you should be doing promoted emission.
> > > I don't understand, how can we check the missed cases if the promotion was done in FE?
> > This patch is doing by default what GCC is doing with the -fexcess-precision=16. And that would be for expressions with any number of operators. But it looks like you want promotion/truncation only for expression with more than one operator?  I think I agree with @rjmccall that it is easier done this way than trying to recognize the special cases.
> > 
> > What you want is for a+b to generate fadd half (not extension, and no truncation ) and for a+b*c to generated a set of fpext, fadd float and fptrunc?
> > Does GCC use their option for special cases too? I would think that when this option is used all expression (regardless of the number of operators) are extended/truncated right?
> > 
> > I am not sure I like the idea of lowering some cases in the FE and some others in the BE. I think it would be cleaner to decide that lowering be done in FE for all expressions and have BE react to it. 
> > 
> > @rmjccall not sure what you mean by:
> > "And I think the postcondition that half operations don't come out of IRGen is a generally useful property even if you've already written this lowering code. Among other things, it makes it easy to test that you haven't missed a case where you should be doing promoted emission."
> > 
> > 
> > 
> > 
> Oh, we may misunderstand each other. Let me clarify my understanding about `expression` so that I can make sure we are talking about the same thing.
> When I mentioned `expression`, I meant to the C/C++ expressions like:
> ```
> _Float16 t1 = a + b;
> _Float16 t2 = a + b + c;
> ```
> So when I talk about `expression granularity`, I mean we do truncation once not matter how many actual operations in the expression. This is an example of the differences when GCC generates code for them https://godbolt.org/z/YndnP8eM4
> 
> > This patch is doing by default what GCC is doing with the -fexcess-precision=16. And that would be for expressions with any number of operators.
> 
> I thought this patch is doing the GCC default behavior that without `-fexcess-precision=16`. For `-fexcess-precision=16`, we don't need to do any promotion at all in FE.
> 
> > Does GCC use their option for special cases too? I would think that when this option is used all expression (regardless of the number of operators) are extended/truncated right?
> 
> GCC needs the option because it only truncates once per expression by default. Is it the same mean as you mentioned `all expression are extended/truncated`?
> 
> > I am not sure I like the idea of lowering some cases in the FE and some others in the BE. I think it would be cleaner to decide that lowering be done in FE for all expressions and have BE react to it.
> 
> The `expression granularity` I expected can only be done in the FE, because we don't have the scope information of C/C++ expression in IR. FE can choose to do the promotion/truncation pre operation, or leave it the backend which is what I suggested here.
>>I thought this patch is doing the GCC default behavior that without -fexcess-precision=16. For -fexcess-precision=16, we don't need to do any promotion at all in FE.
Yes you are right. I was mistaken about what the option was doing. 
We should have the same default's behavior than GCC, IMHO. So if we want to mimic what GCC is doing with the option, then we should have an option too. I don't think we should do with no option what GCC is doing with the option?
Meanwhile I am updating this patch with corrections to all your comments and see what @rjmccall thinks.
Thanks.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107