[PATCH] D113107: Support of expression granularity for _Float16.

Mon Nov 22 23:22:19 PST 2021

rjmccall added a comment.

For that example, yes, approach #3 would result in that exact same IR on targets that lack direct hardware support for `_Float16` operations.  But getting that behavior right in general requires a different implementation than is provided by this patch, which is implementing approach #4 and inappropriately changing the formal types of expressions.

In contrast, approach #1 would produce IR like this:

  define dso_local arm_aapcscc half @foo(half %a, half %b, half %c) #0 {
  entry:
    %a.addr = alloca half, align 2
    %b.addr = alloca half, align 2
    %c.addr = alloca half, align 2
    store half %a, half* %a.addr, align 2
    store half %b, half* %b.addr, align 2
    store half %c, half* %c.addr, align 2
    %0 = load half, half* %a.addr, align 2
    %conv = fpext half %0 to float
    %1 = load half, half* %b.addr, align 2
    %conv1 = fpext half %1 to float
    %add = fadd float %conv, %conv1
    %trunc = fptrunc float %add to half
    %ext = fpext half %trunc to float
    %2 = load half, half* %c.addr, align 2
    %conv2 = fpext half %2 to float
    %add3 = fadd float %ext, %conv2
    %3 = fptrunc float %add3 to half
    ret half %3
  }

I was under the impression that `-fexcess-precision` had some sort of strict mode that forces this truncate-after-every-operation pattern, but apparently not, and the choices are just between `standard` (truncation is only forced at casts and assignments) and `fast` (optimizer has free rein to remove truncations).
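
To make the difference between the two truncation patterns concrete, here is a minimal sketch in C. It is an analogue, not the patch's behavior: it uses `float` as the narrow type and `double` as the wide type in place of `_Float16` and `float`, so that it builds on compilers without `_Float16` support. The function names are hypothetical, and it assumes casts to `float` really do round (true for ordinary SSE/NEON code generation).

```c
/* Analogue of the IR above: the narrow type is float, the wide type is double.
 * Both function names are made up for illustration. */

/* Round back to the narrow type after every operation,
 * mirroring the fptrunc/fpext pair between the two fadds. */
float sum_truncate_each_op(float a, float b, float c) {
    float ab = (float)((double)a + (double)b);  /* extend, add, truncate */
    return (float)((double)ab + (double)c);     /* extend again, add, truncate */
}

/* Keep the intermediate sum in the wide type and round only once,
 * at the final conversion back to the narrow type. */
float sum_truncate_at_end(float a, float b, float c) {
    return (float)((double)a + (double)b + (double)c);
}
```

With `a = 16777216.0f` (2^24) and `b = c = 1.0f`, the first function returns 16777216, because each intermediate sum 2^24 + 1 is a tie that rounds back to 2^24, while the second returns 16777218, because 2^24 + 2 is computed exactly in `double` and is representable in `float`. So the extra truncation is observable and is not something an optimizer can drop without changing results.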

https://reviews.llvm.org/D113107