[PATCH] D113107: Support of expression granularity for _Float16.

Mon Nov 22 10:53:11 PST 2021

zahiraam added a comment.

Question about this comment:

3. “Approaches #1 and #2 require a lot of intermediate conversions when hardware isn't available. In our example, a + b + c has to be calculated as (_Float16) ((float) (_Float16) ((float) a + (float) b) + (float) c), where the result of one addition is converted down and then converted back again. You can avoid this by specifically recognizing this pattern and eliminating the conversion from sub-operations that happen to be of type float, so that in our example, a + b + c would be calculated as (_Float16) ((float) a + (float) b + (float) c). This is actually allowed by the C standard by default as a form of FP contraction; in fact, I believe C's rules for FP contraction were originally designed for exactly this kind of situation, except that it was emulating float with double on hardware that only provided arithmetic on the latter. Obviously, this can change results.”
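
To make the "this can change results" point concrete, here is a minimal sketch that mirrors the two evaluation strategies from the quote. It uses float as the narrow type and double as the wide type (the float-on-double situation the quote mentions) so that it builds without _Float16 support. The function names are hypothetical and are not part of the patch.

```c
/* Sketch only: float stands in for _Float16 and double for float. */

/* Per-operation truncation: every partial sum is rounded back to the
   narrow type before the next addition (two roundings in total). */
float sum_per_operation(float a, float b, float c) {
    float ab = (float)((double)a + (double)b);  /* round after first add */
    return (float)((double)ab + (double)c);     /* round after second add */
}

/* Expression granularity: the whole expression is evaluated in the wide
   type and rounded to the narrow type once, at the end. */
float sum_whole_expression(float a, float b, float c) {
    return (float)((double)a + (double)b + (double)c);
}
```

Assuming the default round-to-nearest-even mode, calling both functions with a = 1.0f and b = c = 0x1p-24f gives different answers: the per-operation version rounds each partial sum back down and returns 1.0f, while the whole-expression version keeps both small addends and returns 1.0f + 0x1p-23f.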

Without any changes to clang, this test case:
// RUN: %clang_cc1 -triple x86_64-linux  -emit-llvm  < %s
_Float16 foo (_Float16 a, _Float16 b, _Float16 c) {
  return (_Float16) ((float) a + (float) b + (float) c);
}
Generates this IR:
target triple = "x86_64-unknown-linux"

; Function Attrs: noinline nounwind optnone
define dso_local half @foo(half %a, half %b, half %c) #0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  %c.addr = alloca half, align 2
  store half %a, half* %a.addr, align 2
  store half %b, half* %b.addr, align 2
  store half %c, half* %c.addr, align 2
  %0 = load half, half* %a.addr, align 2
  %conv = fpext half %0 to float
  %1 = load half, half* %b.addr, align 2
  %conv1 = fpext half %1 to float
  %add = fadd float %conv, %conv1
  %2 = load half, half* %c.addr, align 2
  %conv2 = fpext half %2 to float
  %add3 = fadd float %add, %conv2
  %conv4 = fptrunc float %add3 to half
  ret half %conv4
}

And this case:
__fp16 foo (__fp16 a, __fp16 b, __fp16 c) {
  return a + b + c;
}
Compiled with these options:

-c -Xclang "-triple" -Xclang "armv7a-linux-gnu" -target arm -emit-llvm -S
Generates this IR:
target triple = "armv7a-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone
define dso_local arm_aapcscc half @foo(half %a, half %b, half %c) #0 {
entry:
  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  %c.addr = alloca half, align 2
  store half %a, half* %a.addr, align 2
  store half %b, half* %b.addr, align 2
  store half %c, half* %c.addr, align 2
  %0 = load half, half* %a.addr, align 2
  %conv = fpext half %0 to float
  %1 = load half, half* %b.addr, align 2
  %conv1 = fpext half %1 to float
  %add = fadd float %conv, %conv1
  %2 = load half, half* %c.addr, align 2
  %conv2 = fpext half %2 to float
  %add3 = fadd float %add, %conv2
  %3 = fptrunc float %add3 to half
  ret half %3

}

I see no difference between the IR generated for these two cases.
So this test case:
// RUN: %clang_cc1 -triple x86_64-linux  -emit-llvm  < %s
_Float16 foo (_Float16 a, _Float16 b, _Float16 c) {
  return a + b + c;
}

should also generate this same IR, right?

================
Comment at: clang/test/CodeGen/X86/Float16-aritmetic.c:8-9
+  // CHECK: alloca half
+  // CHECK: store half {{.*}}, half*
+  // CHECK: store half {{.*}}, half*
+  // CHECK: load half, half*
----------------
pengfei wrote:
> This isn't correct without the ABI code change. I did some work in D107082. I plan to refactor (if I have enough time)
If this is the output we want to generate, should the changes in D107082 land before the changes in this patch?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107