[llvm-dev] TypePromoteFloat loses intermediate rounding operations

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Tue Dec 10 12:18:15 PST 2019


For the following C code

__fp16 x, y, z, w;

void foo() {
  x = y + z;
  x = x + w;
}

clang produces IR that extends each operand to float, does the add, and
then truncates the result to half before assigning it to x, like this:

define dso_local void @foo() #0 !dbg !18 {
  %1 = load half, half* @y, align 2, !dbg !21
  %2 = fpext half %1 to float, !dbg !21
  %3 = load half, half* @z, align 2, !dbg !22
  %4 = fpext half %3 to float, !dbg !22
  %5 = fadd float %2, %4, !dbg !23
  %6 = fptrunc float %5 to half, !dbg !21
  store half %6, half* @x, align 2, !dbg !24
  %7 = load half, half* @x, align 2, !dbg !25
  %8 = fpext half %7 to float, !dbg !25
  %9 = load half, half* @w, align 2, !dbg !26
  %10 = fpext half %9 to float, !dbg !26
  %11 = fadd float %8, %10, !dbg !27
  %12 = fptrunc float %11 to half, !dbg !25
  store half %12, half* @x, align 2, !dbg !28
  ret void, !dbg !29
}

InstCombine then comes along and gets rid of all of the fpext and fptrunc
instructions, leaving:

define dso_local void @foo() local_unnamed_addr #0 !dbg !18 {
  %1 = load half, half* @y, align 2, !dbg !21, !tbaa !22
  %2 = load half, half* @z, align 2, !dbg !26, !tbaa !22
  %3 = fadd half %1, %2, !dbg !21
  %4 = load half, half* @w, align 2, !dbg !27, !tbaa !22
  %5 = fadd half %3, %4, !dbg !28
  store half %5, half* @x, align 2, !dbg !29, !tbaa !22
  ret void, !dbg !30
}


Then SelectionDAG type legalization comes along and produces this as the
final assembly:

pushq %rax
.cfi_def_cfa_offset 16
movzwl y(%rip), %edi
callq __gnu_h2f_ieee
movss %xmm0, 4(%rsp) # 4-byte Spill
movzwl z(%rip), %edi
callq __gnu_h2f_ieee
addss 4(%rsp), %xmm0 # 4-byte Folded Reload
movss %xmm0, 4(%rsp) # 4-byte Spill
movzwl w(%rip), %edi
callq __gnu_h2f_ieee
addss 4(%rsp), %xmm0 # 4-byte Folded Reload
callq __gnu_f2h_ieee
movw %ax, x(%rip)
popq %rax


I assumed SelectionDAG should produce something equivalent to the original
clang code with 4 total extends to f32 and 2 truncates. Instead we got 3
extends and 1 truncate. So we lost the intermediate rounding between the 2
adds that was in the original clang IR.
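
To make the difference concrete, here is a minimal sketch in portable C
that emulates the rounding to half in software. The names round_to_half,
foo_rounded_each_add and foo_rounded_once are hypothetical helpers for
illustration, not anything in LLVM, and the rounding helper is an
assumption-laden simplification: it only handles magnitudes in
[2^-14, 2^15) and asserts otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest value representable as IEEE half
 * (round-to-nearest, ties-to-even) and return it as a float.
 * Sketch only: no zeros, subnormals, overflow, infinities or NaNs. */
static float round_to_half(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t biased_exp = (bits >> 23) & 0xFFu;
    assert(biased_exp >= 113 && biased_exp <= 141); /* range this sketch handles */
    uint32_t lsb = (bits >> 13) & 1u;  /* lowest mantissa bit that half keeps */
    bits += 0x0FFFu + lsb;             /* ties go to the even neighbour */
    bits &= ~UINT32_C(0x1FFF);         /* drop the 13 bits half cannot hold */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Models the original clang IR: every add is followed by a truncate. */
float foo_rounded_each_add(float y, float z, float w) {
    float x = round_to_half(y + z);
    return round_to_half(x + w);
}

/* Models the final assembly: both adds in f32, one truncate at the store. */
float foo_rounded_once(float y, float z, float w) {
    return round_to_half((y + z) + w);
}
```

With y = 2048, z = 1 and w = 1 (all exactly representable in half), the
first function returns 2048 and the second returns 2050. 2049 is halfway
between two adjacent half values and rounds to the even one, 2048, so
rounding after each add discards both increments, while rounding once
keeps them.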

I believe this occurs because the TypePromoteFloat legalization converts
all arithmetic operations to their f32 equivalents, but does not place
conversions to/from half around them. Instead, fp_to_fp16 and fp16_to_fp
nodes are only generated at loads, stores, bitcasts, and probably a few
other places: basically only where the 16-bit size is needed to make the
operation possible. What we have is an implementation very similar to
integer promotion, but that doesn't work for FP because we lose the
intermediate rounding.

It seems like what we should do instead is insert fp16_to_fp and fp_to_fp16
in the libcall and arithmetic op handling, and use i16 to connect the
legalized pieces together, similar to how we use integer types when
softening operations. I'm not sure if there would still be rounding issues
with this, but it seems closer to matching the IR.
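
As a rough model of that shape (plain C, not SelectionDAG code), each half
value below is carried as a uint16_t bit pattern, and every arithmetic op
converts its inputs up and its result back down. All four function names
are hypothetical, and the two conversion helpers are simplified stand-ins
for fp16_to_fp / fp_to_fp16 that assume normal, finite half values only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for fp16_to_fp: half bit pattern -> float (normal values only). */
static float half_bits_to_float(uint16_t h) {
    uint32_t half_exp = (h >> 10) & 0x1Fu;
    assert(half_exp >= 1 && half_exp <= 30);  /* no zero/subnormal/inf/NaN */
    uint32_t bits = ((uint32_t)(h & 0x8000u) << 16)   /* sign */
                  | ((half_exp + 112u) << 23)         /* rebias 15 -> 127 */
                  | ((uint32_t)(h & 0x03FFu) << 13);  /* widen mantissa */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Stand-in for fp_to_fp16: float -> half bit pattern, round-to-nearest-even
 * (results in the normal half range only). */
static uint16_t float_to_half_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x0FFFu + ((bits >> 13) & 1u);    /* round the 13 dropped bits */
    uint32_t biased_exp = (bits >> 23) & 0xFFu;
    assert(biased_exp >= 113 && biased_exp <= 142);
    return (uint16_t)(((bits >> 16) & 0x8000u)
                    | ((biased_exp - 112u) << 10)
                    | ((bits >> 13) & 0x03FFu));
}

/* One legalized fadd: inputs and result are i16-like, the add itself is f32. */
uint16_t fadd_half_via_float(uint16_t a, uint16_t b) {
    return float_to_half_bits(half_bits_to_float(a) + half_bits_to_float(b));
}

/* foo() from the C example, with the two adds connected through uint16_t. */
uint16_t foo_legalized(uint16_t y, uint16_t z, uint16_t w) {
    return fadd_half_via_float(fadd_half_via_float(y, z), w);
}
```

Because the intermediate value only exists as a 16-bit pattern, the
rounding between the two adds cannot be dropped. For example, with
y = 0x6800 (2048.0) and z = w = 0x3C00 (1.0), foo_legalized returns 0x6800,
matching what the original clang IR computes.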

Unfortunately, I think this would have the side effect of changing half
arguments and return types to i16 instead of float, which would be an ABI
change. At least on some targets, __fp16 can't be used as an argument or
return type, so maybe that won't be a real problem?

Anyone else have any thoughts on this?

~Craig