[PATCH] D113107: Support of expression granularity for _Float16.
John McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Nov 22 23:22:19 PST 2021
rjmccall added a comment.
For that example, yes, approach #3 would result in that exact same IR on targets that lack direct hardware support for `_Float16` operations. But getting that behavior right in general requires a different implementation than is provided by this patch, which is implementing approach #4 and inappropriately changing the formal types of expressions.
In contrast, approach #1 would produce IR like this:
  define dso_local arm_aapcscc half @foo(half %a, half %b, half %c) #0 {
  entry:
    %a.addr = alloca half, align 2
    %b.addr = alloca half, align 2
    %c.addr = alloca half, align 2
    store half %a, half* %a.addr, align 2
    store half %b, half* %b.addr, align 2
    store half %c, half* %c.addr, align 2
    %0 = load half, half* %a.addr, align 2
    %conv = fpext half %0 to float
    %1 = load half, half* %b.addr, align 2
    %conv1 = fpext half %1 to float
    %add = fadd float %conv, %conv1
    %trunc = fptrunc float %add to half
    %ext = fpext half %trunc to float
    %2 = load half, half* %c.addr, align 2
    %conv2 = fpext half %2 to float
    %add3 = fadd float %ext, %conv2
    %3 = fptrunc float %add3 to half
    ret half %3
  }
I was under the impression that `-fexcess-precision` had some sort of strict mode that forces this pattern, but apparently not, and the choices are just between `standard` (truncation is only forced at casts and assignments) and `fast` (optimizer has free rein to remove truncations).
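To make the difference between truncating after every operation and truncating only at the end concrete, here is a rough C sketch. It is an analogy, not what the patch or Clang emits: `float` stands in for `_Float16` and `double` stands in for the wider `float` arithmetic, so that it compiles on targets without `_Float16` support. The function names are made up for illustration.

```c
/* Analogue of approach #1: extend, add, and truncate back to the
 * narrow type after every single operation (fpext / fadd / fptrunc). */
float sum_truncate_each_op(float a, float b, float c) {
    float ab = (float)((double)a + (double)b);  /* rounded to float here */
    return (float)((double)ab + (double)c);     /* and rounded again here */
}

/* Analogue of evaluating with excess precision: the intermediate sum
 * stays in the wide type and is truncated only once, at the end. */
float sum_truncate_once(float a, float b, float c) {
    double wide = (double)a + (double)b + (double)c;
    return (float)wide;
}
```

With a = 1.0f and b = c = 0x1p-24f (half a float ulp of 1.0), the first function rounds each partial sum back down to 1.0f under the default round-to-nearest-even mode, while the second keeps 1 + 2^-23 exactly in the wide intermediate and returns 1.0f + 0x1p-23f. That is the kind of observable difference the extra fptrunc/fpext pair in the IR above is there to preserve.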
https://reviews.llvm.org/D113107