[llvm-dev] Is it legal to pass a half by value on x86_64?

Fri Mar 5 12:57:23 PST 2021

Hi Craig,

I am sorry for my poor example, probably better to take me out of the middle.
I have attached the complete IR for the example on which I am working.  c2_foo() is where we break down.

Cheers.

JP

________________________________
From: Craig Topper <craig.topper at gmail.com>
Sent: Friday, March 5, 2021 1:23 PM
To: Wang, Pengfei <pengfei.wang at intel.com>
Cc: Jason Hafer <jhafer at mathworks.com>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?

For this code the half store from the IR appears to have been removed because it is a local variable that was never read from. The store that says "4-byte Spill" is a different store and seems to be some -O0 artifact. With -O2 the whole thing becomes just a ret.

define void @foo(i8, i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee

  %6 = alloca half
  store half %4, half* %6, align 1
  ret void
}

x86_64-pc-windows gives:
push rax
.seh_stackalloc 8
.seh_endprologue
movss xmm0, dword ptr [rsp + 48] # xmm0 = mem[0],zero,zero,zero
movss dword ptr [rsp + 4], xmm0 # 4-byte Spill
pop rax
ret
.seh_handlerdata
.text
.seh_endproc

As an experiment, I tried this, which does produce a call to __gnu_f2h_ieee on Windows with LLVM 8.0 and LLVM 10.0:

define void @foo(half*, i8, i8, half) {
store half %3, half* %0, align 1
ret void
}

For the assembly you provided, I don't see any reads from xmm0 or any word stores, so it's hard for me to determine what might be going wrong. Can you provide the assembly where xmm0 is eventually used?

mov rax, qword ptr [rsp + 424]
 movss xmm0, dword ptr [rsp + 416] # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
 mov qword ptr [rsp + 344], rcx
 mov qword ptr [rsp + 336], rdx
 mov qword ptr [rsp + 328], r8
 mov qword ptr [rsp + 320], r9
 mov qword ptr [rsp + 304], 0
 mov qword ptr [rsp + 296], 0
 mov qword ptr [rsp + 288], 0
 mov qword ptr [rsp + 280], 0
 mov rcx, qword ptr [rsp + 328]
 mov qword ptr [rsp + 272], rcx
 mov rcx, qword ptr [rsp + 328]
 mov rcx, qword ptr [rcx + 8]
 mov qword ptr [rsp + 264], rcx
 mov rcx, qword ptr [rsp + 336]
 mov rcx, qword ptr [rcx + 56]
 mov qword ptr [rsp + 256], rcx
 mov dword ptr [rsp + 312], 0
 mov qword ptr [rsp + 248], rax # 8-byte Spill
 movss dword ptr

~Craig

On Fri, Mar 5, 2021 at 6:46 AM Wang, Pengfei <pengfei.wang at intel.com> wrote:

Hi Jason,

The different behavior between Linux and Windows comes from the difference in calling convention. Windows uses 4 registers for argument passing, while Linux uses 6.

https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#parameter-passing
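
To make the register-count difference concrete, here is a minimal Python sketch of how the two conventions place scalar arguments. It is a simplified model (integer and floating-point scalars only, no aggregates or varargs), and the function names assign_win64 and assign_sysv are hypothetical, made up for this illustration. It also assumes, as reported elsewhere in this thread, that the half argument is passed as a float.

```python
# Simplified model of scalar argument placement on x86_64.
# Hypothetical helper names; aggregates and varargs are not modeled.
WIN64_INT = ["rcx", "rdx", "r8", "r9"]
WIN64_FP = ["xmm0", "xmm1", "xmm2", "xmm3"]
SYSV_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP = ["xmm%d" % i for i in range(8)]


def assign_win64(kinds):
    """Win64: the register is picked by argument position (slots 0-3 only)."""
    placed = []
    for pos, kind in enumerate(kinds):
        if pos < 4:
            placed.append(WIN64_FP[pos] if kind == "fp" else WIN64_INT[pos])
        else:
            placed.append("stack")
    return placed


def assign_sysv(kinds):
    """System V: integer and SSE registers are counted independently."""
    placed = []
    n_int = 0
    n_fp = 0
    for kind in kinds:
        if kind == "fp":
            if n_fp < len(SYSV_FP):
                placed.append(SYSV_FP[n_fp])
                n_fp += 1
            else:
                placed.append("stack")
        else:
            if n_int < len(SYSV_INT):
                placed.append(SYSV_INT[n_int])
                n_int += 1
            else:
                placed.append("stack")
    return placed


# foo(i8, i8, i8, i8, half), with the half treated as a float argument
five_args = ["int", "int", "int", "int", "fp"]
print(assign_win64(five_args))
print(assign_sysv(five_args))
```

With five arguments the half lands on the stack on Windows but in xmm0 on Linux. With four arguments Windows uses xmm3, which matches the "movaps xmm0, xmm3" in the four-argument listing in this thread. On Win64 the fifth argument sits at [rsp + 40] on entry (8-byte return address plus 32 bytes of shadow space), which becomes [rsp + 48] after the "push rax" in the five-argument listing.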

Thanks

Pengfei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:21 PM
To: Craig Topper <craig.topper at gmail.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hi All,

Thank you very much for all the great information.  This is awesome!

To circle back on Craig's questions.

I did notice LLVM 11 behaves very differently.

** Per: What does "incorrect math operations" mean?

The half is passed to the function as a float.  The function does operations with other half numbers.  On Windows, when we don't get the float-to-half conversion, the input is always truncated to 0.0.

** Per: "Do you have a more complete IR file for Windows that I can take a look at?"

I can get you our IR if you want, but I think it is more convoluted than required.  I was working on a unit test and I think all one needs to see the anomaly is:

define void @foo(i8, i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee
  %6 = alloca half
  store half %4, half* %6, align 1
  ret void
}

x86_64-pc-windows gives:
push rax
.seh_stackalloc 8
.seh_endprologue
movss xmm0, dword ptr [rsp + 48] # xmm0 = mem[0],zero,zero,zero
movss dword ptr [rsp + 4], xmm0 # 4-byte Spill
pop rax
ret
.seh_handlerdata
.text
.seh_endproc

What I find extremely interesting is that the behavior seems to have something to do with the stack.  If I drop the number of inputs by one, even Windows generates the conversion.

define void @foo(i8, i8, i8, half) {
; CHECK-I686:    callq __gnu_f2h_ieee
  %5 = alloca half
  store half %3, half* %5, align 1
  ret void
}

x86_64-pc-windows gives:

sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
movabs rax, offset __gnu_f2h_ieee
movaps xmm0, xmm3
call rax
mov word ptr [rsp + 38], ax
add rsp, 40
ret
.seh_handlerdata
.text
.seh_endproc

** If interested, here is a dissection of our real asm.
For both Windows and Linux our IR calls c2_foo() with a half(2):

...

call void @c2_foo(i8* %S_6, [21 x i8*]* %ptr_gvar_instance_7, %emlrtStack* %c2_b_st_, [18 x float]* @15, half 0xH4000, [18 x i8]* %t10)

They both register this in c2_foo as:

...

  %c2_in2_ = alloca half
  store half %c2_in2, half* %c2_in2_, align 1

When we compile them, they both send 0x40000000 to c2_foo (a single).
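
For reference, the bit patterns involved can be checked with a short Python sketch. It shows that 2.0 is 0x4000 as a half and 0x40000000 as a single, and what a 16-bit read of the unconverted single would yield. That last step is an assumption (that the callee ends up reading the low 16 bits of the 32-bit value); the thread itself only reports that the input ends up as 0.0. The helper names are hypothetical.

```python
import struct


def float_bits(x):
    """Bit pattern of x as an IEEE-754 single (32-bit)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def half_bits(x):
    """Bit pattern of x as an IEEE-754 half (16-bit)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits):
    """Value of a 16-bit pattern interpreted as an IEEE-754 half."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


value = 2.0
print(hex(float_bits(value)))  # pattern passed to c2_foo as a single
print(hex(half_bits(value)))   # pattern a real float-to-half conversion gives

# ASSUMPTION: without the conversion, the low 16 bits of the single
# are read back as if they already were a half.
low16 = float_bits(value) & 0xFFFF
print(half_from_bits(low16))
```

Under that assumption, any input whose single-precision pattern has all-zero low 16 bits (2.0 is one such value) would read back as 0.0, which would be consistent with the wrong results seen on Windows.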

The Linux c2_foo() asm addresses this with a float2half conversion:

...

 mov qword ptr [rsp + 448], rdi
 mov qword ptr [rsp + 440], rsi
 mov qword ptr [rsp + 432], rdx
 mov qword ptr [rsp + 424], rcx
 movabs rcx, offset __gnu_f2h_ieee     # <---Convert Here
 mov qword ptr [rsp + 336], r8 # 8-byte Spill
 call rcx
 mov word ptr [rsp + 422], ax
 mov rcx, qword ptr [rsp + 336] # 8-byte Reload
 mov qword ptr [rsp + 408], rcx
 mov qword ptr [rsp + 392], 0
 mov qword ptr [rsp + 384], 0
 mov qword ptr [rsp + 376], 0
 mov qword ptr [rsp + 368], 0
 mov rdx, qword ptr [rsp + 432]
 mov qword ptr [rsp + 360], rdx
 mov rdx, qword ptr [rsp + 432]
 mov rdx, qword ptr [rdx + 8]
 mov qword ptr [rsp + 352], rdx
 mov rdx, qword ptr [rsp + 440]
 mov rdx, qword ptr [rdx + 56]
 mov qword ptr [rsp + 344], rdx
 mov dword ptr [rsp + 400], 0
 jmp .LBB9_9

The Windows c2_foo() asm is missing this conversion but treats the value as if it has been converted.

...

 mov rax, qword ptr [rsp + 424]
 movss xmm0, dword ptr [rsp + 416] # xmm0 = mem[0],zero,zero,zero  # <-- moves the data like it wants to convert but never does
 mov qword ptr [rsp + 344], rcx
 mov qword ptr [rsp + 336], rdx
 mov qword ptr [rsp + 328], r8
 mov qword ptr [rsp + 320], r9
 mov qword ptr [rsp + 304], 0
 mov qword ptr [rsp + 296], 0
 mov qword ptr [rsp + 288], 0
 mov qword ptr [rsp + 280], 0
 mov rcx, qword ptr [rsp + 328]
 mov qword ptr [rsp + 272], rcx
 mov rcx, qword ptr [rsp + 328]
 mov rcx, qword ptr [rcx + 8]
 mov qword ptr [rsp + 264], rcx
 mov rcx, qword ptr [rsp + 336]
 mov rcx, qword ptr [rcx + 56]
 mov qword ptr [rsp + 256], rcx
 mov dword ptr [rsp + 312], 0
 mov qword ptr [rsp + 248], rax # 8-byte Spill
 movss dword ptr

________________________________

From: Wang, Pengfei <pengfei.wang at intel.com>
Sent: Friday, March 5, 2021 7:30 AM
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: RE: Is it legal to pass a half by value on x86_64?

I guess it’s designed for language portability. You can use this type across different platforms. Nevertheless, I’m not a FE expert, so I cannot think of other intentions.

The _Float16 is a primitive type in the latest x86 ABI, but there’s no X86 target that supports it yet, so you cannot use it on X86 for now. I think that’s the difference from __fp16 and the reason to use it.

We also have some discussion here: https://reviews.llvm.org/D97318

Thanks

Pengfei

From: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Sent: Friday, March 5, 2021 5:49 PM
To: Jason Hafer <jhafer at mathworks.com>; Wang, Pengfei <pengfei.wang at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: Is it legal to pass a half by value on x86_64?

__fp16 is a pure storage format. You cannot pass it by value, because only types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be passed by value, and __fp16 is not one of them.

Yep. Any specific reason to use a pure storage format? The native type is _Float16 and would give some benefits, but this is not yet supported on x86, see also:

https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point

Cheers,
Sjoerd.

________________________________

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Wang, Pengfei via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 05 March 2021 06:28
To: Jason Hafer <jhafer at mathworks.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hi Jason,

__fp16 is a pure storage format. You cannot pass it by value, because only types permitted by the ABI <https://gitlab.com/x86-psABIs/x86-64-ABI> can be passed by value, and __fp16 is not one of them.

  *   if "define void @foo(i8, i8, i8, i8, half) " is even legal to use

half, as a target-independent type, is legal in LLVM. It is not legal for targets that do not support it, like X86, so the behavior depends on how we lower it. But I don’t know why there are differences between Linux and Windows. Maybe because “__gnu_f2h_ieee” is a Linux-only function?

Thanks

Pengfei

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jason Hafer via llvm-dev
Sent: Friday, March 5, 2021 10:46 AM
To: llvm-dev at lists.llvm.org
Cc: Jason Hafer <jhafer at mathworks.com>
Subject: [llvm-dev] Is it legal to pass a half by value on x86_64?

Hello,

I am attempting to understand an anomaly I am seeing when dealing with half on Windows and could use some help.

Using LLVM 8 or 10, if I have IR of the flavor below:
define void @foo(i8, i8, i8, i8, half) {
  %6 = alloca half
  store half %4, half* %6, align 1
  ...
  ret void
}

Using x86_64-pc-linux, we convert the float passed in with __gnu_f2h_ieee.

Using x86_64-pc-windows I do not get the conversion, so we end up with incorrect math operations.

While investigating I noticed clang gave me the error below:

error: parameters cannot have __fp16 type; did you forget * ?
void foo(int dc1, int dc2,int dc3,int dc4, __fp16 in)

So, this got me wondering if "define void @foo(i8, i8, i8, i8, half) " is even legal to use or if I should rather pass by ref?  I have yet to find documentation to convince me one way or the other.  Thus, I was hoping someone here might be able to shed some light on the issue.

Thank you in advance!

Cheers,

JP
-------------- next part --------------
A non-text attachment was scrubbed...
Name: halfOpAnom.ll'
Type: application/octet-stream
Size: 53573 bytes
Desc: halfOpAnom.ll'
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210305/8aaf2854/attachment-0001.obj>