[llvm-dev] unnecessary reload of 8-byte struct on i386
Seth Brenith via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 25 14:28:42 PDT 2019
I've recently been looking at the generated code for a few functions in Chromium while investigating crashes, and I came across a curious pattern. A smallish repro case is available at https://godbolt.org/z/Dsu1WI . In that case, the function Assembler::emit_arith receives a struct (Operand) by value and passes it by value to another function. Because the struct is 8 bytes long, the code generated at -O3 uses movsd to copy it up the stack. However, we end up with some loads that aren't needed, as in the following chunk:
movsd xmm0, qword ptr [ecx] # xmm0 = mem,zero
mov dword ptr [esp + 24], edx
movsd qword ptr [esp + 40], xmm0
movsd xmm0, qword ptr [esp + 40] # xmm0 = mem,zero
movsd qword ptr [esp + 8], xmm0
As far as I can tell, the fourth line (the reload of xmm0 from [esp + 40], immediately after xmm0 was stored there) has no effect. On its own, that seems like a small missed optimization opportunity. However, this sequence of instructions also appears to trigger a hardware bug on a small fraction of devices, which sometimes end up storing zero at esp+8. A more in-depth discussion of that issue can be found here: https://bugs.chromium.org/p/v8/issues/detail?id=9774 .
I'm hoping that getting rid of the second load in the sequence above would appease these misbehaving machines (though of course I don't know that it would), as well as making the code a little smaller for everybody else. Does that sound like a reasonable idea? Would LLVM be interested in a patch related to eliminating reloads like this? Does anybody have advice about where I should start looking, or any reasons it would be very hard to achieve the result I'm hoping for?