[llvm-dev] [EXTERNAL] Re: unnecessary reload of 8-byte struct on i386

Seth Brenith via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 30 08:54:45 PDT 2019

Very interesting, thanks for the analysis and further reduction! Over-aligning the struct seems like a simple workaround that would have little impact elsewhere.

From: David Zarzycki <dave at znu.io>
Sent: Monday, October 28, 2019 11:23 PM
To: Seth Brenith <Seth.Brenith at microsoft.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXTERNAL] Re: [llvm-dev] unnecessary reload of 8-byte struct on i386

This just looks like a temporary stack variable wasn’t properly eliminated because the compiler modeled `Operand` internally as an “i64” which i386 doesn’t natively support. A further reduction, with notes about changes that sidestep the bug:


On Oct 26, 2019, at 12:28 AM, Seth Brenith via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Hello folks,

I’ve recently been looking at the generated code for a few functions in Chromium while investigating crashes, and I came across a curious pattern. A smallish repro case is available at https://godbolt.org/z/Dsu1WI<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgodbolt.org%2Fz%2FDsu1WI&data=02%7C01%7CSeth.Brenith%40microsoft.com%7C9a7bff8d4a864a3bd7a908d75c3880fb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637079270882137742&sdata=OeSk0j2ISpza%2F%2FVUWDsmUcGUP8MX3WRwkAtyIyO3YxQ%3D&reserved=0> . In that case, the function Assembler::emit_arith receives a struct (Operand) by value and passes it by value to another function. That struct is 8 bytes long, so the -O3 generated code uses movsd to copy it up the stack. However, we end up with some loads that aren’t needed, as in the following chunk:

movsd xmm0, qword ptr [ecx] # xmm0 = mem[0],zero
mov dword ptr [esp + 24], edx
movsd qword ptr [esp + 40], xmm0
movsd xmm0, qword ptr [esp + 40] # xmm0 = mem[0],zero
movsd qword ptr [esp + 8], xmm0

As far as I can tell, the fourth line has no effect. On its own, that seems like a small missed opportunity for optimization. However, this sequence of instructions also appears to trigger a hardware bug on a small fraction of devices which sometimes end up storing zero at esp+8. A more in-depth discussion of that issue can be found here: https://bugs.chromium.org/p/v8/issues/detail?id=9774<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.chromium.org%2Fp%2Fv8%2Fissues%2Fdetail%3Fid%3D9774&data=02%7C01%7CSeth.Brenith%40microsoft.com%7C9a7bff8d4a864a3bd7a908d75c3880fb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637079270882137742&sdata=ujiic5Tb0%2F%2BsHM7sH6JXCJlpsQSKD6txl5g3HtEYOuo%3D&reserved=0> .

I’m hoping that getting rid of the second load in the sequence above would appease these misbehaving machines (though of course I don’t know that it would), as well as making the code a little smaller for everybody else. Does that sound like a reasonable idea? Would LLVM be interested in a patch related to eliminating reloads like this? Does anybody have advice about where I should start looking, or any reasons it would be very hard to achieve the result I’m hoping for?

LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191030/c06ae3c7/attachment-0001.html>

More information about the llvm-dev mailing list