[llvm] r324835 - [X86] Reduce Store Forward Block issues in HW
Martin Böhme via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 14 07:59:38 PST 2018
Thanks for the heads up!
I'll test to see whether our internal miscompiles are now eliminated.
On Wed, 14 Feb 2018 at 16:41, Saba, Lama <lama.saba at intel.com> wrote:
> Okay, recommitted, will keep track.
> Thanks,
> Lama
>
> -----Original Message-----
> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> Hans Wennborg
> Sent: Tuesday, February 13, 2018 4:16 PM
> To: Saba, Lama <lama.saba at intel.com>
> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in HW
>
> It's possible but takes some work. I think if your change fixes Martin's
> test case, it's likely that it fixes ours also, and it's easiest to commit
> and let our buildbots test automatically.
>
> On Tue, Feb 13, 2018 at 3:12 PM, Saba, Lama <lama.saba at intel.com> wrote:
> > I believe I have a fix, is there a way to run Chromium before committing
> to make sure the issues are gone?
> >
> > -----Original Message-----
> > From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> > Hans Wennborg
> > Sent: Tuesday, February 13, 2018 3:02 PM
> > To: Saba, Lama <lama.saba at intel.com>
> > Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> > Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues
> > in HW
> >
> > Great!
> >
> > It also looks like the new code is less efficient, with 8 moves instead
> of two wide ones. Is this also part of the AA problem, or is it something
> else?
> >
> > On Tue, Feb 13, 2018 at 1:52 PM, Saba, Lama <lama.saba at intel.com> wrote:
> >> Hi,
> >>
> >> Thank you, I found the issue, some of the AA info (mem offsets) is not
> correctly updated when building the new copies, therefore the compiler
> allows himself to move some of the instructions up and overwriting the
> correct value.
> >> (the first 4 added instructions should be after movw $100,
> 20(%rsp) )
> >> I am working on a fix.
> >>
> >> -----Original Message-----
> >> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> >> Hans Wennborg
> >> Sent: Tuesday, February 13, 2018 2:44 PM
> >> To: Saba, Lama <lama.saba at intel.com>
> >> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> >> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues
> >> in HW
> >>
> >> Attaching preprocessed source and a creduced version of that.
> >>
> >> I think maybe the creduced version was reduced to aggressively, the
> final "printf(&p)" looks very fishy.
> >>
> >> But looking at the asm difference from the full preprocessed might be
> good enough. Here's the diff before and after r324835.
> >>
> >> (I'm not entirely sure what's happening, but it looks like the new
> >> code fails to copy the "d\0" part of the string, $100.)
> >>
> >> --- /tmp/good 2018-02-13 13:35:15.230627030 +0100
> >> +++ /tmp/bad 2018-02-13 13:35:10.194682062 +0100
> >> @@ -1,47 +1,53 @@
> >> .text
> >> .file "a.cc"
> >> .globl main # -- Begin function main
> >> .p2align 4, 0x90
> >> .type main, at function
> >> main: # @main
> >> .cfi_startproc
> >> # %bb.0: #
> >> %_ZN9__gnu_cxx17__sso_string_baseIcSt11char_traitsIcESaIcEED2Ev.exit
> >> pushq %rbx
> >> .cfi_def_cfa_offset 16
> >> subq $64, %rsp
> >> .cfi_def_cfa_offset 80
> >> .cfi_offset %rbx, -16
> >> leaq 48(%rsp), %rbx
> >> movq %rbx, 32(%rsp)
> >> leaq 16(%rsp), %rax
> >> movq %rax, (%rsp)
> >> movl $1819438967, 16(%rsp) # imm = 0x6C726F77
> >> + movl 16(%rsp), %eax
> >> + movl %eax, 48(%rsp)
> >> + movzwl 20(%rsp), %eax
> >> + movw %ax, 52(%rsp)
> >> movw $100, 20(%rsp)
> >> - movups 16(%rsp), %xmm0
> >> - movups %xmm0, 48(%rsp)
> >> + movq 22(%rsp), %rax
> >> + movq %rax, 54(%rsp)
> >> + movzwl 30(%rsp), %eax
> >> + movw %ax, 62(%rsp)
> >> movq $5, 40(%rsp)
> >> movq %rbx, %rdi
> >> callq puts
> >> movq 32(%rsp), %rdi
> >> cmpq %rbx, %rdi
> >> je .LBB0_2
> >> # %bb.1: # %if.then.i.i.i
> >> callq _ZdlPv
> >> .LBB0_2: #
> >>
> %_ZN21ExplicitlyConstructedIN9__gnu_cxx14__versa_stringIcSt11char_traitsIcESaIcENS0_17__sso_string_baseEEEE8DestructEv.exit
> >> xorl %eax, %eax
> >> addq $64, %rsp
> >> popq %rbx
> >> retq
> >> .Lfunc_end0:
> >> .size main, .Lfunc_end0-main
> >> .cfi_endproc
> >> # -- End function
> >> .type .L.str, at object # @.str
> >> .section .rodata.str1.1,"aMS", at progbits,1
> >> .L.str:
> >> .asciz "world"
> >> .size .L.str, 6
> >>
> >>
> >> - .ident "clang version 7.0.0 (trunk 324828) (llvm/trunk 324834)"
> >> + .ident "clang version 7.0.0 (trunk 324828) (llvm/trunk 324835)"
> >> .section ".note.GNU-stack","", at progbits
> >>
> >> On Tue, Feb 13, 2018 at 12:20 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>> Thanks! I will check it
> >>>
> >>> -----Original Message-----
> >>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>> Of Hans Wennborg
> >>> Sent: Tuesday, February 13, 2018 12:03 PM
> >>> To: Saba, Lama <lama.saba at intel.com>
> >>> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> >>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>> issues in HW
> >>>
> >>> Martin was able to reduce his repro to a stand-alone C++ program, and
> I managed to get it working outside our internal environment.
> >>>
> >>> a.cc:
> >>>
> >>> #include <cstdint>
> >>> #include <iostream>
> >>> #include "/usr/include/c++/6/ext/vstring.h"
> >>>
> >>> using int64 = std::int64_t;
> >>>
> >>> template <typename T>
> >>> class ExplicitlyConstructed {
> >>> public:
> >>> void DefaultConstruct() {
> >>> new (&union_) T();
> >>> }
> >>>
> >>> void Destruct() {
> >>> get_mutable()->~T();
> >>> }
> >>>
> >>> constexpr const T& get() const { return reinterpret_cast<const
> T&>(union_); }
> >>> T* get_mutable() { return reinterpret_cast<T*>(&union_); }
> >>>
> >>> private:
> >>> // Prefer c++14 aligned_storage, but for compatibility this will do.
> >>> union AlignedUnion {
> >>> char space[sizeof(T)];
> >>> int64 align_to_int64;
> >>> void* align_to_ptr;
> >>> } union_;
> >>> };
> >>>
> >>> int main() {
> >>> ExplicitlyConstructed<__gnu_cxx::__versa_string<char>> str;
> >>> str.DefaultConstruct();
> >>> *str.get_mutable() = __gnu_cxx::__versa_string<char>("world", 5);
> >>> if (str.get() != "world") {
> >>> std::cout << "Expected 'world' but got '" << str.get() << "'\n";
> >>> }
> >>> str.Destruct();
> >>>
> >>> return 0;
> >>> }
> >>>
> >>> bin/clang++ -O2 /tmp/a.cc && ./a.out Expected 'world' but got 'worl'
> >>>
> >>>
> >>> I'll reduce further to get rid off the library dependencies.
> >>>
> >>>
> >>>
> >>> On Mon, Feb 12, 2018 at 5:58 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>>> Hmm.. I'm not too familiar with protobufs, is there a way I can take
> >>>> a look at the generated assembly or ll?
> >>>>
> >>>> I see that the last 4 bytes are not getting copied, so yeah it's
> >>>> probably the memcpy at initialization/return but my attempts to
> >>>> reproduce did not work..
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> From: Martin Böhme [mailto:mboehme at google.com]
> >>>> Sent: Monday, February 12, 2018 4:56 PM
> >>>> To: Saba, Lama <lama.saba at intel.com>
> >>>> Cc: hans at chromium.org
> >>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>> issues in HW
> >>>>
> >>>>
> >>>>
> >>>> I've got an internal repro that essentially looks like this:
> >>>>
> >>>>
> >>>>
> >>>> proto file:
> >>>>
> >>>>
> >>>>
> >>>> message MyProto {
> >>>>
> >>>> optional bytes foo = 1 [default = "world"];
> >>>>
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> test file:
> >>>>
> >>>>
> >>>>
> >>>> TEST(ProtoReproTest, Repro) {
> >>>>
> >>>> MyProto proto;
> >>>>
> >>>> EXPECT_EQ("world", proto.foo());
> >>>>
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> This fails with an actual value for proto.foo() of "worl\0". I
> >>>> assume it's probably not a coincidence that four bytes are getting
> copied correctly.
> >>>> Maybe this already gives you a theory to start with?
> >>>>
> >>>>
> >>>>
> >>>> I'm still working to get this building in a way that reproduces
> externally.
> >>>> My hope is that I'll be able to get a repro that doesn't even use
> >>>> protos -- it looks to me as if the error is already happening when
> >>>> the string gets initialized.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, 12 Feb 2018 at 15:42, Saba, Lama <lama.saba at intel.com> wrote:
> >>>>
> >>>> Okay thanks.
> >>>>
> >>>> -----Original Message-----
> >>>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>>> Of Hans Wennborg
> >>>> Sent: Monday, February 12, 2018 4:39 PM
> >>>> To: Saba, Lama <lama.saba at intel.com>; Martin Böhme
> >>>> <mboehme at google.com>
> >>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>> issues in HW
> >>>>
> >>>> I don't have anything small, it would require building a part of
> Chromium.
> >>>>
> >>>> Martin also saw miscompiles internally, concerning initialization of
> >>>> protobuf fields. I think he's working on a repro.
> >>>>
> >>>> On Mon, Feb 12, 2018 at 3:36 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for notifying, I am fixing the assertion problem, regarding
> >>>>> the other failures:
> >>>>> In some of Chromium's 64-bit builds, we're seeing test failures
> >>>>> that started around the same time frame as this change, for example:
> >>>>> https://ci.chromium.org/buildbot/chromium.clang/ToTWin64/920
> >>>>> So besides the assert, I suspect this is causing miscompiles.
> >>>>>
> >>>>> Is there a way I can get a reproducer for this?
> >>>>>
> >>>>> Thanks,
> >>>>> Lama
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>>>> Of Hans Wennborg
> >>>>> Sent: Monday, February 12, 2018 2:47 PM
> >>>>> To: Saba, Lama <lama.saba at intel.com>
> >>>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
> >>>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>>> issues in HW
> >>>>>
> >>>>> I've reverted this in r324887 as it broke the Chromium build, see
> >>>>> https://bugs.llvm.org/show_bug.cgi?id=36346
> >>>>>
> >>>>> On Sun, Feb 11, 2018 at 10:34 AM, Lama Saba via llvm-commits
> >>>>> <llvm-commits at lists.llvm.org> wrote:
> >>>>>> Author: lsaba
> >>>>>> Date: Sun Feb 11 01:34:12 2018
> >>>>>> New Revision: 324835
> >>>>>>
> >>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=324835&view=rev
> >>>>>> Log:
> >>>>>> [X86] Reduce Store Forward Block issues in HW
> >>>>>>
> >>>>>> If a load follows a store and reloads data that the store has
> >>>>>> written to memory, Intel microarchitectures can in many cases
> >>>>>> forward the data directly from the store to the load, This "store
> >>>>>> forwarding" saves cycles by enabling the load to directly obtain
> >>>>>> the data instead of accessing the data from cache or memory.
> >>>>>> A "store forward block" occurs in cases that a store cannot be
> >>>>>> forwarded to the load. The most typical case of store forward
> >>>>>> block on Intel Core microarchiticutre that a small store cannot be
> forwarded to a large load.
> >>>>>> The estimated penalty for a store forward block is ~13 cycles.
> >>>>>>
> >>>>>> This pass tries to recognize and handle cases where "store forward
> block"
> >>>>>> is created by the compiler when lowering memcpy calls to a
> >>>>>> sequence of a load and a store.
> >>>>>>
> >>>>>> The pass currently only handles cases where memcpy is lowered to
> >>>>>> XMM/YMM registers, it tries to break the memcpy into smaller copies.
> >>>>>> breaking the memcpy should be possible since there is no atomicity
> >>>>>> guarantee for loads and stores to XMM/YMM.
> >>>>>>
> >>>>>> Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
> >>>>>>
> >>>>>> Added:
> >>>>>> llvm/trunk/lib/Target/X86/X86FixupSFB.cpp
> >>>>>> llvm/trunk/test/CodeGen/X86/fixup-sfb.ll
> >>>>>> Modified:
> >>>>>> llvm/trunk/lib/Target/X86/CMakeLists.txt
> >>>>>> llvm/trunk/lib/Target/X86/X86.h
> >>>>>> llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
> >>> ---------------------------------------------------------------------
> >>> Intel Israel (74) Limited
> >>>
> >>> This e-mail and any attachments may contain confidential material for
> >>> the sole use of the intended recipient(s). Any review or distribution
> >>> by others is strictly prohibited. If you are not the intended
> >>> recipient, please contact the sender and delete all copies.
> >> ---------------------------------------------------------------------
> >> Intel Israel (74) Limited
> >>
> >> This e-mail and any attachments may contain confidential material for
> >> the sole use of the intended recipient(s). Any review or distribution
> >> by others is strictly prohibited. If you are not the intended
> >> recipient, please contact the sender and delete all copies.
> > ---------------------------------------------------------------------
> > Intel Israel (74) Limited
> >
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). Any review or distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
--
Martin Böhme
Software Engineer
mboehme at google.com
Google Germany GmbH
Erika-Mann-Straße 33
80363 München
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind,
leiten Sie diese bitte nicht weiter, informieren Sie den Absender und
löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please do
not forward it, please inform the sender, and please erase this e-mail
including any attachments. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180214/4029b985/attachment.html>
More information about the llvm-commits
mailing list