[llvm] r324835 - [X86] Reduce Store Forward Block issues in HW

Martin Böhme via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 14 07:59:38 PST 2018


Thanks for the heads up!

I'll test to see whether our internal miscompiles are now eliminated.


On Wed, 14 Feb 2018 at 16:41, Saba, Lama <lama.saba at intel.com> wrote:

> Okay, recommitted, will keep track.
> Thanks,
> Lama
>
> -----Original Message-----
> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> Hans Wennborg
> Sent: Tuesday, February 13, 2018 4:16 PM
> To: Saba, Lama <lama.saba at intel.com>
> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in HW
>
> It's possible but takes some work. I think if your change fixes Martin's
> test case, it's likely that it fixes ours also, and it's easiest to commit
> and let our buildbots test automatically.
>
> On Tue, Feb 13, 2018 at 3:12 PM, Saba, Lama <lama.saba at intel.com> wrote:
> > I believe I have a fix, is there a way to run Chromium before committing
> to make sure the issues are gone?
> >
> > -----Original Message-----
> > From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> > Hans Wennborg
> > Sent: Tuesday, February 13, 2018 3:02 PM
> > To: Saba, Lama <lama.saba at intel.com>
> > Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> > Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues
> > in HW
> >
> > Great!
> >
> > It also looks like the new code is less efficient, with 8 moves instead
> of two wide ones. Is this also part of the AA problem, or is it something
> else?
> >
> > On Tue, Feb 13, 2018 at 1:52 PM, Saba, Lama <lama.saba at intel.com> wrote:
> >> Hi,
> >>
> >> Thank you, I found the issue, some of the AA info (mem offsets) is not
> correctly updated when building the new copies, therefore the compiler
> allows himself to move some of the instructions up  and overwriting the
> correct value.
> >> (the first 4 added instructions should be after   movw    $100,
> 20(%rsp) )
> >> I am working on a fix.
> >>
> >> -----Original Message-----
> >> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> >> Hans Wennborg
> >> Sent: Tuesday, February 13, 2018 2:44 PM
> >> To: Saba, Lama <lama.saba at intel.com>
> >> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> >> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues
> >> in HW
> >>
> >> Attaching preprocessed source and a creduced version of that.
> >>
> >> I think maybe the creduced version was reduced to aggressively, the
> final "printf(&p)" looks very fishy.
> >>
> >> But looking at the asm difference from the full preprocessed might be
> good enough. Here's the diff before and after r324835.
> >>
> >> (I'm not entirely sure what's happening, but it looks like the new
> >> code fails to copy the "d\0" part of the string, $100.)
> >>
> >> --- /tmp/good   2018-02-13 13:35:15.230627030 +0100
> >> +++ /tmp/bad    2018-02-13 13:35:10.194682062 +0100
> >> @@ -1,47 +1,53 @@
> >>         .text
> >>         .file   "a.cc"
> >>         .globl  main                    # -- Begin function main
> >>         .p2align        4, 0x90
> >>         .type   main, at function
> >>  main:                                   # @main
> >>         .cfi_startproc
> >>  # %bb.0:                                #
> >> %_ZN9__gnu_cxx17__sso_string_baseIcSt11char_traitsIcESaIcEED2Ev.exit
> >>         pushq   %rbx
> >>         .cfi_def_cfa_offset 16
> >>         subq    $64, %rsp
> >>         .cfi_def_cfa_offset 80
> >>         .cfi_offset %rbx, -16
> >>         leaq    48(%rsp), %rbx
> >>         movq    %rbx, 32(%rsp)
> >>         leaq    16(%rsp), %rax
> >>         movq    %rax, (%rsp)
> >>         movl    $1819438967, 16(%rsp)   # imm = 0x6C726F77
> >> +       movl    16(%rsp), %eax
> >> +       movl    %eax, 48(%rsp)
> >> +       movzwl  20(%rsp), %eax
> >> +       movw    %ax, 52(%rsp)
> >>         movw    $100, 20(%rsp)
> >> -       movups  16(%rsp), %xmm0
> >> -       movups  %xmm0, 48(%rsp)
> >> +       movq    22(%rsp), %rax
> >> +       movq    %rax, 54(%rsp)
> >> +       movzwl  30(%rsp), %eax
> >> +       movw    %ax, 62(%rsp)
> >>         movq    $5, 40(%rsp)
> >>         movq    %rbx, %rdi
> >>         callq   puts
> >>         movq    32(%rsp), %rdi
> >>         cmpq    %rbx, %rdi
> >>         je      .LBB0_2
> >>  # %bb.1:                                # %if.then.i.i.i
> >>         callq   _ZdlPv
> >>  .LBB0_2:                                #
> >>
> %_ZN21ExplicitlyConstructedIN9__gnu_cxx14__versa_stringIcSt11char_traitsIcESaIcENS0_17__sso_string_baseEEEE8DestructEv.exit
> >>         xorl    %eax, %eax
> >>         addq    $64, %rsp
> >>         popq    %rbx
> >>         retq
> >>  .Lfunc_end0:
> >>         .size   main, .Lfunc_end0-main
> >>         .cfi_endproc
> >>                                          # -- End function
> >>         .type   .L.str, at object          # @.str
> >>         .section        .rodata.str1.1,"aMS", at progbits,1
> >>  .L.str:
> >>         .asciz  "world"
> >>         .size   .L.str, 6
> >>
> >>
> >> -       .ident  "clang version 7.0.0 (trunk 324828) (llvm/trunk 324834)"
> >> +       .ident  "clang version 7.0.0 (trunk 324828) (llvm/trunk 324835)"
> >>         .section        ".note.GNU-stack","", at progbits
> >>
> >> On Tue, Feb 13, 2018 at 12:20 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>> Thanks! I will check it
> >>>
> >>> -----Original Message-----
> >>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>> Of Hans Wennborg
> >>> Sent: Tuesday, February 13, 2018 12:03 PM
> >>> To: Saba, Lama <lama.saba at intel.com>
> >>> Cc: mboehme at google.com; llvm-commits <llvm-commits at lists.llvm.org>
> >>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>> issues in HW
> >>>
> >>> Martin was able to reduce his repro to a stand-alone C++ program, and
> I managed to get it working outside our internal environment.
> >>>
> >>> a.cc:
> >>>
> >>> #include <cstdint>
> >>> #include <iostream>
> >>> #include "/usr/include/c++/6/ext/vstring.h"
> >>>
> >>> using int64 = std::int64_t;
> >>>
> >>> template <typename T>
> >>> class ExplicitlyConstructed {
> >>>  public:
> >>>   void DefaultConstruct() {
> >>>     new (&union_) T();
> >>>   }
> >>>
> >>>   void Destruct() {
> >>>     get_mutable()->~T();
> >>>   }
> >>>
> >>>   constexpr const T& get() const { return reinterpret_cast<const
> T&>(union_); }
> >>>   T* get_mutable() { return reinterpret_cast<T*>(&union_); }
> >>>
> >>>  private:
> >>>   // Prefer c++14 aligned_storage, but for compatibility this will do.
> >>>   union AlignedUnion {
> >>>     char space[sizeof(T)];
> >>>     int64 align_to_int64;
> >>>     void* align_to_ptr;
> >>>   } union_;
> >>> };
> >>>
> >>> int main() {
> >>>   ExplicitlyConstructed<__gnu_cxx::__versa_string<char>> str;
> >>>   str.DefaultConstruct();
> >>>   *str.get_mutable() = __gnu_cxx::__versa_string<char>("world", 5);
> >>>   if (str.get() != "world") {
> >>>     std::cout << "Expected 'world' but got '" << str.get() << "'\n";
> >>>   }
> >>>   str.Destruct();
> >>>
> >>>   return 0;
> >>> }
> >>>
> >>> bin/clang++ -O2 /tmp/a.cc && ./a.out Expected 'world' but got 'worl'
> >>>
> >>>
> >>> I'll reduce further to get rid off the library dependencies.
> >>>
> >>>
> >>>
> >>> On Mon, Feb 12, 2018 at 5:58 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>>> Hmm.. I'm not too familiar with protobufs, is there a way I can take
> >>>> a look at the generated assembly or ll?
> >>>>
> >>>> I see that the last 4 bytes are not getting copied, so yeah it's
> >>>> probably the memcpy at initialization/return but my attempts to
> >>>> reproduce did not work..
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> From: Martin Böhme [mailto:mboehme at google.com]
> >>>> Sent: Monday, February 12, 2018 4:56 PM
> >>>> To: Saba, Lama <lama.saba at intel.com>
> >>>> Cc: hans at chromium.org
> >>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>> issues in HW
> >>>>
> >>>>
> >>>>
> >>>> I've got an internal repro that essentially looks like this:
> >>>>
> >>>>
> >>>>
> >>>> proto file:
> >>>>
> >>>>
> >>>>
> >>>> message MyProto {
> >>>>
> >>>>   optional bytes foo = 1 [default = "world"];
> >>>>
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> test file:
> >>>>
> >>>>
> >>>>
> >>>> TEST(ProtoReproTest, Repro) {
> >>>>
> >>>>   MyProto proto;
> >>>>
> >>>>   EXPECT_EQ("world", proto.foo());
> >>>>
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> This fails with an actual value for proto.foo() of "worl\0". I
> >>>> assume it's probably not a coincidence that four bytes are getting
> copied correctly.
> >>>> Maybe this already gives you a theory to start with?
> >>>>
> >>>>
> >>>>
> >>>> I'm still working to get this building in a way that reproduces
> externally.
> >>>> My hope is that I'll be able to get a repro that doesn't even use
> >>>> protos -- it looks to me as if the error is already happening when
> >>>> the string gets initialized.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, 12 Feb 2018 at 15:42, Saba, Lama <lama.saba at intel.com> wrote:
> >>>>
> >>>> Okay thanks.
> >>>>
> >>>> -----Original Message-----
> >>>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>>> Of Hans Wennborg
> >>>> Sent: Monday, February 12, 2018 4:39 PM
> >>>> To: Saba, Lama <lama.saba at intel.com>; Martin Böhme
> >>>> <mboehme at google.com>
> >>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>> issues in HW
> >>>>
> >>>> I don't have anything small, it would require building a part of
> Chromium.
> >>>>
> >>>> Martin also saw miscompiles internally, concerning initialization of
> >>>> protobuf fields. I think he's working on a repro.
> >>>>
> >>>> On Mon, Feb 12, 2018 at 3:36 PM, Saba, Lama <lama.saba at intel.com>
> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for notifying, I am fixing the assertion problem, regarding
> >>>>> the other failures:
> >>>>> In some of Chromium's 64-bit builds, we're seeing test failures
> >>>>> that started around the same time frame as this change, for example:
> >>>>> https://ci.chromium.org/buildbot/chromium.clang/ToTWin64/920
> >>>>> So besides the assert, I suspect this is causing miscompiles.
> >>>>>
> >>>>> Is there a way I can get a reproducer for this?
> >>>>>
> >>>>> Thanks,
> >>>>> Lama
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf
> >>>>> Of Hans Wennborg
> >>>>> Sent: Monday, February 12, 2018 2:47 PM
> >>>>> To: Saba, Lama <lama.saba at intel.com>
> >>>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
> >>>>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block
> >>>>> issues in HW
> >>>>>
> >>>>> I've reverted this in r324887 as it broke the Chromium build, see
> >>>>> https://bugs.llvm.org/show_bug.cgi?id=36346
> >>>>>
> >>>>> On Sun, Feb 11, 2018 at 10:34 AM, Lama Saba via llvm-commits
> >>>>> <llvm-commits at lists.llvm.org> wrote:
> >>>>>> Author: lsaba
> >>>>>> Date: Sun Feb 11 01:34:12 2018
> >>>>>> New Revision: 324835
> >>>>>>
> >>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=324835&view=rev
> >>>>>> Log:
> >>>>>> [X86] Reduce Store Forward Block issues in HW
> >>>>>>
> >>>>>> If a load follows a store and reloads data that the store has
> >>>>>> written to memory, Intel microarchitectures can in many cases
> >>>>>> forward the data directly from the store to the load, This "store
> >>>>>> forwarding" saves cycles by enabling the load to directly obtain
> >>>>>> the data instead of accessing the data from cache or memory.
> >>>>>> A "store forward block" occurs in cases that a store cannot be
> >>>>>> forwarded to the load. The most typical case of store forward
> >>>>>> block on Intel Core microarchiticutre that a small store cannot be
> forwarded to a large load.
> >>>>>> The estimated penalty for a store forward block is ~13 cycles.
> >>>>>>
> >>>>>> This pass tries to recognize and handle cases where "store forward
> block"
> >>>>>> is created by the compiler when lowering memcpy calls to a
> >>>>>> sequence of a load and a store.
> >>>>>>
> >>>>>> The pass currently only handles cases where memcpy is lowered to
> >>>>>> XMM/YMM registers, it tries to break the memcpy into smaller copies.
> >>>>>> breaking the memcpy should be possible since there is no atomicity
> >>>>>> guarantee for loads and stores to XMM/YMM.
> >>>>>>
> >>>>>> Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
> >>>>>>
> >>>>>> Added:
> >>>>>>     llvm/trunk/lib/Target/X86/X86FixupSFB.cpp
> >>>>>>     llvm/trunk/test/CodeGen/X86/fixup-sfb.ll
> >>>>>> Modified:
> >>>>>>     llvm/trunk/lib/Target/X86/CMakeLists.txt
> >>>>>>     llvm/trunk/lib/Target/X86/X86.h
> >>>>>>     llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
> >>> ---------------------------------------------------------------------
> >>> Intel Israel (74) Limited
> >>>
> >>> This e-mail and any attachments may contain confidential material for
> >>> the sole use of the intended recipient(s). Any review or distribution
> >>> by others is strictly prohibited. If you are not the intended
> >>> recipient, please contact the sender and delete all copies.
> >> ---------------------------------------------------------------------
> >> Intel Israel (74) Limited
> >>
> >> This e-mail and any attachments may contain confidential material for
> >> the sole use of the intended recipient(s). Any review or distribution
> >> by others is strictly prohibited. If you are not the intended
> >> recipient, please contact the sender and delete all copies.
> > ---------------------------------------------------------------------
> > Intel Israel (74) Limited
> >
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). Any review or distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>


-- 

Martin Böhme

Software Engineer

mboehme at google.com

Google Germany GmbH
Erika-Mann-Straße 33
80363 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind,
leiten Sie diese bitte nicht weiter, informieren Sie den Absender und
löschen Sie die E-Mail und alle Anhänge. Vielen Dank.



This e-mail is confidential. If you are not the right addressee please do
not forward it, please inform the sender, and please erase this e-mail
including any attachments. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180214/4029b985/attachment.html>


More information about the llvm-commits mailing list