[llvm] r324835 - [X86] Reduce Store Forward Block issues in HW

Martin Böhme via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 13 02:06:17 PST 2018


Thanks, Hans!

The ExplicitlyConstructed is from here, by the way:

https://github.com/google/protobuf/blob/9021f623e1420f513268a01a5ad43a23618a84ba/src/google/protobuf/message_lite.h#L106


On Tue, 13 Feb 2018 at 11:02, Hans Wennborg <hans at chromium.org> wrote:

> Martin was able to reduce his repro to a stand-alone C++ program, and
> I managed to get it working outside our internal environment.
>
> a.cc:
>
> #include <cstdint>
> #include <iostream>
> #include "/usr/include/c++/6/ext/vstring.h"
>
> using int64 = std::int64_t;
>
> template <typename T>
> class ExplicitlyConstructed {
>  public:
>   void DefaultConstruct() {
>     new (&union_) T();
>   }
>
>   void Destruct() {
>     get_mutable()->~T();
>   }
>
>   constexpr const T& get() const { return reinterpret_cast<const
> T&>(union_); }
>   T* get_mutable() { return reinterpret_cast<T*>(&union_); }
>
>  private:
>   // Prefer c++14 aligned_storage, but for compatibility this will do.
>   union AlignedUnion {
>     char space[sizeof(T)];
>     int64 align_to_int64;
>     void* align_to_ptr;
>   } union_;
> };
>
> int main() {
>   ExplicitlyConstructed<__gnu_cxx::__versa_string<char>> str;
>   str.DefaultConstruct();
>   *str.get_mutable() = __gnu_cxx::__versa_string<char>("world", 5);
>   if (str.get() != "world") {
>     std::cout << "Expected 'world' but got '" << str.get() << "'\n";
>   }
>   str.Destruct();
>
>   return 0;
> }
>
> bin/clang++ -O2 /tmp/a.cc && ./a.out
> Expected 'world' but got 'worl'
>
>
> I'll reduce further to get rid off the library dependencies.
>
>
>
> On Mon, Feb 12, 2018 at 5:58 PM, Saba, Lama <lama.saba at intel.com> wrote:
> > Hmm.. I'm not too familiar with protobufs, is there a way I can take a
> look
> > at the generated assembly or ll?
> >
> > I see that the last 4 bytes are not getting copied, so yeah it's probably
> > the memcpy at initialization/return but my attempts to reproduce did not
> > work..
> >
> >
> >
> >
> >
> > From: Martin Böhme [mailto:mboehme at google.com]
> > Sent: Monday, February 12, 2018 4:56 PM
> > To: Saba, Lama <lama.saba at intel.com>
> > Cc: hans at chromium.org
> > Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in
> HW
> >
> >
> >
> > I've got an internal repro that essentially looks like this:
> >
> >
> >
> > proto file:
> >
> >
> >
> > message MyProto {
> >
> >   optional bytes foo = 1 [default = "world"];
> >
> > }
> >
> >
> >
> > test file:
> >
> >
> >
> > TEST(ProtoReproTest, Repro) {
> >
> >   MyProto proto;
> >
> >   EXPECT_EQ("world", proto.foo());
> >
> > }
> >
> >
> >
> > This fails with an actual value for proto.foo() of "worl\0". I assume
> it's
> > probably not a coincidence that four bytes are getting copied correctly.
> > Maybe this already gives you a theory to start with?
> >
> >
> >
> > I'm still working to get this building in a way that reproduces
> externally.
> > My hope is that I'll be able to get a repro that doesn't even use protos
> --
> > it looks to me as if the error is already happening when the string gets
> > initialized.
> >
> >
> >
> >
> >
> > On Mon, 12 Feb 2018 at 15:42, Saba, Lama <lama.saba at intel.com> wrote:
> >
> > Okay thanks.
> >
> > -----Original Message-----
> > From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> Hans
> > Wennborg
> > Sent: Monday, February 12, 2018 4:39 PM
> > To: Saba, Lama <lama.saba at intel.com>; Martin Böhme <mboehme at google.com>
> > Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in
> HW
> >
> > I don't have anything small, it would require building a part of
> Chromium.
> >
> > Martin also saw miscompiles internally, concerning initialization of
> > protobuf fields. I think he's working on a repro.
> >
> > On Mon, Feb 12, 2018 at 3:36 PM, Saba, Lama <lama.saba at intel.com> wrote:
> >> Hi,
> >>
> >> Thanks for notifying, I am fixing the assertion problem, regarding the
> >> other failures:
> >> In some of Chromium's 64-bit builds, we're seeing test failures that
> >> started around the same time frame as this change, for example:
> >> https://ci.chromium.org/buildbot/chromium.clang/ToTWin64/920
> >> So besides the assert, I suspect this is causing miscompiles.
> >>
> >> Is there a way I can get a reproducer for this?
> >>
> >> Thanks,
> >> Lama
> >>
> >> -----Original Message-----
> >> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> Hans
> >> Wennborg
> >> Sent: Monday, February 12, 2018 2:47 PM
> >> To: Saba, Lama <lama.saba at intel.com>
> >> Cc: llvm-commits <llvm-commits at lists.llvm.org>
> >> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in
> >> HW
> >>
> >> I've reverted this in r324887 as it broke the Chromium build, see
> >> https://bugs.llvm.org/show_bug.cgi?id=36346
> >>
> >> On Sun, Feb 11, 2018 at 10:34 AM, Lama Saba via llvm-commits
> >> <llvm-commits at lists.llvm.org> wrote:
> >>> Author: lsaba
> >>> Date: Sun Feb 11 01:34:12 2018
> >>> New Revision: 324835
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=324835&view=rev
> >>> Log:
> >>> [X86] Reduce Store Forward Block issues in HW
> >>>
> >>> If a load follows a store and reloads data that the store has written
> to
> >>> memory, Intel microarchitectures can in many cases forward the data
> directly
> >>> from the store to the load, This "store forwarding" saves cycles by
> enabling
> >>> the load to directly obtain the data instead of accessing the data from
> >>> cache or memory.
> >>> A "store forward block" occurs in cases that a store cannot be
> forwarded
> >>> to the load. The most typical case of store forward block on Intel Core
> >>> microarchiticutre that a small store cannot be forwarded to a large
> load.
> >>> The estimated penalty for a store forward block is ~13 cycles.
> >>>
> >>> This pass tries to recognize and handle cases where "store forward
> block"
> >>> is created by the compiler when lowering memcpy calls to a sequence
> >>> of a load and a store.
> >>>
> >>> The pass currently only handles cases where memcpy is lowered to
> XMM/YMM
> >>> registers, it tries to break the memcpy into smaller copies.
> >>> breaking the memcpy should be possible since there is no atomicity
> >>> guarantee for loads and stores to XMM/YMM.
> >>>
> >>> Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
> >>>
> >>> Added:
> >>>     llvm/trunk/lib/Target/X86/X86FixupSFB.cpp
> >>>     llvm/trunk/test/CodeGen/X86/fixup-sfb.ll
> >>> Modified:
> >>>     llvm/trunk/lib/Target/X86/CMakeLists.txt
> >>>     llvm/trunk/lib/Target/X86/X86.h
> >>>     llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
>


-- 

Martin Böhme

Software Engineer

mboehme at google.com

Google Germany GmbH
Erika-Mann-Straße 33
80363 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind,
leiten Sie diese bitte nicht weiter, informieren Sie den Absender und
löschen Sie die E-Mail und alle Anhänge. Vielen Dank.



This e-mail is confidential. If you are not the right addressee please do
not forward it, please inform the sender, and please erase this e-mail
including any attachments. Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180213/73bf8649/attachment.html>


More information about the llvm-commits mailing list