[llvm] r324835 - [X86] Reduce Store Forward Block issues in HW

Hans Wennborg via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 13 02:02:33 PST 2018


Martin was able to reduce his repro to a stand-alone C++ program, and
I managed to get it working outside our internal environment.

a.cc:

#include <cstdint>
#include <iostream>
#include "/usr/include/c++/6/ext/vstring.h"

using int64 = std::int64_t;

template <typename T>
class ExplicitlyConstructed {
 public:
  void DefaultConstruct() {
    new (&union_) T();
  }

  void Destruct() {
    get_mutable()->~T();
  }

  constexpr const T& get() const { return reinterpret_cast<const T&>(union_); }
  T* get_mutable() { return reinterpret_cast<T*>(&union_); }

 private:
  // Prefer c++14 aligned_storage, but for compatibility this will do.
  union AlignedUnion {
    char space[sizeof(T)];
    int64 align_to_int64;
    void* align_to_ptr;
  } union_;
};

int main() {
  ExplicitlyConstructed<__gnu_cxx::__versa_string<char>> str;
  str.DefaultConstruct();
  *str.get_mutable() = __gnu_cxx::__versa_string<char>("world", 5);
  if (str.get() != "world") {
    std::cout << "Expected 'world' but got '" << str.get() << "'\n";
  }
  str.Destruct();

  return 0;
}

bin/clang++ -O2 /tmp/a.cc && ./a.out
Expected 'world' but got 'worl'


I'll reduce further to get rid off the library dependencies.



On Mon, Feb 12, 2018 at 5:58 PM, Saba, Lama <lama.saba at intel.com> wrote:
> Hmm.. I'm not too familiar with protobufs, is there a way I can take a look
> at the generated assembly or ll?
>
> I see that the last 4 bytes are not getting copied, so yeah it's probably
> the memcpy at initialization/return but my attempts to reproduce did not
> work..
>
>
>
>
>
> From: Martin Böhme [mailto:mboehme at google.com]
> Sent: Monday, February 12, 2018 4:56 PM
> To: Saba, Lama <lama.saba at intel.com>
> Cc: hans at chromium.org
> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in HW
>
>
>
> I've got an internal repro that essentially looks like this:
>
>
>
> proto file:
>
>
>
> message MyProto {
>
>   optional bytes foo = 1 [default = "world"];
>
> }
>
>
>
> test file:
>
>
>
> TEST(ProtoReproTest, Repro) {
>
>   MyProto proto;
>
>   EXPECT_EQ("world", proto.foo());
>
> }
>
>
>
> This fails with an actual value for proto.foo() of "worl\0". I assume it's
> probably not a coincidence that four bytes are getting copied correctly.
> Maybe this already gives you a theory to start with?
>
>
>
> I'm still working to get this building in a way that reproduces externally.
> My hope is that I'll be able to get a repro that doesn't even use protos --
> it looks to me as if the error is already happening when the string gets
> initialized.
>
>
>
>
>
> On Mon, 12 Feb 2018 at 15:42, Saba, Lama <lama.saba at intel.com> wrote:
>
> Okay thanks.
>
> -----Original Message-----
> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of Hans
> Wennborg
> Sent: Monday, February 12, 2018 4:39 PM
> To: Saba, Lama <lama.saba at intel.com>; Martin Böhme <mboehme at google.com>
> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in HW
>
> I don't have anything small, it would require building a part of Chromium.
>
> Martin also saw miscompiles internally, concerning initialization of
> protobuf fields. I think he's working on a repro.
>
> On Mon, Feb 12, 2018 at 3:36 PM, Saba, Lama <lama.saba at intel.com> wrote:
>> Hi,
>>
>> Thanks for notifying, I am fixing the assertion problem, regarding the
>> other failures:
>> In some of Chromium's 64-bit builds, we're seeing test failures that
>> started around the same time frame as this change, for example:
>> https://ci.chromium.org/buildbot/chromium.clang/ToTWin64/920
>> So besides the assert, I suspect this is causing miscompiles.
>>
>> Is there a way I can get a reproducer for this?
>>
>> Thanks,
>> Lama
>>
>> -----Original Message-----
>> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of Hans
>> Wennborg
>> Sent: Monday, February 12, 2018 2:47 PM
>> To: Saba, Lama <lama.saba at intel.com>
>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r324835 - [X86] Reduce Store Forward Block issues in
>> HW
>>
>> I've reverted this in r324887 as it broke the Chromium build, see
>> https://bugs.llvm.org/show_bug.cgi?id=36346
>>
>> On Sun, Feb 11, 2018 at 10:34 AM, Lama Saba via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>> Author: lsaba
>>> Date: Sun Feb 11 01:34:12 2018
>>> New Revision: 324835
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=324835&view=rev
>>> Log:
>>> [X86] Reduce Store Forward Block issues in HW
>>>
>>> If a load follows a store and reloads data that the store has written to
>>> memory, Intel microarchitectures can in many cases forward the data directly
>>> from the store to the load, This "store forwarding" saves cycles by enabling
>>> the load to directly obtain the data instead of accessing the data from
>>> cache or memory.
>>> A "store forward block" occurs in cases that a store cannot be forwarded
>>> to the load. The most typical case of store forward block on Intel Core
>>> microarchiticutre that a small store cannot be forwarded to a large load.
>>> The estimated penalty for a store forward block is ~13 cycles.
>>>
>>> This pass tries to recognize and handle cases where "store forward block"
>>> is created by the compiler when lowering memcpy calls to a sequence
>>> of a load and a store.
>>>
>>> The pass currently only handles cases where memcpy is lowered to XMM/YMM
>>> registers, it tries to break the memcpy into smaller copies.
>>> breaking the memcpy should be possible since there is no atomicity
>>> guarantee for loads and stores to XMM/YMM.
>>>
>>> Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
>>>
>>> Added:
>>>     llvm/trunk/lib/Target/X86/X86FixupSFB.cpp
>>>     llvm/trunk/test/CodeGen/X86/fixup-sfb.ll
>>> Modified:
>>>     llvm/trunk/lib/Target/X86/CMakeLists.txt
>>>     llvm/trunk/lib/Target/X86/X86.h
>>>     llvm/trunk/lib/Target/X86/X86TargetMachine.cpp


More information about the llvm-commits mailing list