[cfe-dev] Speculative load optimization
James Dennett via cfe-dev
cfe-dev at lists.llvm.org
Tue Oct 10 13:01:34 PDT 2017
On Tue, Oct 10, 2017 at 12:23 PM, John McCall via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
>
> On Oct 10, 2017, at 2:57 PM, John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> On Oct 10, 2017, at 12:12 PM, Hal Finkel via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> [+Richard, Chandler]
>
> On 10/09/2017 07:00 PM, Hans Wennborg via cfe-dev wrote:
>
> I am not a language lawyer, but I'll atttempt to answer anyway.
>
> On Mon, Oct 9, 2017 at 2:16 PM, Kreitzer, David L via cfe-dev<cfe-dev at lists.llvm.org> <cfe-dev at lists.llvm.org> wrote:
>
> This llvm patch, https://reviews.llvm.org/D37289, attempts to do an optimization
> that involves speculating loads. The patch itself needs some work regardless,
> but we are questioning the legality of the optimization, which is currently
> performed by both gcc and icc. The desired transformation looks like this:
>
> typedef struct S {
> char padding[4088];
> struct S *p1;
> struct S *p2;
> } S;
>
> struct S* f1(struct S *s, int x)
> {
> S *r;
> if (x)
> r = s->p1;
> else
> r = s->p2;
> return r;
> }
>
> TO
>
> struct S* f1(struct S *s, int x)
> {
> return (x) ? s->p1 : s->p2;
> }
>
> The fundamental question seems to be whether loading one member of struct S
> makes it valid to load other members of the same struct.
>
> Yes, I believe that's true. If we're dereferencing s on both paths, it
> must point to a struct S object, and then loading from any member of
> that object should be fine.
>
>
> I also believe that this is correct.
>
> I think that Chandler summarized some things to be careful about in this
> regard here:
> http://lists.llvm.org/pipermail/llvm-commits/Week-
> of-Mon-20170807/477944.html
>
> Of the three points highlighted here, the second clearly might apply:
>
> 2) Related to #1, there are applications that rely on this memory model,
> for example structures where entire regions of the structure live in
> protected pages and cannot be correctly accessed.
>
>
> This, however, is clearly an extension to the standard memory model, and I
> see no reason to support this by default. Speculatively loading cache lines
> under contention from other cores might not be a good thing to do for
> performance reasons, but that's not a correctness issue.
>
>
> I agree that this optimization is legal under the C and C++ specs.
>
>
> Sorry, I'd meant to add a proviso to this. It's legal under the C and C++
> specs to the same extent that any load speculation is.
>
> In general, load speculation is theoretically problematic in C/C++ because
> it is undefined behavior to have a race even if the race is spurious. For
> example,
> a program containing a load that races with a store has undefined behavior
> even
> if the result of the load is never used. Therefore, strictly speaking,
> speculating a
> load can introduce a race that didn't otherwise exist, meaning that it can
> introduce
> undefined behavior to a program,
>
This is where the argument falls down. If there's no race in the source
code, the behavior *is* defined. It doesn't matter that the hardware might
have a racy read in the implementation, so long as it doesn't affect the
behavior of the abstract machine.
> meaning that it's not a legal transformation.
>
There might be other reasons (maybe practical reasons) why this isn't
permitted, but it's not because it introduces UB (it doesn't).
> Now,
> AFAIK LLVM doesn't have any multi-threaded analyses that would actually
> miscompile such a speculation, but it can still matter because e.g. TSan
> emits code
> that dynamically enforces the memory model, and it will dutifully report
> the race here.
>
> (Or, to quote from C11 5.1.2.4p28:
>
> Transformations that introduce a speculative read of a potentially
> shared memory
> location may not preserve the semantics of the program as defined in
> this standard,
> since they potentially introduce a data race.
>
I think this is somewhat muddy, and essentially contradicted by the text
immediately following it:
> However, they are typically valid in the
> context of an optimizing compiler that targets a specific machine with
> well-defined
> semantics for data races. They would be invalid for a hypothetical
> machine that is
> not tolerant of races or provides hardware race detection.
>
> In this sense, TSan makes an arbitrary machine race-intolerant.)
>
Right, this transformation is invalid according to TSan (and a compiler
targeting TSan should not apply it) . Apart from such debug
implementations, I'm not aware of an extant platform on which it's
problematic.
-- James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20171010/62ffa8c5/attachment.html>
More information about the cfe-dev
mailing list