[cfe-dev] Speculative load optimization

Tue Oct 10 13:01:34 PDT 2017

On Tue, Oct 10, 2017 at 12:23 PM, John McCall via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

>
> On Oct 10, 2017, at 2:57 PM, John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> On Oct 10, 2017, at 12:12 PM, Hal Finkel via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> [+Richard, Chandler]
>
> On 10/09/2017 07:00 PM, Hans Wennborg via cfe-dev wrote:
>
> I am not a language lawyer, but I'll atttempt to answer anyway.
>
> On Mon, Oct 9, 2017 at 2:16 PM, Kreitzer, David L via cfe-dev<cfe-dev at lists.llvm.org> <cfe-dev at lists.llvm.org> wrote:
>
> This llvm patch, https://reviews.llvm.org/D37289, attempts to do an optimization
> that involves speculating loads. The patch itself needs some work regardless,
> but we are questioning the legality of the optimization, which is currently
> performed by both gcc and icc. The desired transformation looks like this:
>
>     typedef struct S {
>       char padding[4088];
>       struct S *p1;
>       struct S *p2;
>     } S;
>
>     struct S* f1(struct S *s, int x)
>     {
>       S *r;
>       if (x)
>         r = s->p1;
>       else
>         r = s->p2;
>       return r;
>     }
>
> TO
>
>     struct S* f1(struct S *s, int x)
>     {
>       return (x) ? s->p1 : s->p2;
>     }
>
> The fundamental question seems to be whether loading one member of struct S
> makes it valid to load other members of the same struct.
>
> Yes, I believe that's true. If we're dereferencing s on both paths, it
> must point to a struct S object, and then loading from any member of
> that object should be fine.
>
>
> I also believe that this is correct.
>
> I think that Chandler summarized some things to be careful about in this
> regard here:
>   http://lists.llvm.org/pipermail/llvm-commits/Week-
> of-Mon-20170807/477944.html
>
> Of the three points highlighted here, the second clearly might apply:
>
> 2) Related to #1, there are applications that rely on this memory model,
> for example structures where entire regions of the structure live in
> protected pages and cannot be correctly accessed.
>
>
> This, however, is clearly an extension to the standard memory model, and I
> see no reason to support this by default. Speculatively loading cache lines
> under contention from other cores might not be a good thing to do for
> performance reasons, but that's not a correctness issue.
>
>
> I agree that this optimization is legal under the C and C++ specs.
>
>
> Sorry, I'd meant to add a proviso to this.  It's legal under the C and C++
> specs to the same extent that any load speculation is.
>
> In general, load speculation is theoretically problematic in C/C++ because
> it is undefined behavior to have a race even if the race is spurious.  For
> example,
> a program containing a load that races with a store has undefined behavior
> even
> if the result of the load is never used.  Therefore, strictly speaking,
> speculating a
> load can introduce a race that didn't otherwise exist, meaning that it can
> introduce
> undefined behavior to a program,
>

This is where the argument falls down.  If there's no race in the source
code, the behavior *is* defined.  It doesn't matter that the hardware might
have a racy read in the implementation, so long as it doesn't affect the
behavior of the abstract machine.

> meaning that it's not a legal transformation.
>

There might be other reasons (maybe practical reasons) why this isn't
permitted, but it's not because it introduces UB (it doesn't).

>  Now,
> AFAIK LLVM doesn't have any multi-threaded analyses that would actually
> miscompile such a speculation, but it can still matter because e.g. TSan
> emits code
> that dynamically enforces the memory model, and it will dutifully report
> the race here.
>
> (Or, to quote from C11 5.1.2.4p28:
>
>   Transformations that introduce a speculative read of a potentially
> shared memory
>   location may not preserve the semantics of the program as defined in
> this standard,
>   since they potentially introduce a data race.
>

I think this is somewhat muddy, and essentially contradicted by the text
immediately following it:

> However, they are typically valid in the
>   context of an optimizing compiler that targets a specific machine with
> well-defined
>   semantics for data races. They would be invalid for a hypothetical
> machine that is
>   not tolerant of races or provides hardware race detection.
>
> In this sense, TSan makes an arbitrary machine race-intolerant.)
>

Right, this transformation is invalid according to TSan (and a compiler
targeting TSan should not apply it) .  Apart from such debug
implementations, I'm not aware of an extant platform on which it's
problematic.

-- James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20171010/62ffa8c5/attachment.html>