[PATCH] D101701: [nofree] Refine concurrency requirements

Sat Jul 24 14:12:10 PDT 2021

jdoerfert added a comment.

In D101701#2902142 <https://reviews.llvm.org/D101701#2902142>, @nhaehnle wrote:

> In D101701#2752587 <https://reviews.llvm.org/D101701#2752587>, @jdoerfert wrote:
>
>> So you are arguing `armemonly` is also implying `nosync`? Then we get into the same problems we have here. You don't actually gain expressiveness, after all you can check if both attribute `X` and `nosync` are present, but you loose the ability to derive and utilize them separately. One conceptual thing to consider is that almost all of this predates `nosync` in the first place. That said, I don't see how you would interpret the text as there are no side effects by other threads:
>>
>>   argmemonly
>>   
>>       This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments.
>>   
>>       Note that argmemonly can be used together with readonly attribute in order to specify that function reads only from its arguments.
>>   
>>       If an argmemonly function reads or writes memory other than the pointer arguments, or has other side-effects, the behavior is undefined.
>>
>> What in the above definition prevents me from doing this:
>>
>>   void foo(atomic flag) { atomic_write(flag, 1); while(atomic_read(flag) == 1); atomic_write(flag, 2); }
>>
>> Inside the function all memory that is accesses are loads/stores from objects pointed by pointer-type arguments (with 0 offset).  Do you disagree?
>
> The part of the definition that I would argue prevents you from doing this is the "or has other side-effects".

This reading defines FunctionAttr (and others) are wrong right now when deducing almost any attribute in the absence of nosync.
Further, it should not be "other side-effects" as a general statement anyway. It would subsume `nounwind` and interact badly with infinite loops (D106749 <https://reviews.llvm.org/D106749>), both of which would interact badly with C++ and pure/const functions, as an example.

Long story short, there is still no good reason for attributes not to be composed instead of somehow defined with overlapping semantics.

>> FWIW, this basically breaks down with stuff like `malloc` which we assume is `inaccessiblememonly` but it is in fact *not*, and cannot be, `nosync`. So if we require attributes to imply `nosync`, `malloc` cannot be `inaccessiblememonly` anymore. (see https://reviews.llvm.org/D98605#2625243)
>
> There are two directions in which one can take this.
>
> First, your desired path forward appears to be that `argmemonly` on a callee can only be taken at face value if the callee is also `nosync`. Presumably, you would want the same treatment to apply to `inaccessiblememonly`. So then if `malloc` cannot be `nosync`, what's the point of making it `inaccessiblememonly`? What would that still allow you to deduce that you cannot deduce even without the `inaccessiblememonly`?

As I mentioned before, (1) this comes from a time where we did not think about synchronization much (`BuildLibCalls.cpp` does not once set `nosync` to this day)
and it was ignored widely, and (2) it helps you at the call site if you have `nosync` there. I provided call site examples before, most of them apply again.
If you combine call site information with `nofree` for the function, or `inaccessiblememonly` for the function you can actually derive things. If `nofree`,
`inaccessiblememonly`, `readonly`, ... cannot be derived in the presence of `nosync` we loose that ability. Let's make a new example though.

Let's assume this is our entire module:

  static int *X = nullptr;

  int readX() { return *X; }

  void foo() {
    if (!X)
      X = malloc(4);  // let it be known in the module.
  }

  void bar() {
    *X = 1;
    unknown_inaccessible_mem_only_but_not_nosync();   // e.g. malloc
    argmemonly_nofree_but_not_nosync(X);              // can be derived from the definition of the function and attribute can be made available through HTO or thinLTO.
    // X is still deref here because no thread could have freed it.
  }

X is in the entire module `deref_or_null(4)` but only because we have `inaccessible_mem_only` (malloc) and `argmemonly` + `nofree` (definition + HTO/thinLTO).
`nosync` is not set for any of the functions.

> Second, perhaps `malloc` can be `nosync`. I see the quote from the C standard, but it's unclear to me whether that language is good for anything other than ensuring no data races in a formal model, and surely there are other means to achieve the same goal, e.g. saying that a re-allocated portion of memory is a different memory location as far as the memory model is concerned. The only way I can see to perform meaningful synchronization through `free` and `malloc` would be to compare the result of malloc with some pointer that was allocated earlier (if they're equal, this implies that the earlier allocation was freed, and we have a synchronization edge that can be relied upon). It may be possible to make this undefined somehow, perhaps involving the non-determinism trick that is proposed for pointer provenance can be applied here as well.

You basically described the side-channel that makes malloc/free not nosync, didn't you?

  T0: foo() /* nosync */; while(1) { p = malloc(); atomic_relaxed_record(p); free(p); }
  T1: do { p = malloc(); found = atomic_relaxed_lookup(p); free(p); } while (!found); print("foo is done!");

> [...]
>
>>> 3. Today's `nofree` enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the `nofree` **argument attribute** instead, and the second is discussed above in this comment).
>>
>> I think once I manage to explain the examples you will reconsider this point. To give you another one:
>>
>>   __attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
>>   void user_code() {
>>     a = malloc();
>>     b = malloc();
>>     unknown_function_with_a_now_nosync_call_site(a, b);
>>     ...
>>     free(a);
>>   }
>
> Not sure what you're saying here. Do you want the unknown function to be `nofree` but it *does* contain synchronization, except that then the external `nosync` "overrides" that fact and we pray that the programmer knew what they were doing?

I'm not sure about pray but I am sure we already assume annotations by the user are correct. We also explicitly query them for
annotations (https://openmp.llvm.org/docs/remarks/OMP133.html, loop vectorizer remarks, ...) and I strongly expect this to become more frequent in the future.

I can also see us, and others, annotate libraries that use things like volatile/atomic accesses and malloc to employ nosync annotations
if we know they are sound from the callers perspective.

Finally, we started to track threads explicitly already, partially using domain knowledge, which allows us to reason about the interaction between threads 
(https://reviews.llvm.org/D106397#C2702144NL1110). So even in the presence of synchronizations (atomics, barriers, etc), we can use other attributes
(argmemonly, nofree, ...) and such thread tracking to make useful deductions. This is not possible if we interleave the `argmemonly/nofree/...` semantics
with `nosync`. The above optimization is a real thing for a very common scenario on GPUs and also CPUs:

  run_in_parallel {
    if (threadid == 0)
      effect();
    barrier();

    ... parallel stuff

    if (threadid == 0)
      effect();
    barrier();
   ...
  }

>>> That leaves as a single argument the complexity of inference, but how strong of an argument is that, really?
>>
>> I think there are a lot of reasons to keep it separated, the examples above still stand. That said, why would
>> we combine it? It saves one attribute lookup?
>
> To me the fundamental problem is having an attribute that says "calling this function doesn't free memory" on a call that does, in fact, free memory. Why would/should the caller care about whether the freeing happens in the same thread or not? I think Nuno's line of argument about separating the semantics from the implementation goes in a similar direction.

While I understand your argument it breaks a lot of things not just `nofree`. So if we wanted to do this it had to be way more involved or it is just inconsistent.
We had inaccessible_mem_only earlier or argmemonly. As noted before, the definition talks about the "memory operations in the function". We also do not annotate `nosync`
for any known builtin, etc. There is soo many things that would need to happen to do this transition in a consistent way. Instead, this picks `nofree` and argues it is
somewhat special even though it is not. See next paragraph.

> This doesn't mean that we can't have the more structural (as opposed to semantic) attribute you want to have, but I'd be in favor of naming it more accurately, e.g. `nothreadlocalfree`.

Almost all our attributes are "local" not "global". They are local because we can derive them only locally in a reasonable way and some of them make sense only locally anyway as they have
an explicit scope, most often the function. Local information can be combined with call site information, e.g., in a subsequent pass or via HTO/thinLTO, to create actionable information (see
my examples in this thread and above). Making local deduction harder/impossible is not helpful as it prevents us to combine it with other information sources, e.g., domain knowledge or explicit
thread tracking.

>>> P.S.: The example I have in mind where confusion about `argmemonly` can be used to trigger a miscompile is at https://godbolt.org/z/dEG8n6W3E. Current main merges the two loads from %q, which is incorrect since `@setup` could arrange for a second thread to be created which changes its value in between.
>>
>> Agreed this is a bug. Though, it's in instcombine (or probably MemorySSA) not the Attributor. If you disagree you have to tell me which part of the `argmemonly` definition is violated by `release`: "This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments."
>
> As stated above, it's at the very least the part that says there can be no other side effects. Synchronizing with another thread which then modifies other memory is surely a side effect.

The function writes pointer arguments, which is explicitly allowed in the text multiple times before it describes what it shall not do.
It writes them atomically, but that is not disallowed in any way. As I stated above, I see your point but this patch is not even close to what would be required to bake synchronization into everything. You can
read "no synchronization" into some of the lang ref wording but you have to admit `argmemonly` does talk about "accesses inside the function" twice and does not mention synchronization at all. Further,
you attribute the `@setup` side-effect to `@release`, so you argue that the `@release` function can have a side-effect on arbitrary memory including `%q` even though "the write to `%q`" is arguably not in `@release`.

Instead, the model that I see us using already and which we should embrace is:
 `@release` is not `nosync` and therefore we can see effects at the call site that were performed by other functions with which `@release` synchronized.
 That attributes the effect properly to the "other function", makes local deduction possible, and composes the attributes we have.

Only with such a model we can hope to build and use a "synchronizes-with graph" that compose well with other arguments.
As https://reviews.llvm.org/D106397#C2702144NL1110 shows, once we stop treating synchronization as a complete black box it is crucial the rest of the system
does not fallback to a pessimistic/unknown state if synchronization may be present.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101701/new/

https://reviews.llvm.org/D101701