[PATCH] D101701: [nofree] Refine concurrency requirements

Sat Jul 24 21:44:29 PDT 2021

On 7/24/21 2:12 PM, Johannes Doerfert via Phabricator wrote:
> jdoerfert added a comment.
>
> In D101701#2902142 <https://reviews.llvm.org/D101701#2902142>, @nhaehnle wrote:
>
>> In D101701#2752587 <https://reviews.llvm.org/D101701#2752587>, @jdoerfert wrote:
>>
>>> So you are arguing `armemonly` is also implying `nosync`? Then we get into the same problems we have here. You don't actually gain expressiveness, after all you can check if both attribute `X` and `nosync` are present, but you loose the ability to derive and utilize them separately. One conceptual thing to consider is that almost all of this predates `nosync` in the first place. That said, I don't see how you would interpret the text as there are no side effects by other threads:
>>>
>>>    argmemonly
>>>    
>>>        This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments.
>>>    
>>>        Note that argmemonly can be used together with readonly attribute in order to specify that function reads only from its arguments.
>>>    
>>>        If an argmemonly function reads or writes memory other than the pointer arguments, or has other side-effects, the behavior is undefined.
>>>
>>> What in the above definition prevents me from doing this:
>>>
>>>    void foo(atomic flag) { atomic_write(flag, 1); while(atomic_read(flag) == 1); atomic_write(flag, 2); }
>>>
>>> Inside the function all memory that is accesses are loads/stores from objects pointed by pointer-type arguments (with 0 offset).  Do you disagree?
>> The part of the definition that I would argue prevents you from doing this is the "or has other side-effects".
> This reading defines FunctionAttr (and others) are wrong right now when deducing almost any attribute in the absence of nosync.

So, I have to admit that I'm not following this discussion closely, but 
I want to chime in here.

Assuming that Johannes' statement above is correct, the conversation 
stops here.  This is reason for a hard reject, do-not-pass-go, end of 
conversation.  If the proposed semantics are inconsistent with the 
existing implementation and changing that is not a) in the patch, and b) 
extremely well justified, that is a fatal flaw in the proposal.

Again, this assumes Johannes' comment is correct.  I have not followed 
the discussion closely enough to have an opinion on it's correctness.

> Further, it should not be "other side-effects" as a general statement anyway. It would subsume `nounwind` and interact badly with infinite loops (D106749 <https://reviews.llvm.org/D106749>), both of which would interact badly with C++ and pure/const functions, as an example.
>
> Long story short, there is still no good reason for attributes not to be composed instead of somehow defined with overlapping semantics.
>
>>> FWIW, this basically breaks down with stuff like `malloc` which we assume is `inaccessiblememonly` but it is in fact *not*, and cannot be, `nosync`. So if we require attributes to imply `nosync`, `malloc` cannot be `inaccessiblememonly` anymore. (see https://reviews.llvm.org/D98605#2625243)
>> There are two directions in which one can take this.
>>
>> First, your desired path forward appears to be that `argmemonly` on a callee can only be taken at face value if the callee is also `nosync`. Presumably, you would want the same treatment to apply to `inaccessiblememonly`. So then if `malloc` cannot be `nosync`, what's the point of making it `inaccessiblememonly`? What would that still allow you to deduce that you cannot deduce even without the `inaccessiblememonly`?
> As I mentioned before, (1) this comes from a time where we did not think about synchronization much (`BuildLibCalls.cpp` does not once set `nosync` to this day)
> and it was ignored widely, and (2) it helps you at the call site if you have `nosync` there. I provided call site examples before, most of them apply again.
> If you combine call site information with `nofree` for the function, or `inaccessiblememonly` for the function you can actually derive things. If `nofree`,
> `inaccessiblememonly`, `readonly`, ... cannot be derived in the presence of `nosync` we loose that ability. Let's make a new example though.
>
> Let's assume this is our entire module:
>
>    static int *X = nullptr;
>    
>    int readX() { return *X; }
>    
>    void foo() {
>      if (!X)
>        X = malloc(4);  // let it be known in the module.
>    }
>    
>    void bar() {
>      *X = 1;
>      unknown_inaccessible_mem_only_but_not_nosync();   // e.g. malloc
>      argmemonly_nofree_but_not_nosync(X);              // can be derived from the definition of the function and attribute can be made available through HTO or thinLTO.
>      // X is still deref here because no thread could have freed it.
>    }
>
> X is in the entire module `deref_or_null(4)` but only because we have `inaccessible_mem_only` (malloc) and `argmemonly` + `nofree` (definition + HTO/thinLTO).
> `nosync` is not set for any of the functions.
>
>> Second, perhaps `malloc` can be `nosync`. I see the quote from the C standard, but it's unclear to me whether that language is good for anything other than ensuring no data races in a formal model, and surely there are other means to achieve the same goal, e.g. saying that a re-allocated portion of memory is a different memory location as far as the memory model is concerned. The only way I can see to perform meaningful synchronization through `free` and `malloc` would be to compare the result of malloc with some pointer that was allocated earlier (if they're equal, this implies that the earlier allocation was freed, and we have a synchronization edge that can be relied upon). It may be possible to make this undefined somehow, perhaps involving the non-determinism trick that is proposed for pointer provenance can be applied here as well.
> You basically described the side-channel that makes malloc/free not nosync, didn't you?
>
>    T0: foo() /* nosync */; while(1) { p = malloc(); atomic_relaxed_record(p); free(p); }
>    T1: do { p = malloc(); found = atomic_relaxed_lookup(p); free(p); } while (!found); print("foo is done!");
>
>
>
>> [...]
>>
>>>> 3. Today's `nofree` enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the `nofree` **argument attribute** instead, and the second is discussed above in this comment).
>>> I think once I manage to explain the examples you will reconsider this point. To give you another one:
>>>
>>>    __attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
>>>    void user_code() {
>>>      a = malloc();
>>>      b = malloc();
>>>      unknown_function_with_a_now_nosync_call_site(a, b);
>>>      ...
>>>      free(a);
>>>    }
>> Not sure what you're saying here. Do you want the unknown function to be `nofree` but it *does* contain synchronization, except that then the external `nosync` "overrides" that fact and we pray that the programmer knew what they were doing?
> I'm not sure about pray but I am sure we already assume annotations by the user are correct. We also explicitly query them for
> annotations (https://openmp.llvm.org/docs/remarks/OMP133.html, loop vectorizer remarks, ...) and I strongly expect this to become more frequent in the future.
>
> I can also see us, and others, annotate libraries that use things like volatile/atomic accesses and malloc to employ nosync annotations
> if we know they are sound from the callers perspective.
>
> Finally, we started to track threads explicitly already, partially using domain knowledge, which allows us to reason about the interaction between threads
> (https://reviews.llvm.org/D106397#C2702144NL1110). So even in the presence of synchronizations (atomics, barriers, etc), we can use other attributes
> (argmemonly, nofree, ...) and such thread tracking to make useful deductions. This is not possible if we interleave the `argmemonly/nofree/...` semantics
> with `nosync`. The above optimization is a real thing for a very common scenario on GPUs and also CPUs:
>
>    run_in_parallel {
>      if (threadid == 0)
>        effect();
>      barrier();
>    
>      ... parallel stuff
>    
>      if (threadid == 0)
>        effect();
>      barrier();
>     ...
>    }
>
>
>
>>>> That leaves as a single argument the complexity of inference, but how strong of an argument is that, really?
>>> I think there are a lot of reasons to keep it separated, the examples above still stand. That said, why would
>>> we combine it? It saves one attribute lookup?
>> To me the fundamental problem is having an attribute that says "calling this function doesn't free memory" on a call that does, in fact, free memory. Why would/should the caller care about whether the freeing happens in the same thread or not? I think Nuno's line of argument about separating the semantics from the implementation goes in a similar direction.
> While I understand your argument it breaks a lot of things not just `nofree`. So if we wanted to do this it had to be way more involved or it is just inconsistent.
> We had inaccessible_mem_only earlier or argmemonly. As noted before, the definition talks about the "memory operations in the function". We also do not annotate `nosync`
> for any known builtin, etc. There is soo many things that would need to happen to do this transition in a consistent way. Instead, this picks `nofree` and argues it is
> somewhat special even though it is not. See next paragraph.
>
>> This doesn't mean that we can't have the more structural (as opposed to semantic) attribute you want to have, but I'd be in favor of naming it more accurately, e.g. `nothreadlocalfree`.
> Almost all our attributes are "local" not "global". They are local because we can derive them only locally in a reasonable way and some of them make sense only locally anyway as they have
> an explicit scope, most often the function. Local information can be combined with call site information, e.g., in a subsequent pass or via HTO/thinLTO, to create actionable information (see
> my examples in this thread and above). Making local deduction harder/impossible is not helpful as it prevents us to combine it with other information sources, e.g., domain knowledge or explicit
> thread tracking.
>
>>>> P.S.: The example I have in mind where confusion about `argmemonly` can be used to trigger a miscompile is at https://godbolt.org/z/dEG8n6W3E. Current main merges the two loads from %q, which is incorrect since `@setup` could arrange for a second thread to be created which changes its value in between.
>>> Agreed this is a bug. Though, it's in instcombine (or probably MemorySSA) not the Attributor. If you disagree you have to tell me which part of the `argmemonly` definition is violated by `release`: "This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments."
>> As stated above, it's at the very least the part that says there can be no other side effects. Synchronizing with another thread which then modifies other memory is surely a side effect.
> The function writes pointer arguments, which is explicitly allowed in the text multiple times before it describes what it shall not do.
> It writes them atomically, but that is not disallowed in any way. As I stated above, I see your point but this patch is not even close to what would be required to bake synchronization into everything. You can
> read "no synchronization" into some of the lang ref wording but you have to admit `argmemonly` does talk about "accesses inside the function" twice and does not mention synchronization at all. Further,
> you attribute the `@setup` side-effect to `@release`, so you argue that the `@release` function can have a side-effect on arbitrary memory including `%q` even though "the write to `%q`" is arguably not in `@release`.
>
> Instead, the model that I see us using already and which we should embrace is:
>   `@release` is not `nosync` and therefore we can see effects at the call site that were performed by other functions with which `@release` synchronized.
>   That attributes the effect properly to the "other function", makes local deduction possible, and composes the attributes we have.
>
> Only with such a model we can hope to build and use a "synchronizes-with graph" that compose well with other arguments.
> As https://reviews.llvm.org/D106397#C2702144NL1110 shows, once we stop treating synchronization as a complete black box it is crucial the rest of the system
> does not fallback to a pessimistic/unknown state if synchronization may be present.
>
>
> Repository:
>    rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
>    https://reviews.llvm.org/D101701/new/
>
> https://reviews.llvm.org/D101701
>