[clang] [clang][CodeGen] Use byval for SystemZ indirect arguments (PR #66404)

Fri Sep 15 16:28:55 PDT 2023

iii-i wrote:

Sorry, my wording was not precise enough; it is indeed important that we create a copy and not pass a pointer to the original. Still, what you described matches the s390x ABI:

```
1.2.2.3. Parameter Area

The parameter area shall be allocated by a calling function if some parameters cannot
be passed in registers, but must be passed on the stack instead (see section 1.2.3).

[...]

1.2.3. Parameter Passing

[...]
A struct or union of any other size, a complex type, an __int128, a long
double, a _Decimal128, or a vector whose size exceeds 16 bytes. Replace
such an argument by a pointer to the object, or to a copy where necessary
to enforce call-by-value semantics. Only if the caller can ascertain that the
object is “constant” can it pass a pointer to the object itself.
```

---

Ah, that's the source of my confusion. I didn't realize the call instruction had to make a copy, I thought it just had to be done somewhere. "The attribute implies that a hidden copy of the pointee is made between the caller and the callee" actually does mean the former, but one has to squint to see that. The way you phrased it is much clearer.

So in the following example:

```
struct foo { char x[800]; };
void bar(struct foo);
void baz(void) { struct foo f = {}; bar(f); }
```

x86_64 generates:

```
define dso_local void @baz() #0 {
  %1 = alloca %struct.foo, align 8
  call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 800, i1 false)
  call void @bar(ptr noundef byval(%struct.foo) align 8 %1)
  ret void
}
```

and relies on the backend to expand `call void @bar` into roughly `REP_MOVSQ_64` followed by `CALL64pcrel32`, whereas on s390x we get:

```
define dso_local void @baz() #0 {
  %1 = alloca %struct.foo, align 1
  %2 = alloca %struct.foo, align 1
  call void @llvm.memset.p0.i64(ptr align 1 %1, i8 0, i64 800, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %2, ptr align 1 %1, i64 800, i1 false)
  call void @bar(ptr noundef %2)
  ret void
}
```

so the creation of the copy is explicit in the LLVM IR. Even though the ABIs are saying roughly the same thing, it's implemented differently.
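
For illustration only, the s390x IR above corresponds roughly to the following hand-written C, where the caller makes the temporary and passes its address. This is a sketch of the semantics, not what clang emits verbatim: `bar_indirect` and `baz_lowered` are hypothetical names, and `bar_indirect` stands in for `bar` after the ABI lowering has replaced the struct argument with a pointer.

```c
#include <assert.h>
#include <string.h>

struct foo { char x[800]; };

/* Hypothetical stand-in for bar() after lowering: it receives a pointer
   to a caller-made copy and is free to scribble on it. */
void bar_indirect(struct foo *copy) {
    copy->x[0] = 'B';
}

/* Hypothetical hand-written equivalent of the s390x IR for baz().
   Returns the first byte of the original object after the call. */
char baz_lowered(void) {
    struct foo f;    /* corresponds to %1 */
    struct foo tmp;  /* corresponds to %2, the explicit copy */

    memset(&f, 0, sizeof f);      /* struct foo f = {}; */
    memcpy(&tmp, &f, sizeof f);   /* copy made by the caller */
    bar_indirect(&tmp);           /* pass a pointer to the copy */

    /* Call-by-value semantics: the callee's write must not be visible
       in the original object. */
    assert(f.x[0] == 0);
    return f.x[0];
}
```

With `byval`, the same copy would still exist, but it would be materialized by the backend as part of lowering the call rather than appearing as a separate `alloca` and `memcpy` in the IR.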

I wonder if it would still be beneficial to switch s390x to byval? I think it can be done in a way that correctly implements the ABI, even though it would of course be more complex than this PR. An obvious benefit is that s390x would become more similar to x86_64, but maybe there are some drawbacks that I'm not seeing.

---

I revisited MSan's `param_tls_limit.cpp`, and the XFAIL is actually fine. The instrumentation does indeed put the shadow of the synthetic pointer into the parameters' TLS area on s390x, but this is not a problem, since the shadow of the actual value is still preserved and checked. This prevents the overflow, which the test expects, from happening, so the conclusion that the test is not applicable is correct. Sorry for the noise.

I will check if there is a different solution for DFSan. It currently passes the label of the pointer to the copy, which is always 0, instead of the label of the actual value, to vararg functions on s390x. Even though this is similar to what MSan does, the difference is that the DFSan runtime (e.g., `format_buffer`) expects the label of the actual value, regardless of the ABI.

https://github.com/llvm/llvm-project/pull/66404