[clang] [AArch64ABI] Don't pass small types with padding as wide ints. (WIP) (PR #116615)

Florian Hahn via cfe-commits cfe-commits at lists.llvm.org
Mon Nov 18 05:42:21 PST 2024


https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/116615

Currently Clang returns data types with sizes <= 128 bits as wide integers (or pairs of wide integers). This lowering appears incorrect if the type contains padding.

For example, consider returning a type that contains 32 bits of data and 32 bits of padding (as in the example below). On AArch64, Clang uses i64 to return the result. To produce the return value, it loads an i64, which reads the 32 bits of data but also accesses 32 bits of uninitialized memory. Loading an integer where some bits are poison makes the whole integer poison (which is different from LLVM's undef).

Unless I am missing something, Clang should pass small types with padding using types that separate the data and padding bits, e.g. by using an array of i8 (assuming all fields are multiples of 8 bits) or a struct with the unpadded type as the first element and the padding as the second.

Computing the padding in the AArch64 ABI code seems difficult; maybe there's a better way I am missing? For this to work at the moment, I had to adjust mustSkipTailPadding, which certainly isn't correct...

The current patch passes types with padding indirectly, unless there's only tail-padding.

struct alignas(8) S {float a;};
struct Foo {
  union {
    struct { float x; };
    S s;
  };
};
Foo bar() { Foo r; r.s.a = 1; return r; }

Clang generates:

define i64 @bar() {
entry:
  %retval = alloca %struct.Foo, align 8
  %0 = getelementptr inbounds nuw %struct.Foo, ptr %retval, i32 0, i32 0
  %a = getelementptr inbounds nuw %struct.S, ptr %0, i32 0, i32 0
  store float 1.000000e+00, ptr %a, align 8
  %coerce.dive = getelementptr inbounds nuw %struct.Foo, ptr %retval, i32 0, i32 0
  %coerce.dive1 = getelementptr inbounds nuw %union.anon, ptr %coerce.dive, i32 0, i32 0
  %1 = load i64, ptr %coerce.dive1, align 8
  ret i64 %1
}

https://clang.godbolt.org/z/YWGdcqna7

The function can be simplified to `ret i64 poison`.

https://alive2.llvm.org/ce/z/DwkvmT

From f983323a596bf5d777ad0316853044cedf02a91a Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Mon, 18 Nov 2024 13:26:12 +0000
Subject: [PATCH] [AArch64ABI] Don't pass small types with padding as wide
 ints. (WIP)

Currently Clang returns data types with sizes <= 128 bits as wide
integers (or pairs of wide integers). This lowering appears incorrect
if the type contains padding.

For example, consider returning a type that contains 32 bits of data
and 32 bits of padding (as in the example below). On AArch64, Clang
uses i64 to return the result. To produce the return value, it loads
an i64, which reads the 32 bits of data but also accesses 32 bits of
uninitialized memory. Loading an integer where some bits are poison
makes the whole integer poison (which is different from LLVM's undef).

Unless I am missing something, Clang should pass small types with
padding using types that separate the data and padding bits, e.g. by
using an array of i8 (assuming all fields are multiples of 8 bits) or
a struct with the unpadded type as the first element and the padding
as the second.

Computing the padding in the AArch64 ABI code seems difficult; maybe
there's a better way I am missing? For this to work at the moment, I
had to adjust mustSkipTailPadding, which certainly isn't correct...

The current patch passes types with padding indirectly, unless there's
only tail-padding.

struct alignas(8) S {float a;};
struct Foo {
  union {
    struct { float x; };
    S s;
  };
};
Foo bar() { Foo r; r.s.a = 1; return r; }

Clang generates:

define i64 @bar() {
entry:
  %retval = alloca %struct.Foo, align 8
  %0 = getelementptr inbounds nuw %struct.Foo, ptr %retval, i32 0, i32 0
  %a = getelementptr inbounds nuw %struct.S, ptr %0, i32 0, i32 0
  store float 1.000000e+00, ptr %a, align 8
  %coerce.dive = getelementptr inbounds nuw %struct.Foo, ptr %retval, i32 0, i32 0
  %coerce.dive1 = getelementptr inbounds nuw %union.anon, ptr %coerce.dive, i32 0, i32 0
  %1 = load i64, ptr %coerce.dive1, align 8
  ret i64 %1
}

https://clang.godbolt.org/z/YWGdcqna7

The function can be simplified to `ret i64 poison`.

https://alive2.llvm.org/ce/z/DwkvmT
---
 clang/lib/AST/RecordLayoutBuilder.cpp         |  2 +-
 clang/lib/CodeGen/Targets/AArch64.cpp         | 28 +++++++++++++++++++
 .../AArch64/small-types-with-padding.cpp      | 18 ++++++++++++
 3 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/AArch64/small-types-with-padding.cpp

diff --git a/clang/lib/AST/RecordLayoutBuilder.cpp b/clang/lib/AST/RecordLayoutBuilder.cpp
index b1b13b66a5e504..702c11d1e5974e 100644
--- a/clang/lib/AST/RecordLayoutBuilder.cpp
+++ b/clang/lib/AST/RecordLayoutBuilder.cpp
@@ -2450,7 +2450,7 @@ static bool mustSkipTailPadding(TargetCXXABI ABI, const CXXRecordDecl *RD) {
     // mode; fortunately, that is true because we want to assign
     // consistently semantics to the type-traits intrinsics (or at
     // least as many of them as possible).
-    return RD->isTrivial() && RD->isCXX11StandardLayout();
+    return false; //RD->isTrivial() && RD->isCXX11StandardLayout();
   }
 
   llvm_unreachable("bad tail-padding use kind");
diff --git a/clang/lib/CodeGen/Targets/AArch64.cpp b/clang/lib/CodeGen/Targets/AArch64.cpp
index 9320c6ef06efab..657db473fecaf2 100644
--- a/clang/lib/CodeGen/Targets/AArch64.cpp
+++ b/clang/lib/CodeGen/Targets/AArch64.cpp
@@ -484,6 +484,25 @@ ABIArgInfo AArch64ABIInfo::classifyArgumentType(QualType Ty, bool IsVariadicFn,
   return getNaturalAlignIndirect(Ty, /*ByVal=*/false);
 }
 
+static std::pair<unsigned, unsigned> computePadding(QualType T, ASTContext &Ctx) {
+  unsigned Padding = 0;
+  unsigned TailPadding = 0;
+  if (auto *RD = T->getAs<RecordType>()) {
+    for (auto I = RD->getDecl()->field_begin(), End = RD->getDecl()->field_end(); I != End; ++I) {
+      QualType FieldTy = I->getType();
+      if (auto *ET = FieldTy->getAs<ElaboratedType>())
+        FieldTy = ET->getNamedType();
+      if (auto *RD2 = FieldTy->getAs<RecordType>()) {
+        auto &Layout = Ctx.getASTRecordLayout(RD2->getDecl());
+        auto [SubPadding, SubTailPadding] = computePadding(I->getType(), Ctx);
+        TailPadding = Ctx.toBits(Layout.getSize() - Layout.getDataSize()) + SubTailPadding;
+        Padding += Ctx.toBits(Layout.getSize() - Layout.getDataSize()) + SubPadding;
+      }
+    }
+  }
+  return {Padding, TailPadding};
+}
+
 ABIArgInfo AArch64ABIInfo::classifyReturnType(QualType RetTy,
                                               bool IsVariadicFn) const {
   if (RetTy->isVoidType())
@@ -543,6 +562,15 @@ ABIArgInfo AArch64ABIInfo::classifyReturnType(QualType RetTy,
 
   // Aggregates <= 16 bytes are returned directly in registers or on the stack.
   if (Size <= 128) {
+    auto [Padding, TailPadding] = computePadding(RetTy, getContext());
+    // If the type contains any padding, be careful not to lower to wide types that may mix data and padding bits. E.g. using i64 for a type that has 32 bits of data and 32 bits of padding means loading the uninitialized padding bits together with data bits poisons the resulting wide type.
+    if (Padding > 0) {
+      if (Padding == TailPadding && (Size - TailPadding) % 8 == 0) {
+        llvm::Type *BaseTy = llvm::Type::getInt8Ty(getVMContext());
+        return ABIArgInfo::getDirect(llvm::ArrayType::get(BaseTy, Size / 8));
+      }
+      return getNaturalAlignIndirect(RetTy);
+    }
     if (Size <= 64 && getDataLayout().isLittleEndian()) {
       // Composite types are returned in lower bits of a 64-bit register for LE,
       // and in higher bits for BE. However, integer types are always returned
diff --git a/clang/test/CodeGen/AArch64/small-types-with-padding.cpp b/clang/test/CodeGen/AArch64/small-types-with-padding.cpp
new file mode 100644
index 00000000000000..12cdb125aa2d07
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/small-types-with-padding.cpp
@@ -0,0 +1,18 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple arm64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+
+struct alignas(8) S {float a;};
+
+struct Foo {
+  union {
+    struct { float x; };
+    S s;
+  };
+};
+
+// CHECK: define [8 x i8] @_Z3barv() #0 {
+Foo bar() {
+  Foo r;
+  r.s.a = 1;
+  return r;
+}
