[clang] [clang] Fix self-capturing `__block` variables (PR #89475)

via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 19 17:10:46 PDT 2024


llvmbot wrote:



@llvm/pr-subscribers-clang

Author: None (ille-apple)

<details>
<summary>Changes</summary>

This is an updated version of https://reviews.llvm.org/D90434 from 2020.  Due to time constraints I stopped working on the patch, and unfortunately never got back around to it until now.

Clang special-cases `__block` variables whose initializer contains a block literal that captures the variable itself.  But this special case doesn't work for variables of record types, causing Clang to emit broken code.

Fix this by initializing such variables directly on the heap, and add a warning for when this needs to be done.

### Background on the special case

`__block` variables are unique in that their address can change during the lifetime of the variable.  Each `__block` variable starts on the stack.  But when a block is retained using `Block_copy`, both the block itself and any `__block` variables it captures are moved to the heap.

What's tricky is when this move happens while the variable is still being initialized.  For example:

```c
#include <Block.h>
#include <stdio.h>

int copyBlockAndReturnInt(void (^block)(void)) {
  Block_copy(block);
  return 42;
}

int main() {
  __block int val = copyBlockAndReturnInt(^{
    printf("%d\n", val);
  });
}
```

Here, when `Block_copy` is called, `val` is moved to the heap while still in an uninitialized state.  Then when `copyBlockAndReturnInt` returns, `main` needs to store the return value to the variable's new location, not to the original stack slot.

This can only happen if a block literal that captures the variable is located syntactically within the initializer itself.  If it's before the initializer then it couldn't capture the variable, and if it's after the initializer then it couldn't execute during initialization.

So Clang detects when such a literal exists inside the initializer; `CGDecl.cpp` refers to this condition as `capturedByInit`.  If it does, then the initialization code operates in a special mode.  In several functions in that file, there's an `LValue` object and an accompanying `capturedByInit` flag. If `capturedByInit` is false (the typical case), then the lvalue points to the object being initialized.  But if it's true, the lvalue points to the byref header.  Then, depending on the type of the variable, you end up at one of many checks of the form:

```cpp
if (capturedByInit)
  drillIntoBlockVariable(*this, lvalue, cast<VarDecl>(D));
```

`drillIntoBlockVariable` emits a load of the forwarding pointer from the byref header (to find the object's current location), and then modifies `lvalue` to point to the object, unifying the two cases.  Importantly, this happens at the last minute: after emitting code to evaluate the initializer, but before emitting the `store` instruction to write the evaluated value into the variable.
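To make the forwarding mechanism concrete, here is a plain-C simulation (the field names and layout are illustrative only; the real byref struct is defined by the blocks ABI and has more header fields):

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch of a __block byref struct; the real layout is
   defined by the blocks ABI and contains additional header fields. */
struct byref {
    void *isa;
    struct byref *forwarding; /* always points at the current location */
    int flags;
    int size;
    int value;                /* the captured variable itself */
};

/* Simulate: the variable starts on the stack, gets moved to the heap
   mid-initialization, and the deferred store then goes through the
   forwarding pointer -- which is what drillIntoBlockVariable arranges
   just before the initializing store.  Returns the value seen in the
   heap copy. */
int store_through_forwarding(void) {
    struct byref stack_copy = {0};
    stack_copy.forwarding = &stack_copy;
    stack_copy.size = sizeof(struct byref);

    /* Block_copy would do this: clone to the heap, repoint forwarding. */
    struct byref *heap_copy = malloc(sizeof(struct byref));
    memcpy(heap_copy, &stack_copy, sizeof(struct byref));
    heap_copy->forwarding = heap_copy;
    stack_copy.forwarding = heap_copy;

    /* The deferred store loads forwarding first, so it lands on the heap. */
    stack_copy.forwarding->value = 42;

    int result = heap_copy->value;
    free(heap_copy);
    return result;
}
```

Because every access is indirected through `forwarding`, a single well-placed load suffices to unify the "still on the stack" and "already moved" cases.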

Okay, but then why are there two different code paths?  If the address calculation can be safely deferred, why not just do that all the time rather than having a separate mode just for self-capturing `__block` variables?

Because it *can't* always be safely deferred.

### Harder cases

First, consider the case where the variable being initialized is a C struct.  This is already problematic.  `drillIntoBlockVariable` is supposed to be called after evaluating the initializer but before doing a `store`.  But when initializing a struct, Clang doesn't evaluate the whole initializer before storing.  Instead it evaluates the first field, stores the first field, evaluates the second field, stores the second field, and so on.  This is actually necessary for correctness[^1] given code like:

```c
struct Foo { int a, b; };
struct Foo foo = {1, foo.a};
```

But if any one of the field evaluations can move the object, then Clang would need to reload the forwarding pointer before storing each field.  And that would need to apply recursively to any subfields.  Such a scheme is definitely possible, but Clang doesn't implement it.
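The interleaved evaluate-then-store order can be written out explicitly; this plain-C sketch mimics the code Clang emits for `struct Foo foo = {1, foo.a};`:

```c
struct Foo { int a, b; };

/* Mimics Clang's emitted order for `struct Foo foo = {1, foo.a};`:
   each field is stored before the next field's initializer is
   evaluated, so `foo.a` already holds 1 when `foo.b` reads it. */
int interleaved_init(void) {
    struct Foo foo;
    foo.a = 1;      /* evaluate and store the first field */
    foo.b = foo.a;  /* the second initializer observes the stored value */
    return foo.b;
}
```

It is exactly this interleaving that makes a single end-of-initializer address fixup insufficient for aggregates: the object's address is consumed once per field, not once per initializer.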

And if instead of a C struct you have a C++ class, supporting relocation to the heap becomes flat-out impossible.  Consider:

```cpp
struct Foo {
    void (^block_)(void);
    Foo(void (^block)(void)) {
        printf("I am at %p\n", this);
        block_ = Block_copy(block);
        printf("I am now at %p\n", this); // shouldn't change!
    }
};

__block Foo foo(^{
    printf("%p\n", &foo);
});
```

C++ constructors can observe the address of the object being constructed, and expect that address not to change.  Once the object has finished being constructed, it can be relocated by calling the move or copy constructor, and that's what Clang uses for block captures in the non-self-capturing case.  But in this case, `Block_copy` is called while the stack object is still running its constructor.  It would be nonsensical to try to move out of it in this state.

### Clang's current behavior

So what does Clang currently do?  Well, for both C structs and C++ classes (as well as unions[^2]), it runs into this TODO dating to 2011:

```cpp
case TEK_Aggregate:
  [...]
    // TODO: how can we delay here if D is captured by its initializer?
```

Clang simply neglects to call `drillIntoBlockVariable` and falls into its normal initialization code.  That means the lvalue is initialized as if it pointed to the object, when it actually points to the byref header.  As such, the generated code is non-functional.  And this is true for all `__block` variables of record type that Clang detects as self-capturing, even if the initializer never actually calls `Block_copy`.
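A plain-C sketch of why this miscompiles (the layout is illustrative, not the real ABI): storing the aggregate at the header's address clobbers the header fields instead of initializing the object.

```c
#include <string.h>

/* Sketch of the bug: for aggregates, the lvalue still points at the
   byref header, so the initializing store writes over the header
   instead of the object.  Layout is illustrative only. */
struct byref_hdr {
    void *isa;
    void *forwarding;
    int flags;
    int size;
    struct { int x, y; } object;  /* where the store *should* land */
};

/* Writes the aggregate at the base of the struct (what the broken
   codegen effectively did) and reports whether the isa field survived. */
int header_survives_misplaced_store(void) {
    struct byref_hdr b;
    memset(&b, 0, sizeof b);
    b.forwarding = &b;

    struct { int x, y; } init = {1, 2};
    memcpy(&b, &init, sizeof init);   /* clobbers the header */

    return b.isa == 0;  /* 0 means isa was overwritten by the store */
}
```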

### Solution

As stated, it's impossible for self-capturing variables of at least some record types to support moving to the heap.  But there is a way to make them work regardless.  Any given `__block` variable can move at most once, from the stack to the heap.  Once on the heap, it never needs to move again.  So, as suggested by @rjmccall in 2020, we can create a heap allocation before initialization even starts, and then initialize the object directly on the heap, knowing its address will remain stable.

This patch applies this approach to all self-capturing `__block` variables with record type, i.e. all variables that previously triggered the broken code generation.

The obvious downside of the approach is that we pay the cost of a heap allocation even if `Block_copy` is never called.  Conveniently, this can never be a performance regression *per se*, since the affected cases didn't work at all before.  Still, there is room for future improvement, such as by making mid-initialization relocation work for C structs[^3], or detecting initializers that do contain a block literal but can't possibly call `Block_copy` on it.

To assist performance-sensitive codebases, this patch adds a warning, `-Wimplicit-block-var-alloc`, that triggers whenever a variable requires an implicit heap allocation.  Since most codebases probably don't care, this warning is disabled by default and is not included in any standard warning groups.

### Allocation failure

Another downside of the approach is that because the heap allocation is completely implicit, there is no way to check for allocation failure.  Normally, heap allocation happens only when you call `Block_copy`, which means you can catch allocation failure by checking whether `Block_copy` returns NULL.

...Or so I thought.  It turns out that checking `Block_copy`'s return value is only a half-measure.  `Block_copy` does return NULL if there's an allocation failure when moving the block itself to the heap.  But then `Block_copy` calls into the copy helper to move captured variables to the heap, which can also fail.  If it does, the copy helper has no way to report that failure.  Instead you just get a crash (which should probably be a proper abort; Apple folks, see rdar://126632757).  This limitation is more or less baked into the ABI, since copy helpers return void.  Even xnu, which uses blocks in kernel space, does not support allocation failure for them - though, luckily, xnu's version [waits for memory to be available rather than crashing](https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/libkern/libclosure/runtime.cpp#L49).

The point is, if recovery from allocation failure is already not a thing, then this patch isn't responsible for breaking it.  Even if blocks are changed someday to support it, it would probably be desired only by a tiny minority of codebases, and those codebases could avoid implicit allocations by enabling the new warning and setting it as an error.

### ABI limitation

The blocks runtime has no API to allocate a byref struct directly on the heap, because it's never been necessary before.  There is only `_Block_object_assign` to move one from the stack to the heap.  It would be nice to add an API to allocate directly on the heap, but for the dual sakes of backward compatibility and laziness, I have not attempted to do so.

Instead, the codegen works with the existing runtime, as follows.  First, a byref struct is allocated on the stack (including space for the object), but only the header is initialized; the object part is left uninitialized.  Then `_Block_object_assign` is called to migrate it to the heap (which unfortunately pays the cost of memcpying the uninitialized stack memory to the heap).  This produces the desired heap allocation, but also increases the reference count, which is balanced by immediately calling `_Block_object_dispose`.  After that, the object can finally be initialized inside the new heap allocation.
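The sequence can be simulated in plain C.  The `sim_assign`/`sim_dispose` functions below are simplified stand-ins for `_Block_object_assign`/`_Block_object_dispose`, and the struct layout is illustrative; this only models the refcount bookkeeping described above, not the real runtime:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative byref struct; the real one is defined by the blocks ABI. */
struct byref {
    struct byref *forwarding;
    int refcount;
    int size;
    int object;      /* left uninitialized until after the move */
};

/* Stand-in for _Block_object_assign: copy to the heap (including the
   still-uninitialized object bytes), take a new reference, and repoint
   the stack copy's forwarding pointer. */
static struct byref *sim_assign(struct byref *src) {
    struct byref *heap = malloc(src->size);
    memcpy(heap, src, src->size);
    heap->forwarding = heap;
    heap->refcount = 2;          /* stack reference + the new one */
    src->forwarding = heap;
    return heap;
}

/* Stand-in for _Block_object_dispose: drop the extra reference. */
static void sim_dispose(struct byref *b) {
    b->forwarding->refcount--;
}

/* The sequence the patch emits: initialize the header on the stack,
   migrate to the heap, balance the refcount, then initialize in place. */
int init_on_heap(void) {
    struct byref stack_copy;
    stack_copy.forwarding = &stack_copy;
    stack_copy.refcount = 1;
    stack_copy.size = sizeof(struct byref);

    sim_assign(&stack_copy);     /* heap allocation, refcount now 2 */
    sim_dispose(&stack_copy);    /* balanced back to refcount 1 */

    struct byref *heap = stack_copy.forwarding;
    heap->object = 7;            /* finally: initialize on the heap */

    int ok = (heap != &stack_copy) && (heap->refcount == 1) &&
             (heap->object == 7);
    free(heap);
    return ok;
}
```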

### Improved self-capture analysis

The existing check for self-capturing `__block` variables is implemented in the function `isCapturedBy`.  But it turns out that that implementation is overcomplicated for no good reason.  In ancient days (2011), `isCapturedBy` only worked on `Expr`s, and it recursed on the `Expr`'s children, which it assumed would also be `Expr`s.  This assumption turned out to be false for GNU statement expressions.  The simple fix would have been to change `isCapturedBy` to accept and recurse on arbitrary `Stmt`s instead, since `children()` is defined for all `Stmt`s, not just `Expr`s.  I'm not sure if there was some reason that wasn't possible at the time, but what happened instead is that a special case was added for statement expressions, which over a series of patches got more and more convoluted.

This patch replaces the whole thing with an `EvaluatedExprVisitor` (another suggestion from @rjmccall).  This is not only simpler, but also more precise, eliminating false positive self-captures including:

- Block literals in unevaluated contexts (`sizeof(^{ ... })`).

- Some types of statements inside GNU statement expressions.

This patch also moves the self-capture check earlier in the compilation process, from CodeGen to Sema.  This is necessary in order to support the aforementioned warning.

### Constant initialization

This patch also fixes a related bug where self-capturing `__block` variables could trigger this assert:

```cpp
assert(!capturedByInit && "constant init contains a capturing block?");
```

The assert is important, because constant initialization doesn't support the `drillIntoBlockVariable` mechanism.  Now, a constant initializer obviously can't include a call to `Block_copy`; in fact, it can't include block literals at all unless they have no captures.  But I found two cases where the assert can be triggered anyway.

First, the initializer can syntactically contain a capturing block literal but never evaluate it:

```cpp
__block constexpr int foo = true ? 1 : ^{ return foo; }();
```

This is addressed in this patch by forcing constant-initialized variables to be treated as non-self-capturing.

Second, the initializer can simply not be constant.  With variables of types like `const int`, Clang first tries to constant-evaluate the initializer, then falls back to runtime initialization if that fails.  But the assert is located in a place where it triggers just based on the try.  This is addressed in this patch by moving the assert down and conditioning it on constant evaluation succeeding.

[^1]: Supporting such code appears to be required by the C++ spec (see
    https://timsong-cpp.github.io/cppwp/n4861/dcl.init.aggr#6), though not
    by the C spec (see https://port70.net/~nsz/c/c11/n1570.html#6.7.9p23).

[^2]: There are other `TEK_Aggregate` types besides records, but all such
    types either can't be captured by blocks (arrays) or can't be the
    type of a local variable at all.

[^3]: Though even in C, moving mid-initializer could break code that took the address of the variable:

    ```c
    struct Foo { void *a; void *(^b)(void); };
    __block struct Foo x = {&x.b, Block_copy(^{ return x.a; })};
    assert(x.a == &x.b); // fails without up-front heap allocation
    ```

    However, it's a bit less obvious that such code *needs* to work; perhaps the user should just be aware the variable will be moving.  It's not like C++ where construction is clearly an indivisible operation performed at a single address.



---

Patch is 33.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89475.diff


12 Files Affected:

- (modified) clang/include/clang/AST/Decl.h (+20-2) 
- (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+6) 
- (modified) clang/lib/AST/Decl.cpp (+13) 
- (modified) clang/lib/CodeGen/CGBlocks.cpp (+57-24) 
- (modified) clang/lib/CodeGen/CGBlocks.h (+1) 
- (modified) clang/lib/CodeGen/CGDecl.cpp (+32-82) 
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+11-1) 
- (modified) clang/lib/CodeGen/CodeGenModule.h (+6-2) 
- (modified) clang/lib/Sema/Sema.cpp (+65) 
- (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+1) 
- (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+4-2) 
- (modified) clang/test/CodeGenCXX/block-capture.cpp (+124-3) 


``````````diff
diff --git a/clang/include/clang/AST/Decl.h b/clang/include/clang/AST/Decl.h
index 8b121896d66d15..dc816786d7f11d 100644
--- a/clang/include/clang/AST/Decl.h
+++ b/clang/include/clang/AST/Decl.h
@@ -1103,6 +1103,11 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
 
     LLVM_PREFERRED_TYPE(bool)
     unsigned IsCXXCondDecl : 1;
+
+    /// Whether this is a __block variable that is captured by a block
+    /// referenced in its own initializer.
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned CapturedByOwnInit : 1;
   };
 
   union {
@@ -1588,10 +1593,23 @@ class VarDecl : public DeclaratorDecl, public Redeclarable<VarDecl> {
   /// escaping block.
   bool isNonEscapingByref() const;
 
-  void setEscapingByref() {
-    NonParmVarDeclBits.EscapingByref = true;
+  void setEscapingByref();
+
+  /// Indicates this is a __block variable that is captured by a block
+  /// referenced in its own initializer.
+  bool isCapturedByOwnInit() const {
+    return !isa<ParmVarDecl>(this) && NonParmVarDeclBits.CapturedByOwnInit;
   }
 
+  void setCapturedByOwnInit() {
+    assert(!isa<ParmVarDecl>(this));
+    NonParmVarDeclBits.CapturedByOwnInit = true;
+  }
+
+  /// Check if this is a __block variable which needs to be initialized
+  /// directly on the heap.
+  bool needsInitOnHeap() const;
+
   bool isCXXCondDecl() const {
     return isa<ParmVarDecl>(this) ? false : NonParmVarDeclBits.IsCXXCondDecl;
   }
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 1bbe76ff6bd2ac..f19ee8fbc21f3a 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2356,6 +2356,12 @@ def note_var_fixit_add_initialization : Note<
 def note_uninit_fixit_remove_cond : Note<
   "remove the %select{'%1' if its condition|condition if it}0 "
   "is always %select{false|true}2">;
+def warn_implicit_block_var_alloc : Warning<
+  "variable %q0 will be initialized in a heap allocation">,
+  InGroup<DiagGroup<"implicit-block-var-alloc">>, DefaultIgnore;
+def note_because_captured_by_block : Note<
+  "because it is captured by a block used in its own initializer">;
+
 def err_init_incomplete_type : Error<"initialization of incomplete type %0">;
 def err_list_init_in_parens : Error<
   "cannot initialize %select{non-class|reference}0 type %1 with a "
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 474e0ccde5bbf7..fda9010ba126d7 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2680,6 +2680,19 @@ bool VarDecl::isNonEscapingByref() const {
   return hasAttr<BlocksAttr>() && !NonParmVarDeclBits.EscapingByref;
 }
 
+void VarDecl::setEscapingByref() {
+  assert(!isa<ParmVarDecl>(this) && hasAttr<BlocksAttr>());
+  NonParmVarDeclBits.EscapingByref = true;
+}
+
+bool VarDecl::needsInitOnHeap() const {
+  // We need direct initialization on the heap for self-capturing __block
+  // variables of types that cause CodeGenFunction::getEvaluationKind to return
+  // TEK_Aggregate.  The only such types that can be captured are records.
+  return isCapturedByOwnInit() &&
+         getType().getAtomicUnqualifiedType()->isRecordType();
+}
+
 bool VarDecl::hasDependentAlignment() const {
   QualType T = getType();
   return T->isDependentType() || T->isUndeducedType() ||
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 2742c39965b2c8..4bea057a15a52c 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -45,6 +45,13 @@ CGBlockInfo::CGBlockInfo(const BlockDecl *block, StringRef name)
     name = name.substr(1);
 }
 
+BlockByrefHelpers::BlockByrefHelpers(const BlockByrefInfo &info)
+    : CopyHelper(nullptr), DisposeHelper(nullptr),
+      // The alignment we care about for the purposes of uniquing byref
+      // helpers is the alignment of the actual byref value field.
+      Alignment(info.ByrefAlignment.alignmentAtOffset(info.FieldOffset)),
+      ForceNoopCopy(info.ForceNoopCopy) {}
+
 // Anchor the vtable to this translation unit.
 BlockByrefHelpers::~BlockByrefHelpers() {}
 
@@ -2127,8 +2134,8 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
   BlockFieldFlags Flags;
 
 public:
-  ObjectByrefHelpers(CharUnits alignment, BlockFieldFlags flags)
-    : BlockByrefHelpers(alignment), Flags(flags) {}
+  ObjectByrefHelpers(const BlockByrefInfo &info, BlockFieldFlags flags)
+      : BlockByrefHelpers(info), Flags(flags) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2161,7 +2168,7 @@ class ObjectByrefHelpers final : public BlockByrefHelpers {
 /// Emits the copy/dispose helpers for an ARC __block __weak variable.
 class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCWeakByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCWeakByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2182,7 +2189,7 @@ class ARCWeakByrefHelpers final : public BlockByrefHelpers {
 /// that's not of block-pointer type.
 class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongByrefHelpers(CharUnits alignment) : BlockByrefHelpers(alignment) {}
+  ARCStrongByrefHelpers(const BlockByrefInfo &info) : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2218,8 +2225,8 @@ class ARCStrongByrefHelpers final : public BlockByrefHelpers {
 /// variable that's of block-pointer type.
 class ARCStrongBlockByrefHelpers final : public BlockByrefHelpers {
 public:
-  ARCStrongBlockByrefHelpers(CharUnits alignment)
-    : BlockByrefHelpers(alignment) {}
+  ARCStrongBlockByrefHelpers(const BlockByrefInfo &info)
+      : BlockByrefHelpers(info) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2248,9 +2255,9 @@ class CXXByrefHelpers final : public BlockByrefHelpers {
   const Expr *CopyExpr;
 
 public:
-  CXXByrefHelpers(CharUnits alignment, QualType type,
+  CXXByrefHelpers(const BlockByrefInfo &info, QualType type,
                   const Expr *copyExpr)
-    : BlockByrefHelpers(alignment), VarType(type), CopyExpr(copyExpr) {}
+    : BlockByrefHelpers(info), VarType(type), CopyExpr(copyExpr) {}
 
   bool needsCopy() const override { return CopyExpr != nullptr; }
   void emitCopy(CodeGenFunction &CGF, Address destField,
@@ -2276,8 +2283,8 @@ class NonTrivialCStructByrefHelpers final : public BlockByrefHelpers {
   QualType VarType;
 
 public:
-  NonTrivialCStructByrefHelpers(CharUnits alignment, QualType type)
-    : BlockByrefHelpers(alignment), VarType(type) {}
+  NonTrivialCStructByrefHelpers(const BlockByrefInfo &info, QualType type)
+      : BlockByrefHelpers(info), VarType(type) {}
 
   void emitCopy(CodeGenFunction &CGF, Address destField,
                 Address srcField) override {
@@ -2336,7 +2343,7 @@ generateByrefCopyHelper(CodeGenFunction &CGF, const BlockByrefInfo &byrefInfo,
     // Create a scope with an artificial location for the body of this function.
   auto AL = ApplyDebugLocation::CreateArtificial(CGF);
 
-  if (generator.needsCopy()) {
+  if (generator.needsCopy() && !generator.ForceNoopCopy) {
     // dst->x
     Address destField = CGF.GetAddrOfLocalVar(&Dst);
     destField = Address(CGF.Builder.CreateLoad(destField), byrefInfo.Type,
@@ -2458,18 +2465,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
 
   auto &byrefInfo = getBlockByrefInfo(&var);
 
-  // The alignment we care about for the purposes of uniquing byref
-  // helpers is the alignment of the actual byref value field.
-  CharUnits valueAlignment =
-    byrefInfo.ByrefAlignment.alignmentAtOffset(byrefInfo.FieldOffset);
-
   if (const CXXRecordDecl *record = type->getAsCXXRecordDecl()) {
     const Expr *copyExpr =
         CGM.getContext().getBlockVarCopyInit(&var).getCopyExpr();
     if (!copyExpr && record->hasTrivialDestructor()) return nullptr;
 
     return ::buildByrefHelpers(
-        CGM, byrefInfo, CXXByrefHelpers(valueAlignment, type, copyExpr));
+        CGM, byrefInfo, CXXByrefHelpers(byrefInfo, type, copyExpr));
   }
 
   // If type is a non-trivial C struct type that is non-trivial to
@@ -2477,7 +2479,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
   if (type.isNonTrivialToPrimitiveDestructiveMove() == QualType::PCK_Struct ||
       type.isDestructedType() == QualType::DK_nontrivial_c_struct)
     return ::buildByrefHelpers(
-        CGM, byrefInfo, NonTrivialCStructByrefHelpers(valueAlignment, type));
+        CGM, byrefInfo, NonTrivialCStructByrefHelpers(byrefInfo, type));
 
   // Otherwise, if we don't have a retainable type, there's nothing to do.
   // that the runtime does extra copies.
@@ -2499,7 +2501,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     // byref routines.
     case Qualifiers::OCL_Weak:
       return ::buildByrefHelpers(CGM, byrefInfo,
-                                 ARCWeakByrefHelpers(valueAlignment));
+                                 ARCWeakByrefHelpers(byrefInfo));
 
     // ARC __strong __block variables need to be retained.
     case Qualifiers::OCL_Strong:
@@ -2507,13 +2509,13 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
       // transfer possible.
       if (type->isBlockPointerType()) {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongBlockByrefHelpers(valueAlignment));
+                                   ARCStrongBlockByrefHelpers(byrefInfo));
 
       // Otherwise, we transfer ownership of the retain from the stack
       // to the heap.
       } else {
         return ::buildByrefHelpers(CGM, byrefInfo,
-                                   ARCStrongByrefHelpers(valueAlignment));
+                                   ARCStrongByrefHelpers(byrefInfo));
       }
     }
     llvm_unreachable("fell out of lifetime switch!");
@@ -2533,7 +2535,7 @@ CodeGenFunction::buildByrefHelpers(llvm::StructType &byrefType,
     flags |= BLOCK_FIELD_IS_WEAK;
 
   return ::buildByrefHelpers(CGM, byrefInfo,
-                             ObjectByrefHelpers(valueAlignment, flags));
+                             ObjectByrefHelpers(byrefInfo, flags));
 }
 
 Address CodeGenFunction::emitBlockByrefAddress(Address baseAddr,
@@ -2653,11 +2655,39 @@ const BlockByrefInfo &CodeGenFunction::getBlockByrefInfo(const VarDecl *D) {
   info.FieldOffset = varOffset;
   info.ByrefAlignment = std::max(varAlign, getPointerAlign());
 
+  // If we're initializing directly on the heap, then we should emit a
+  // no-op copy helper, both because we don't need a real one (the
+  // object will never move), and because a real one would break the
+  // pre-init _Block_object_assign.
+  info.ForceNoopCopy = D->needsInitOnHeap();
+
   auto pair = BlockByrefInfos.insert({D, info});
   assert(pair.second && "info was inserted recursively?");
   return pair.first->second;
 }
 
+void CodeGenFunction::emitByrefInitOnHeap(llvm::Value *P) {
+  // The object itself is initialized directly on the heap.  But for ABI
+  // backwards compatibility reasons, we have to set up a fake byref struct on
+  // the stack (with the structural components initialized but not the object
+  // itself), then call _Block_object_assign to move it to the heap (which is
+  // safe because we forced a no-op copy helper), then call
+  // _Block_object_dispose to release the extra ref from _Block_object_assign.
+  //
+  // 'P' points to the fake byref struct.
+
+  BlockFieldFlags flags = BLOCK_FIELD_IS_BYREF;
+  // Ignored out-parameter.  We'll use the forwarding pointer instead.
+  RawAddress out = CreateDefaultAlignTempAlloca(P->getType(), "initOnHeap.out");
+
+  llvm::Value *args[] = {Builder.CreateBitCast(out.getPointer(), VoidPtrTy),
+                         Builder.CreateBitCast(P, VoidPtrTy),
+                         llvm::ConstantInt::get(Int32Ty, flags.getBitMask())};
+  EmitNounwindRuntimeCall(CGM.getBlockObjectAssign(), args);
+
+  BuildBlockRelease(P, flags, false);
+}
+
 /// Initialize the structural components of a __block variable, i.e.
 /// everything but the actual object.
 void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
@@ -2699,8 +2729,8 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
   storeHeaderField(V, getPointerSize(), "byref.isa");
 
   // Store the address of the variable into its own forwarding pointer.
-  storeHeaderField(addr.emitRawPointer(*this), getPointerSize(),
-                   "byref.forwarding");
+  llvm::Value *pointer = addr.emitRawPointer(*this);
+  storeHeaderField(pointer, getPointerSize(), "byref.forwarding");
 
   // Blocks ABI:
   //   c) the flags field is set to either 0 if no helper functions are
@@ -2764,6 +2794,9 @@ void CodeGenFunction::emitByrefStructureInit(const AutoVarEmission &emission) {
     auto layoutInfo = CGM.getObjCRuntime().BuildByrefLayout(CGM, type);
     storeHeaderField(layoutInfo, getPointerSize(), "byref.layout");
   }
+
+  if (emission.NeedsInitOnHeap)
+    emitByrefInitOnHeap(pointer);
 }
 
 void CodeGenFunction::BuildBlockRelease(llvm::Value *V, BlockFieldFlags flags,
diff --git a/clang/lib/CodeGen/CGBlocks.h b/clang/lib/CodeGen/CGBlocks.h
index 8d10c4f69b2026..526dbbfd9b38d3 100644
--- a/clang/lib/CodeGen/CGBlocks.h
+++ b/clang/lib/CodeGen/CGBlocks.h
@@ -139,6 +139,7 @@ class BlockByrefInfo {
   unsigned FieldIndex;
   CharUnits ByrefAlignment;
   CharUnits FieldOffset;
+  bool ForceNoopCopy;
 };
 
 /// Represents a type of copy/destroy operation that should be performed for an
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index ce6d6d8956076e..954dd1079215fd 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -1450,6 +1450,8 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
 
   bool isEscapingByRef = D.isEscapingByref();
   emission.IsEscapingByRef = isEscapingByRef;
+  emission.IsCapturedByOwnInit = D.isCapturedByOwnInit();
+  emission.NeedsInitOnHeap = D.needsInitOnHeap();
 
   CharUnits alignment = getContext().getDeclAlign(&D);
 
@@ -1692,68 +1694,6 @@ CodeGenFunction::EmitAutoVarAlloca(const VarDecl &D) {
   return emission;
 }
 
-static bool isCapturedBy(const VarDecl &, const Expr *);
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given statement.
-static bool isCapturedBy(const VarDecl &Var, const Stmt *S) {
-  if (const Expr *E = dyn_cast<Expr>(S))
-    return isCapturedBy(Var, E);
-  for (const Stmt *SubStmt : S->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-  return false;
-}
-
-/// Determines whether the given __block variable is potentially
-/// captured by the given expression.
-static bool isCapturedBy(const VarDecl &Var, const Expr *E) {
-  // Skip the most common kinds of expressions that make
-  // hierarchy-walking expensive.
-  E = E->IgnoreParenCasts();
-
-  if (const BlockExpr *BE = dyn_cast<BlockExpr>(E)) {
-    const BlockDecl *Block = BE->getBlockDecl();
-    for (const auto &I : Block->captures()) {
-      if (I.getVariable() == &Var)
-        return true;
-    }
-
-    // No need to walk into the subexpressions.
-    return false;
-  }
-
-  if (const StmtExpr *SE = dyn_cast<StmtExpr>(E)) {
-    const CompoundStmt *CS = SE->getSubStmt();
-    for (const auto *BI : CS->body())
-      if (const auto *BIE = dyn_cast<Expr>(BI)) {
-        if (isCapturedBy(Var, BIE))
-          return true;
-      }
-      else if (const auto *DS = dyn_cast<DeclStmt>(BI)) {
-          // special case declarations
-          for (const auto *I : DS->decls()) {
-              if (const auto *VD = dyn_cast<VarDecl>((I))) {
-                const Expr *Init = VD->getInit();
-                if (Init && isCapturedBy(Var, Init))
-                  return true;
-              }
-          }
-      }
-      else
-        // FIXME. Make safe assumption assuming arbitrary statements cause capturing.
-        // Later, provide code to poke into statements for capture analysis.
-        return true;
-    return false;
-  }
-
-  for (const Stmt *SubStmt : E->children())
-    if (isCapturedBy(Var, SubStmt))
-      return true;
-
-  return false;
-}
-
 /// Determine whether the given initializer is trivial in the sense
 /// that it requires no code to be generated.
 bool CodeGenFunction::isTrivialInitializer(const Expr *Init) {
@@ -1918,8 +1858,7 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   // Check whether this is a byref variable that's potentially
   // captured and moved by its own initializer.  If so, we'll need to
   // emit the initializer first, then copy into the variable.
-  bool capturedByInit =
-      Init && emission.IsEscapingByRef && isCapturedBy(D, Init);
+  bool capturedByInit = emission.IsCapturedByOwnInit;
 
   bool locIsByrefHeader = !capturedByInit;
   const Address Loc =
@@ -1951,8 +1890,9 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
   llvm::Constant *constant = nullptr;
   if (emission.IsConstantAggregate ||
       D.mightBeUsableInConstantExpressions(getContext())) {
-    assert(!capturedByInit && "constant init contains a capturing block?");
     constant = ConstantEmitter(*this).tryEmitAbstractForInitializer(D);
+    assert(!(constant && capturedByInit) &&
+           "constant init contains a capturing block?");
     if (constant && !constant->isZeroValue() &&
         (trivialAutoVarInit !=
          LangOptions::TrivialAutoVarInitKind::Uninitialized)) {
@@ -1975,6 +1915,10 @@ void CodeGenFunction::EmitAutoVarInit(const AutoVarEmission &emission) {
     initializeWhatIsTechnicallyUninitialized(Loc);
     LValue lv = MakeAddrLValue(Loc, type);
     lv.setNonGC(true);
+    if (emission.NeedsInitOnHeap) {
+      drillIntoBlockVariable(*this, lv, &D);
+      capturedByInit = false; // We don't need to drill again.
+    }
     return EmitExprAsInit(Init, &D, lv, capturedByInit);
   }
 
@@ -2023,6 +1967,9 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
     return;
   }
   case TEK_Aggregate:
+    assert(
+        !capturedByInit &&
+        "Capture-by-initializer should have been handled by NeedsInitOnHeap!");
     if (type->isAtomicType()) {
       EmitAtomicInit(const_cast<Expr*>(init), lvalue);
     } else {
@@ -2031,7 +1978,6 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
         Overlap = AggValueSlot::DoesNotOverlap;
       else if (auto *FD = dyn_cast<FieldDecl>(D))
         Overlap = getOverlapForFieldInit(FD);
-      // TODO: how can we delay here if D is captured by its initializer?
       EmitAggExpr(init, AggValueSlot::forLValue(
                             lvalue, *this, AggValueSlot::IsDestructed,
                             AggValueSlot::DoesNotNeedGCBarriers,
@@ -2120,27 +2066,31 @@ void CodeGenFunction::EmitAutoVarCleanups(const AutoVarEmission &emission) {
   // us from jumping into any of these scopes anyway.
   if (!HaveInsertPoint()) return;
 
-  const VarDecl &D = *emission.Variable;
+  // If we're initializing directly on the heap, _Block_object_destroy will
+  // handle destruction, so we don't need to perform cleanups directly.
+  if (!emission.NeedsInitOnHeap) {
+    const VarDecl &D = *emission.Variable;
 
-  // Check the type for a cleanup.
-  if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
-    emitAutoVarTypeCleanup(emission, dtorKind);
+    // Check the type for a cleanup.
+    if (QualType::DestructionKind dtorKind = D.needsDestruction(getContext()))
+      emitAutoVarTypeCleanup(emission, dtorKind);
 
-  // In GC mode, honor objc_precise_lifetime.
-  if (getLangOpts().getGC() != LangOptions::NonGC &&
-      D.hasAttr<ObjCPreciseLifetimeAttr>()) {
-    EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
-  }
+    // In GC mode, honor objc_precise_lifetime.
+    if (getLangOpts().getGC() != LangOptions::NonGC &&
+        D.hasAttr<ObjCPreciseLifetimeAttr>()) {
+      EHStack.pushCleanup<ExtendGCLifetime>(NormalCleanup, &D);
+    }
 
-  // Handle the cleanup attribute.
-  if (const CleanupAttr *CA = D.getAttr<CleanupAttr>()) {
-    const FunctionDecl *FD = CA->getFunctionDec...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/89475


More information about the cfe-commits mailing list