[clang] [RFC] [clang] [CodeGen] Avoid creating global variable repeatedly when type are not specified (PR #114948)

Chuanqi Xu via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 5 00:13:08 PST 2024


https://github.com/ChuanqiXu9 created https://github.com/llvm/llvm-project/pull/114948

This comes from an internal crash. I know generally it is better to reproduce it first but I do feel the pattern is pretty risky. So I am wondering if we can discuss it first. So maybe this is more of a discussion instead of a pure PR.

Then story is, when we try to get or create a LLVM global for a C/C++'s global, we will try to look up the name first for the existing globals. And if we find one, we will perform some checks. If the checks pass, we will return the found one. If not, we will create a new one and replace the previous one. (Why do we want to do this? My instinct reaction is that we should abort here):
https://github.com/llvm/llvm-project/blob/bf43a138f0a6220cd043a376200bd739cacb25e3/clang/lib/CodeGen/CodeGenModule.cpp#L4966-L4982

https://github.com/llvm/llvm-project/blob/bf43a138f0a6220cd043a376200bd739cacb25e3/clang/lib/CodeGen/CodeGenModule.cpp#L5017-L5032

The problem is, if we store the address of a global variable and the global variable got replaced later, the address we stored became a wild pointer! e.g.

https://github.com/llvm/llvm-project/blob/283273fa1e3be4a03f06a5efd08a8c818be981fd/clang/lib/CodeGen/CodeGenModule.cpp#L2092-L2097

I feel this is pretty dangerous. And to my knowledge, I think we'd better to not remove things emitted during CodeGen.

Then, one of the trigger for the problem is `CodeGenModule::GetAddrOfGlobalVar`:

https://github.com/llvm/llvm-project/blob/283273fa1e3be4a03f06a5efd08a8c818be981fd/clang/lib/CodeGen/CodeGenModule.cpp#L5232C17-L5243

The arguments except `D` can be omitted. And if we don't specify `Ty`, the function will try to deduce the type from `D`. And use the type to get or create a LLVM global in the above process. And the `Ty` arguments may not always be omitted, e.g., in https://github.com/llvm/llvm-project/blob/283273fa1e3be4a03f06a5efd08a8c818be981fd/clang/lib/CodeGen/CodeGenModule.cpp#L5484-L5564, we will try to deduce the LLVM type directly.

Then problem happens, sometimes we try to get or create the global variable by the AST type, but sometimes we try to get or create the **same** global variable by deduced type, and if the two types differs, we may be in the trouble of wild pointer.

(the two types are compatible: e.g., one is `struct { %another.struct}` with `%another.struct = { ptr }` and another  type is `{ { ptr } }`).

The solution or one workaround I got is, in `CodeGenModule::GetAddrOfGlobalVar`, if we didn't specify the `Ty` and we have the same variable, return the variable directly. I think it makes sense since if the `Ty` is not specified, it implies the caller doesn't care about it too much.

WDYT?



>From 5b7def2c1deb4315cd043bc090a7364edbaeb84c Mon Sep 17 00:00:00 2001
From: Chuanqi Xu <yedeng.yd at linux.alibaba.com>
Date: Tue, 5 Nov 2024 15:50:50 +0800
Subject: [PATCH] [RFC] [clang] [CodeGen] Avoid creating global variable
 repeatedly when type are not specified

---
 clang/lib/CodeGen/CodeGenModule.cpp | 9 ++++++++-
 clang/test/CodeGen/attr-weakref2.c  | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index ba376f9ecfacde..9566cfb8d6e794 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -5233,11 +5233,18 @@ llvm::Constant *CodeGenModule::GetAddrOfGlobalVar(const VarDecl *D,
                                                   llvm::Type *Ty,
                                            ForDefinition_t IsForDefinition) {
   assert(D->hasGlobalStorage() && "Not a global variable");
+
+  StringRef MangledName = getMangledName(D);
+  llvm::GlobalValue *Entry = GetGlobalValue(MangledName);
   QualType ASTTy = D->getType();
+  LangAS AddrSpace = ASTTy.getAddressSpace();
+
+  if (Entry && !Ty && Entry->getAddressSpace() == getContext().getTargetAddressSpace(AddrSpace))
+    return Entry;
+
   if (!Ty)
     Ty = getTypes().ConvertTypeForMem(ASTTy);
 
-  StringRef MangledName = getMangledName(D);
   return GetOrCreateLLVMGlobal(MangledName, Ty, ASTTy.getAddressSpace(), D,
                                IsForDefinition);
 }
diff --git a/clang/test/CodeGen/attr-weakref2.c b/clang/test/CodeGen/attr-weakref2.c
index 114f048a851832..a67f906810faf3 100644
--- a/clang/test/CodeGen/attr-weakref2.c
+++ b/clang/test/CodeGen/attr-weakref2.c
@@ -33,7 +33,7 @@ int test4_h(void) {
 }
 int test4_f;
 
-// CHECK: @test5_f = external global i32
+// CHECK: @test5_f = extern_weak global i32
 extern int test5_f;
 static int test5_g __attribute__((weakref("test5_f")));
 int test5_h(void) {



More information about the cfe-commits mailing list