[PATCH] D150218: [ConstantMerge] Only merge constant w/unnamed_addr

Gulfem Savrun Yeniceri via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 9 11:53:26 PDT 2023


gulfem created this revision.
Herald added subscribers: hoy, ormris, hiraditya, arichardson.
Herald added a project: All.
gulfem requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Currently, ConstantMergePass merges an unnamed_addr constant with a
non-unnamed_addr constant, as LangRef permits:

"Note that a constant with significant address can be merged with a
unnamed_addr constant, the result being a constant whose address is
significant."

https://llvm.org/docs/LangRef.html#global-variables

This can result in a situation where Clang violates C semantics. Here
is a small reproducer that shows the problem:

  #include <stdio.h>

  const char foo_string[] = "foo";
  const char* foo_func(void) { return "foo"; }
  int is_foo(const char* p) { return p == foo_string; }
  
  int main() {
    printf("is_foo: %d\n", is_foo("foo"));
  }

When we compile with -O0, where ConstantMerge is not applied, Clang
and GCC have the same result.

  clang -O0 foo.c -o foo
  ./foo
  is_foo: 0
  
  gcc -O0 foo.c -o foo
  ./foo
  is_foo: 0

When we compile with -O1 or higher, where ConstantMerge is applied,
Clang and GCC produce different results.

  clang -O3 foo.c -o foo
  ./foo
  is_foo: 1
  
  gcc -O3 foo.c -o foo
  ./foo
  is_foo: 0

Here's the IR before ConstantMergePass:

  @.str = private unnamed_addr constant [4 x i8] c"foo\00", align 1
  @_ZL10foo_string = internal constant [4 x i8] c"foo\00", align 1
  @.str.1 = private unnamed_addr constant [12 x i8] c"is_foo: %d\0A\00", align 1
  
  ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
  define noundef i8* @_Z8foo_funcv() local_unnamed_addr #0 {
    ret i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0)
  }
  
  ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
  define noundef i32 @_Z6is_fooPKc(i8* noundef readnone %0) local_unnamed_addr #0 {
    %2 = icmp eq i8* %0, getelementptr inbounds ([4 x i8], [4 x i8]* @_ZL10foo_string, i64 0, i64 0)
    %3 = zext i1 %2 to i32
    ret i32 %3
  }
  
  ; Function Attrs: mustprogress nofree norecurse nounwind uwtable
  define noundef i32 @main() local_unnamed_addr #1 {
    %1 = tail call i32 (i8*, ...) @printf(i8* noundef nonnull dereferenceable(1) getelementptr inbounds ([12 x i8], [12 x i8]* @.str.1, i64 0, i64 0), i32 noundef zext (i1 icmp eq (i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_ZL10foo_string, i64 0, i64 0)) to i32))
    ret i32 0
  }

ConstantMergePass merges `_ZL10foo_string` into `.str`, that is, it
merges a non-`unnamed_addr` constant into an `unnamed_addr` constant.
The merged `.str` loses `unnamed_addr`, and the pointer comparison in
`main` folds to 1.

*** IR Dump After ConstantMergePass on [module] ***

  @.str = private constant [4 x i8] c"foo\00", align 1
  @.str.1 = private unnamed_addr constant [12 x i8] c"is_foo: %d\0A\00", align 1
  
  ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
  define noundef i8* @_Z8foo_funcv() local_unnamed_addr #0 {
    ret i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0)
  }
  
  ; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
  define noundef i32 @_Z6is_fooPKc(i8* noundef readnone %0) local_unnamed_addr #0 {
    %2 = icmp eq i8* %0, getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0)
    %3 = zext i1 %2 to i32
    ret i32 %3
  }
  
  ; Function Attrs: mustprogress nofree norecurse nounwind uwtable
  define noundef i32 @main() local_unnamed_addr #1 {
    %1 = tail call i32 (i8*, ...) @printf(i8* noundef nonnull dereferenceable(1) getelementptr inbounds ([12 x i8], [12 x i8]* @.str.1, i64 0, i64 0), i32 noundef 1)
    ret i32 0
  }

This transformation violates the following C pointer semantics:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a pointer
to one past the end of one array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space."
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
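
As a minimal sketch of the rule quoted above, the following C fragment
compares pointers to two distinct named objects that hold the same
bytes. The names `first_copy`, `second_copy`, `same_address` and
`check_distinct_objects` are hypothetical and chosen only for this
illustration. Since the two arrays are different objects, pointers to
their first elements should not compare equal, regardless of contents.

```c
#include <assert.h>

/* Hypothetical names: two distinct named objects with identical contents. */
const char first_copy[] = "foo";
const char second_copy[] = "foo";

/* Returns 1 when the two pointers compare equal, 0 otherwise. */
int same_address(const char *a, const char *b) { return a == b; }

/* Self-check: a pointer equals itself, but pointers to the starts of two
   different objects must not compare equal, whatever the objects contain. */
void check_distinct_objects(void) {
    assert(same_address(first_copy, first_copy) == 1);
    assert(same_address(first_copy, second_copy) == 0);
}
```

Calling `check_distinct_objects()` from `main` should pass at any
optimization level on a conforming implementation.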

So, this patch changes the ConstantMerge pass to only allow merging
when a constant is marked with the `unnamed_addr` attribute.
I also found an old GitHub issue that describes a similar problem with
invalid constant merging.
https://github.com/llvm/llvm-project/issues/9299
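
To make the change in merge eligibility concrete, here is a small
model in C. It is not the code in ConstantMerge.cpp: the struct
`constant_model` and both predicates are hypothetical, and
`may_merge_strict` encodes one assumed reading of the new rule, namely
that both constants must carry `unnamed_addr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of a global constant; not an LLVM type. */
struct constant_model {
    const char *initializer; /* contents of the constant */
    bool unnamed_addr;       /* true if the address is not significant */
};

/* Two constants are merge candidates only if their contents match. */
static bool same_contents(const struct constant_model *a,
                          const struct constant_model *b) {
    return strcmp(a->initializer, b->initializer) == 0;
}

/* Rule from the LangRef note: one unnamed_addr constant is enough. */
bool may_merge_langref(const struct constant_model *a,
                       const struct constant_model *b) {
    return same_contents(a, b) && (a->unnamed_addr || b->unnamed_addr);
}

/* Assumed reading of this patch: both constants must be unnamed_addr. */
bool may_merge_strict(const struct constant_model *a,
                      const struct constant_model *b) {
    return same_contents(a, b) && a->unnamed_addr && b->unnamed_addr;
}
```

With `.str` modeled as `{"foo", true}` and `_ZL10foo_string` as
`{"foo", false}`, `may_merge_langref` accepts the pair while
`may_merge_strict` rejects it, which matches the reproducer above.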


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D150218

Files:
  llvm/lib/Transforms/IPO/ConstantMerge.cpp
  llvm/test/Transforms/ConstantMerge/2011-01-15-EitherOrder.ll
  llvm/test/Transforms/ConstantMerge/merge-both.ll
  llvm/test/Transforms/ConstantMerge/merge-dbg.ll
  llvm/test/Transforms/ConstantMerge/unnamed-addr.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D150218.520778.patch
Type: text/x-patch
Size: 5100 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230509/af7644df/attachment.bin>

