[Patch] Use address-taken to disambiguate global var and indirect accesses

Shuxin Yang shuxin.llvm at gmail.com
Mon Oct 21 11:27:23 PDT 2013


Ping.

(Adding test cases; I forgot to attach them to my previous mail.)
Thanks,
Shuxin


-------- Original Message --------
Subject: 	[Patch] Use address-taken to disambiguate global var and indirect accesses
Date: 	Tue, 15 Oct 2013 14:39:51 -0700
From: 	Shuxin Yang <shuxin.llvm at gmail.com>
To: 	Commit Messages and Patches for LLVM <llvm-commits at cs.uiuc.edu>



Hi,

   The attached patch takes advantage of address-taken information to
disambiguate global-variable accesses from indirect memory accesses.

   The motivation
   ===========
       I was asked to investigate a problem where a load of a static
variable is not hoisted out of a loop as a loop invariant:

      ---------------
       static int xyz;
       void foo(int *p) {
           for (int i = 0; i < xyz; i++)
              *p++ = ....
       }
     -----------------
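
     For reference, the desired transformation can be written by hand. This
is only a sketch: the elided right-hand side is replaced by the loop counter,
the initial value of xyz is a placeholder, and foo_hoisted is a hypothetical
name, none of which come from the original test case.

```cpp
static int xyz = 4;  // initial value is a placeholder for illustration

// Original shape: the loop bound is reloaded from xyz on every iteration
// unless the compiler can prove the store through p never modifies xyz.
void foo(int *p) {
  for (int i = 0; i < xyz; i++)
    *p++ = i;  // stand-in for the elided right-hand side
}

// Hand-hoisted shape: xyz is read once before the loop. This is only
// equivalent to foo() when p cannot point at xyz.
void foo_hoisted(int *p) {
  const int n = xyz;
  for (int i = 0; i < n; i++)
    *p++ = i;
}
```

     Since nothing in this sketch takes the address of xyz, both functions
fill the buffer identically.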

     The compiler does have a concept called "addr-capture". However, I
don't think it can be used to disambiguate global variables from indirect
accesses. The reason is that a variable whose address is not *CAPTURED*
can still be accessed indirectly.

     So I rely on "address taken" instead.
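
     One way to picture the difference, as a rough sketch (bump and
readAroundCall are made-up names, and this is my reading of "captured", not
something taken from the patch): a callee can write through a pointer
without keeping a copy of it, so the address is taken but never captured,
and the variable is still modified indirectly.

```cpp
static int xyz = 3;

// Writes through q but does not store q anywhere, so the pointer is not
// captured by this function.
static void bump(int *q) { *q += 1; }

// xyz has its address taken here, and its value changes across the call
// even though the address never escapes.
int readAroundCall() {
  int before = xyz;
  bump(&xyz);
  return xyz - before;
}
```

     A not-captured address therefore says nothing about whether xyz can be
reached through a pointer; a not-taken address does.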

   How it works
   ========
        1. In globalopt, when a global var is identified as
not-addr-taken, cache the result in GlobalVariable::notAddrTaken.

        2. In the alias analyzer, suppose the mem-ops involved are m1 and
m2. Let O1 and O2 be the "objects" (obtained via get_underlying_object())
of m1 and m2 respectively.

           If O1 != O2 and either of them is a global variable without its
address taken, then m1 and m2 are disjoint accesses.
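
   The rule in step 2 can be sketched on a toy model. The types and names
below (Object, AliasResult, aliasByAddressTaken) are hypothetical stand-ins
for illustration and are not the LLVM classes touched by the patch.

```cpp
#include <string>

// Toy stand-in for an underlying object; not an LLVM type.
struct Object {
  std::string name;
  bool isGlobalVar;   // true for a global variable
  bool notAddrTaken;  // cached result of the address-taken analysis
};

enum class AliasResult { NoAlias, MayAlias };

// o1 and o2 are the underlying objects of the two memory accesses.
AliasResult aliasByAddressTaken(const Object *o1, const Object *o2) {
  // Same underlying object: this rule has nothing to say.
  if (o1 == o2)
    return AliasResult::MayAlias;

  auto isUnaliasedGlobal = [](const Object *o) {
    return o->isGlobalVar && o->notAddrTaken;
  };

  // Distinct objects, and at least one is a global whose address is never
  // taken: no pointer can reach it, so the accesses are disjoint.
  if (isUnaliasedGlobal(o1) || isUnaliasedGlobal(o2))
    return AliasResult::NoAlias;

  return AliasResult::MayAlias;
}
```

   With xyz modelled as {"xyz", true, true} and the pointer argument p as
{"p", false, false}, the query returns NoAlias, which is what allows the
load of xyz to be hoisted in the motivating loop.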

   Misc:
   =========
        Note that I *cache* the not-addr-taken result. Unlike for a
function, it is far more expensive to figure out whether a global var
has its address taken. So it is not appropriate to analyze address-taken
on the fly.

       On the other hand, I personally think the not-addr-taken flag is
almost maintenance free. (FIXME) Only a few optimizations could make a
not-addr-taken variable become addr-taken (say, outlining), and I don't
think we currently have such passes (FIXME again!). If such a rare case
does occur, it is up to that pass to reset the not-addr-taken flag.

      Of course, a variable previously considered addr-taken may later be
proved to be not-addr-taken. In that case, the compiler does not have to
update the flag; leaving it unset is conservatively correct.
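
   The flag protocol described above can be sketched with a toy model.
ToyGlobal, recordAnalysis and invalidate are hypothetical names used only
for illustration; they are not part of the patch.

```cpp
// Toy stand-in for a global variable carrying the cached flag.
struct ToyGlobal {
  bool notAddrTaken = false;  // false is the conservative default
};

// Run once by the (expensive) analysis: set the flag only when the
// analysis proved that the address is never taken.
void recordAnalysis(ToyGlobal &gv, bool addressIsTaken) {
  if (!addressIsTaken)
    gv.notAddrTaken = true;
}

// A transformation that introduces a new address-taking use must call
// this to keep the cached flag sound.
void invalidate(ToyGlobal &gv) { gv.notAddrTaken = false; }
```

   The flag is only ever trusted when it is true, so forgetting to set it
costs precision, while forgetting to clear it would be a correctness bug.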

   Performance impact
   =============
    Measured on an oldish Mac tower with 2x 2.26GHz quad-core Xeons. Both
the baseline and the change were measured a couple of times. I did take a
look at why Olden/power is sped up: the loads of static variables "P" and
"Q" are promoted in many places. I have not yet had a chance to
investigate why the pairlocalalign speedup seen with O3 disappears with
O3+LTO.


o. test-suite w/ O3:
-------------------
                                    base      change    delta (%)
Benchmarks/Olden/power/power        1.6129    1.354     -16.0518321036642
Benchmarks/mafft/pairlocalalign     31.4211   26.5794   -15.4090722476298
Benchmarks/Ptrdist/yacr2/yacr2      0.881     0.804     -8.74006810442678

o. test-suite w/ O3 + LTO
-------------------------
                                    base      change    delta (%)
Benchmarks/Olden/power/power        1.6143    1.3419    -16.8741869540978
Applications/spiff/spiff            2.9203    2.849     -2.44152997979659

o. spec2kint w/ O3+LTO
----------------------
                                    base      change    delta (%)
bzip2                               75.02     73.92     -1.4


Thanks
Shuxin





-------------- next part --------------
Index: include/llvm/IR/GlobalVariable.h
===================================================================
--- include/llvm/IR/GlobalVariable.h	(revision 192719)
+++ include/llvm/IR/GlobalVariable.h	(working copy)
@@ -48,6 +48,7 @@
                                                // can change from its initial
                                                // value before global
                                                // initializers are run?
+  bool notAddrTaken : 1;                       // Does not have address taken.
 
 public:
   // allocate space for exactly one operand
@@ -174,6 +175,9 @@
     isExternallyInitializedConstant = Val;
   }
 
+  void setNotAddressTaken(bool Val) { notAddrTaken = Val; }
+  bool notAddressTaken(void) const { return notAddrTaken; }
+
   /// copyAttributesFrom - copy all additional attributes (those not needed to
   /// create a GlobalVariable) from the GlobalVariable Src to this one.
   void copyAttributesFrom(const GlobalValue *Src);
Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst	(revision 192719)
+++ docs/LangRef.rst	(working copy)
@@ -514,6 +514,9 @@
 ``@llvm.used``. This assumption may be suppressed by marking the
 variable with ``externally_initialized``.
 
+If a global variable does not have its address taken, it may optionally be
+flagged ``notaddrtaken``.
+
 An explicit alignment may be specified for a global, which must be a
 power of 2. If not present, or if the alignment is set to zero, the
 alignment of the global is set by the target to whatever it feels
Index: lib/Analysis/BasicAliasAnalysis.cpp
===================================================================
--- lib/Analysis/BasicAliasAnalysis.cpp	(revision 192719)
+++ lib/Analysis/BasicAliasAnalysis.cpp	(working copy)
@@ -1238,6 +1238,14 @@
       return NoAlias;
     if (isEscapeSource(O2) && isNonEscapingLocalObject(O1))
       return NoAlias;
+
+    if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(O1))
+      if (GV->notAddressTaken())
+        return NoAlias;
+
+    if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(O2))
+      if (GV->notAddressTaken())
+        return NoAlias;
   }
 
   // If the size of one access is larger than the entire object on the other
Index: lib/AsmParser/LLParser.cpp
===================================================================
--- lib/AsmParser/LLParser.cpp	(revision 192719)
+++ lib/AsmParser/LLParser.cpp	(working copy)
@@ -704,7 +704,7 @@
                            unsigned Linkage, bool HasLinkage,
                            unsigned Visibility) {
   unsigned AddrSpace;
-  bool IsConstant, UnnamedAddr, IsExternallyInitialized;
+  bool IsConstant, UnnamedAddr, IsExternallyInitialized, notAddrTaken;
   GlobalVariable::ThreadLocalMode TLM;
   LocTy UnnamedAddrLoc;
   LocTy IsExternallyInitializedLoc;
@@ -719,6 +719,7 @@
                          IsExternallyInitialized,
                          &IsExternallyInitializedLoc) ||
       ParseGlobalType(IsConstant) ||
+      ParseOptionalToken(lltok::kw_notaddrtaken, notAddrTaken) ||
       ParseType(Ty, TyLoc))
     return true;
 
@@ -776,6 +777,7 @@
   GV->setLinkage((GlobalValue::LinkageTypes)Linkage);
   GV->setVisibility((GlobalValue::VisibilityTypes)Visibility);
   GV->setExternallyInitialized(IsExternallyInitialized);
+  GV->setNotAddressTaken(notAddrTaken);
   GV->setThreadLocalMode(TLM);
   GV->setUnnamedAddr(UnnamedAddr);
 
Index: lib/AsmParser/LLLexer.cpp
===================================================================
--- lib/AsmParser/LLLexer.cpp	(revision 192719)
+++ lib/AsmParser/LLLexer.cpp	(working copy)
@@ -504,6 +504,7 @@
   KEYWORD(zeroinitializer);
   KEYWORD(undef);
   KEYWORD(null);
+  KEYWORD(notaddrtaken);
   KEYWORD(to);
   KEYWORD(tail);
   KEYWORD(target);
Index: lib/AsmParser/LLToken.h
===================================================================
--- lib/AsmParser/LLToken.h	(revision 192719)
+++ lib/AsmParser/LLToken.h	(working copy)
@@ -51,6 +51,7 @@
     kw_localdynamic, kw_initialexec, kw_localexec,
     kw_zeroinitializer,
     kw_undef, kw_null,
+    kw_notaddrtaken,
     kw_to,
     kw_tail,
     kw_target,
Index: lib/Transforms/IPO/GlobalOpt.cpp
===================================================================
--- lib/Transforms/IPO/GlobalOpt.cpp	(revision 192719)
+++ lib/Transforms/IPO/GlobalOpt.cpp	(working copy)
@@ -1922,6 +1922,8 @@
   if (AnalyzeGlobal(GV, GS, PHIUsers))
     return false;
 
+  GV->setNotAddressTaken(true);
+
   if (!GS.isCompared && !GV->hasUnnamedAddr()) {
     GV->setUnnamedAddr(true);
     NumUnnamed++;
Index: lib/Bitcode/Reader/BitcodeReader.cpp
===================================================================
--- lib/Bitcode/Reader/BitcodeReader.cpp	(revision 192719)
+++ lib/Bitcode/Reader/BitcodeReader.cpp	(working copy)
@@ -1848,6 +1848,9 @@
         new GlobalVariable(*TheModule, Ty, isConstant, Linkage, 0, "", 0,
                            TLM, AddressSpace, ExternallyInitialized);
       NewGV->setAlignment(Alignment);
+      if (Record.size() > 10)
+        NewGV->setNotAddressTaken(Record[10]);
+
       if (!Section.empty())
         NewGV->setSection(Section);
       NewGV->setVisibility(Visibility);
Index: lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- lib/Bitcode/Writer/BitcodeWriter.cpp	(revision 192719)
+++ lib/Bitcode/Writer/BitcodeWriter.cpp	(working copy)
@@ -616,11 +616,13 @@
     Vals.push_back(GV->hasSection() ? SectionMap[GV->getSection()] : 0);
     if (GV->isThreadLocal() ||
         GV->getVisibility() != GlobalValue::DefaultVisibility ||
-        GV->hasUnnamedAddr() || GV->isExternallyInitialized()) {
+        GV->hasUnnamedAddr() || GV->isExternallyInitialized() ||
+        GV->notAddressTaken()) {
       Vals.push_back(getEncodedVisibility(GV));
       Vals.push_back(getEncodedThreadLocalMode(GV));
       Vals.push_back(GV->hasUnnamedAddr());
       Vals.push_back(GV->isExternallyInitialized());
+      Vals.push_back(GV->notAddressTaken());
     } else {
       AbbrevToUse = SimpleGVarAbbrev;
     }
Index: lib/IR/AsmWriter.cpp
===================================================================
--- lib/IR/AsmWriter.cpp	(revision 192719)
+++ lib/IR/AsmWriter.cpp	(working copy)
@@ -1459,6 +1459,7 @@
   if (GV->hasUnnamedAddr()) Out << "unnamed_addr ";
   if (GV->isExternallyInitialized()) Out << "externally_initialized ";
   Out << (GV->isConstant() ? "constant " : "global ");
+  if (GV->notAddressTaken()) Out << "notaddrtaken ";
   TypePrinter.print(GV->getType()->getElementType(), Out);
 
   if (GV->hasInitializer()) {
Index: lib/IR/Globals.cpp
===================================================================
--- lib/IR/Globals.cpp	(revision 192719)
+++ lib/IR/Globals.cpp	(working copy)
@@ -99,6 +99,7 @@
   }
 
   LeakDetector::addGarbageObject(this);
+  setNotAddressTaken(false);
 }
 
 GlobalVariable::GlobalVariable(Module &M, Type *Ty, bool constant,
@@ -125,6 +126,7 @@
     Before->getParent()->getGlobalList().insert(Before, this);
   else
     M.getGlobalList().push_back(this);
+  setNotAddressTaken(false);
 }
 
 void GlobalVariable::setParent(Module *parent) {
@@ -185,6 +187,7 @@
   GlobalValue::copyAttributesFrom(Src);
   const GlobalVariable *SrcVar = cast<GlobalVariable>(Src);
   setThreadLocal(SrcVar->isThreadLocal());
+  setNotAddressTaken(SrcVar->notAddressTaken());
 }
 
 

-------------- next part --------------
Index: test/Analysis/BasicAA/noaddrtaken.ll
===================================================================
--- test/Analysis/BasicAA/noaddrtaken.ll	(revision 0)
+++ test/Analysis/BasicAA/noaddrtaken.ll	(revision 0)
@@ -0,0 +1,29 @@
+; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info 2>&1 | FileCheck %s
+
+; ModuleID = 'b.c'
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.8.0"
+
+; CHECK: NoAlias:   i32* %p, i32* @xyz
+
+;@xyz = global i32 12, align 4
+@xyz = internal unnamed_addr global notaddrtaken i32 12, align 4
+
+; Function Attrs: nounwind ssp uwtable
+define i32 @foo(i32* nocapture %p, i32* nocapture %q) #0 {
+entry:
+  %0 = load i32* @xyz, align 4, !tbaa !0
+  %inc = add nsw i32 %0, 1
+  store i32 %inc, i32* @xyz, align 4, !tbaa !0
+  store i32 1, i32* %p, align 4, !tbaa !0
+  %1 = load i32* @xyz, align 4, !tbaa !0
+  store i32 %1, i32* %q, align 4, !tbaa !0
+  ret i32 undef
+}
+
+attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
+!0 = metadata !{metadata !1, metadata !1, i64 0}
+!1 = metadata !{metadata !"int", metadata !2, i64 0}
+!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
+!3 = metadata !{metadata !"Simple C/C++ TBAA"}
Index: test/Transforms/GlobalOpt/atomic.ll
===================================================================
--- test/Transforms/GlobalOpt/atomic.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/atomic.ll	(working copy)
@@ -3,8 +3,8 @@
 @GV1 = internal global i64 1
 @GV2 = internal global i32 0
 
-; CHECK: @GV1 = internal unnamed_addr constant i64 1
-; CHECK: @GV2 = internal unnamed_addr global i32 0
+; CHECK: @GV1 = internal unnamed_addr constant notaddrtaken i64 1
+; CHECK: @GV2 = internal unnamed_addr global notaddrtaken i32 0
 
 define void @test1() {
 entry:
Index: test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll
===================================================================
--- test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll	(working copy)
@@ -1,4 +1,4 @@
-; RUN: opt < %s -globalopt -S | grep "@X = internal unnamed_addr global i32"
+; RUN: opt < %s -globalopt -S | grep "@X = internal unnamed_addr global notaddrtaken i32"
 target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
 target triple = "i386-apple-darwin7"
 @X = internal global i32* null		; <i32**> [#uses=2]
Index: test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll
===================================================================
--- test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll	(working copy)
@@ -8,7 +8,7 @@
 target triple = "x86_64-apple-darwin10.0"
 
 @TOP = internal global i64* null                    ; <i64**> [#uses=2]
-; CHECK: @TOP = internal unnamed_addr global i64* null
+; CHECK: @TOP = internal unnamed_addr global notaddrtaken i64* null
 @channelColumns = internal global i64 0             ; <i64*> [#uses=2]
 
 ; Derived from @DescribeChannel() in yacr2
Index: test/Transforms/GlobalOpt/integer-bool.ll
===================================================================
--- test/Transforms/GlobalOpt/integer-bool.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/integer-bool.ll	(working copy)
@@ -4,7 +4,7 @@
 @G = internal addrspace(1) global i32 0
 ; CHECK: @G
 ; CHECK: addrspace(1)
-; CHECK: global i1 false
+; CHECK: global notaddrtaken i1 false
 
 define void @set1() {
   store i32 0, i32 addrspace(1)* @G
Index: test/Transforms/GlobalOpt/unnamed-addr.ll
===================================================================
--- test/Transforms/GlobalOpt/unnamed-addr.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/unnamed-addr.ll	(working copy)
@@ -6,10 +6,10 @@
 @d = internal constant [4 x i8] c"foo\00", align 1
 @e = linkonce_odr global i32 0
 
-; CHECK: @a = internal global i32 0, align 4
+; CHECK: @a = internal global notaddrtaken i32 0, align 4
 ; CHECK: @b = internal global i32 0, align 4
-; CHECK: @c = internal unnamed_addr global i32 0, align 4
-; CHECK: @d = internal unnamed_addr constant [4 x i8] c"foo\00", align 1
+; CHECK: @c = internal unnamed_addr global notaddrtaken i32 0, align 4
+; CHECK: @d = internal unnamed_addr constant notaddrtaken [4 x i8] c"foo\00", align 1
 ; CHECK: @e = linkonce_odr global i32 0
 
 define i32 @get_e() {
Index: test/Transforms/GlobalOpt/globalsra-unknown-index.ll
===================================================================
--- test/Transforms/GlobalOpt/globalsra-unknown-index.ll	(revision 192719)
+++ test/Transforms/GlobalOpt/globalsra-unknown-index.ll	(working copy)
@@ -1,5 +1,5 @@
 ; RUN: opt < %s -globalopt -S > %t
-; RUN: grep "@Y = internal unnamed_addr global \[3 x [%]struct.X\] zeroinitializer" %t
+; RUN: grep "@Y = internal unnamed_addr global notaddrtaken \[3 x [%]struct.X\] zeroinitializer" %t
 ; RUN: grep load %t | count 6
 ; RUN: grep "add i32 [%]a, [%]b" %t | count 3
 

