[Patch] Use address-taken to disambiguate global var and indirect accesses
Shuxin Yang
shuxin.llvm at gmail.com
Mon Oct 21 11:27:23 PDT 2013
Ping.
(Test cases added; I forgot to attach them to my previous mail.)
Thanks
Shuxin
-------- Original Message --------
Subject: [Patch] Use address-taken to disambiguate global var and indirect accesses
Date: Tue, 15 Oct 2013 14:39:51 -0700
From: Shuxin Yang <shuxin.llvm at gmail.com>
To: Commit Messages and Patches for LLVM <llvm-commits at cs.uiuc.edu>
Hi,
The attached patch takes advantage of address-taken information to
disambiguate global variables from indirect memory accesses.
The motivation
===========
I was asked to investigate a case where the load of a static variable is
not hoisted out of a loop as a loop invariant:
---------------
static int xyz;
void foo(int *p) {
for (int i = 0; i < xyz; i++)
*p++ = ....
}
-----------------
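To make the missed optimization concrete, here is a small sketch of the loop as written next to the form the optimizer could produce once it knows the store through p cannot touch xyz. The function names fill_reload and fill_hoisted, the initial value of xyz, and the stored value (the original elides it) are mine, for illustration only.

```cpp
#include <cassert>

static int xyz = 4;  // initial value chosen only for this illustration

// As written: the store through p might modify xyz as far as the
// compiler can tell, so xyz has to be reloaded on every iteration.
void fill_reload(int *p) {
  assert(p != nullptr);
  for (int i = 0; i < xyz; i++)
    *p++ = i;  // placeholder value; the original elides the right-hand side
}

// What hoisting amounts to: read xyz once before the loop. This is only
// valid if no store in the loop can alias xyz.
void fill_hoisted(int *p) {
  assert(p != nullptr);
  const int n = xyz;
  for (int i = 0; i < n; i++)
    *p++ = i;
}
```

The two functions agree whenever p does not point at xyz, which is exactly the fact the alias analysis needs to prove.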
The compiler does have a concept called "addr-capture". However, I
don't think it can be used to disambiguate global variables from
indirect accesses. The reason is that a variable not having its address
*CAPTURED* does not necessarily mean the variable cannot be accessed
indirectly.
So, I rely on "address taken" instead.
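One way to read the distinction, as a sketch: a callee that only dereferences its pointer argument does not capture it, yet the variable is still written indirectly. The names storeThrough and demo are hypothetical, and the values are arbitrary.

```cpp
#include <cassert>

static int xyz = 3;

// The callee only dereferences p. It never stores p anywhere, so the
// pointer is not captured (think of a nocapture argument).
void storeThrough(int *p, int v) {
  assert(p != nullptr);
  *p = v;
}

// The address of xyz is taken here, and xyz is then modified indirectly
// inside storeThrough even though its address was never captured.
int demo() {
  int before = xyz;
  storeThrough(&xyz, before + 4);
  return xyz;
}
```

So "not captured" alone says nothing about whether *p inside the callee may touch xyz, while "address never taken" does.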
How it works
========
1. In globalopt, when a global variable is identified as
not-addr-taken, cache the result in GlobalVariable::notAddrTaken.
2. In the alias analyzer, suppose the memory operations involved are
m1 and m2. Let O1 and O2 be the underlying "objects" of m1 and m2
respectively (obtained via get_underlying_object()).
If O1 != O2 and at least one of them is a global variable whose address
is not taken, then m1 and m2 are disjoint accesses.
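The rule in step 2 can be sketched as a toy model. This is not LLVM's API: the Object struct, the AliasResult enum and aliasByAddressTaken are stand-ins I made up to show the shape of the check.

```cpp
#include <cassert>

enum class AliasResult { NoAlias, MayAlias };

// Toy stand-in for an underlying object; not an LLVM type.
struct Object {
  bool isGlobalVariable;
  bool notAddrTaken;  // cached result, meaningful only for globals
};

// If the two underlying objects differ and either one is a global whose
// address is never taken, no pointer can reach it, so the accesses are
// disjoint. Otherwise stay conservative.
AliasResult aliasByAddressTaken(const Object *O1, const Object *O2) {
  assert(O1 != nullptr && O2 != nullptr);
  if (O1 != O2) {
    if (O1->isGlobalVariable && O1->notAddrTaken)
      return AliasResult::NoAlias;
    if (O2->isGlobalVariable && O2->notAddrTaken)
      return AliasResult::NoAlias;
  }
  return AliasResult::MayAlias;
}
```

In the motivating loop, O1 would be xyz and O2 the pointer argument p, so the check answers NoAlias once xyz carries the flag.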
Misc:
=========
Note that I *cache* the not-addr-taken result. Unlike for a function,
it is far more expensive to figure out whether a global variable has its
address taken, so it is not appropriate to analyze address-taken on the
fly.
On the other hand, I personally think the not-addr-taken flag is almost
maintenance free. (FIXME) Only a few optimizations could make a
not-addr-taken variable become addr-taken (say, outlining), and I don't
think we currently have such passes (FIXME again!). If such a rare case
does take place, it is up to that pass to reset the not-addr-taken flag.
Of course, a variable previously considered addr-taken may later be
proved to be not-addr-taken. In that case the compiler does not have to
update the flag; leaving it unset is conservatively correct.
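The maintenance argument is asymmetric, which a toy model may make clearer. ToyGlobal and the three helper names below are hypothetical, not part of the patch: a stale "false" only loses precision, while a stale "true" would be wrong, so only a transform that introduces an address-taking use has to act.

```cpp
#include <cassert>

// Toy stand-in for a global variable carrying the cached flag.
struct ToyGlobal {
  bool notAddrTaken = false;  // conservative default
};

// Called by the analysis once it has proved the address is never taken.
void cacheNotAddressTaken(ToyGlobal &G) { G.notAddrTaken = true; }

// Called by any transform that introduces an address-taking use.
void noteAddressTaken(ToyGlobal &G) { G.notAddrTaken = false; }

// Clients may only draw conclusions from the flag when it is true.
bool mayBeAccessedIndirectly(const ToyGlobal &G) { return !G.notAddrTaken; }

// Walks through the lifecycle described in the text.
void exampleLifecycle() {
  ToyGlobal G;
  assert(mayBeAccessedIndirectly(G));   // unknown: assume the worst
  cacheNotAddressTaken(G);
  assert(!mayBeAccessedIndirectly(G));  // proved: safe to disambiguate
  noteAddressTaken(G);                  // e.g. an outlining-like transform
  assert(mayBeAccessedIndirectly(G));   // back to the conservative answer
}
```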
Performance impact
=============
Measured on an oldish Mac tower with 2x 2.26GHz quad-core Xeon. Both the
baseline and the change were measured a couple of times. I did take a
look at why Olden/power is sped up: the loads of the static variables
"P" and "Q" are promoted in many places. I have not yet had a chance to
investigate why the pairlocalalign speedup seen at O3 disappears at
O3+LTO.
o. test-suite w/ O3:
-------------------
Benchmark                         baseline   patched   change (%)
Benchmarks/Olden/power/power      1.6129     1.354     -16.0518321036642
Benchmarks/mafft/pairlocalalign   31.4211    26.5794   -15.4090722476298
Benchmarks/Ptrdist/yacr2/yacr2    0.881      0.804     -8.74006810442678
o. test-suite w/ O3 + LTO
-------------------------
Benchmark                         baseline   patched   change (%)
Benchmarks/Olden/power/power      1.6143     1.3419    -16.8741869540978
Applications/spiff/spiff          2.9203     2.849     -2.44152997979659
o. spec2kint w/ O3+LTO
----------------------
Benchmark                         baseline   patched   change (%)
bzip2                             75.02      73.92     -1.4
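For reading the numbers above: the last figure on each test-suite row appears to be the relative change of the second measurement against the first, in percent (negative means faster). A minimal sketch of that arithmetic, with percentChange being a name I made up:

```cpp
#include <cassert>

// Relative change of `patched` against `baseline`, in percent.
// Negative values mean the patched run took less time.
double percentChange(double baseline, double patched) {
  assert(baseline != 0.0);
  return (patched - baseline) / baseline * 100.0;
}
```

For example, percentChange(1.6129, 1.354) reproduces the Olden/power figure at O3.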
Thanks
Shuxin
-------------- next part --------------
Index: include/llvm/IR/GlobalVariable.h
===================================================================
--- include/llvm/IR/GlobalVariable.h (revision 192719)
+++ include/llvm/IR/GlobalVariable.h (working copy)
@@ -48,6 +48,7 @@
// can change from its initial
// value before global
// initializers are run?
+ bool notAddrTaken : 1; // Does not have its address taken.
public:
// allocate space for exactly one operand
@@ -174,6 +175,9 @@
isExternallyInitializedConstant = Val;
}
+ void setNotAddressTaken(bool Val) { notAddrTaken = Val; }
+ bool notAddressTaken(void) const { return notAddrTaken; }
+
/// copyAttributesFrom - copy all additional attributes (those not needed to
/// create a GlobalVariable) from the GlobalVariable Src to this one.
void copyAttributesFrom(const GlobalValue *Src);
Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst (revision 192719)
+++ docs/LangRef.rst (working copy)
@@ -514,6 +514,9 @@
``@llvm.used``. This assumption may be suppressed by marking the
variable with ``externally_initialized``.
+If a global variable does not have its address taken, it may optionally be
+flagged ``notaddrtaken``.
+
An explicit alignment may be specified for a global, which must be a
power of 2. If not present, or if the alignment is set to zero, the
alignment of the global is set by the target to whatever it feels
Index: lib/Analysis/BasicAliasAnalysis.cpp
===================================================================
--- lib/Analysis/BasicAliasAnalysis.cpp (revision 192719)
+++ lib/Analysis/BasicAliasAnalysis.cpp (working copy)
@@ -1238,6 +1238,14 @@
return NoAlias;
if (isEscapeSource(O2) && isNonEscapingLocalObject(O1))
return NoAlias;
+
+ if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(O1))
+ if (GV->notAddressTaken())
+ return NoAlias;
+
+ if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(O2))
+ if (GV->notAddressTaken())
+ return NoAlias;
}
// If the size of one access is larger than the entire object on the other
Index: lib/AsmParser/LLParser.cpp
===================================================================
--- lib/AsmParser/LLParser.cpp (revision 192719)
+++ lib/AsmParser/LLParser.cpp (working copy)
@@ -704,7 +704,7 @@
unsigned Linkage, bool HasLinkage,
unsigned Visibility) {
unsigned AddrSpace;
- bool IsConstant, UnnamedAddr, IsExternallyInitialized;
+ bool IsConstant, UnnamedAddr, IsExternallyInitialized, notAddrTaken;
GlobalVariable::ThreadLocalMode TLM;
LocTy UnnamedAddrLoc;
LocTy IsExternallyInitializedLoc;
@@ -719,6 +719,7 @@
IsExternallyInitialized,
&IsExternallyInitializedLoc) ||
ParseGlobalType(IsConstant) ||
+ ParseOptionalToken(lltok::kw_notaddrtaken, notAddrTaken) ||
ParseType(Ty, TyLoc))
return true;
@@ -776,6 +777,7 @@
GV->setLinkage((GlobalValue::LinkageTypes)Linkage);
GV->setVisibility((GlobalValue::VisibilityTypes)Visibility);
GV->setExternallyInitialized(IsExternallyInitialized);
+ GV->setNotAddressTaken(notAddrTaken);
GV->setThreadLocalMode(TLM);
GV->setUnnamedAddr(UnnamedAddr);
Index: lib/AsmParser/LLLexer.cpp
===================================================================
--- lib/AsmParser/LLLexer.cpp (revision 192719)
+++ lib/AsmParser/LLLexer.cpp (working copy)
@@ -504,6 +504,7 @@
KEYWORD(zeroinitializer);
KEYWORD(undef);
KEYWORD(null);
+ KEYWORD(notaddrtaken);
KEYWORD(to);
KEYWORD(tail);
KEYWORD(target);
Index: lib/AsmParser/LLToken.h
===================================================================
--- lib/AsmParser/LLToken.h (revision 192719)
+++ lib/AsmParser/LLToken.h (working copy)
@@ -51,6 +51,7 @@
kw_localdynamic, kw_initialexec, kw_localexec,
kw_zeroinitializer,
kw_undef, kw_null,
+ kw_notaddrtaken,
kw_to,
kw_tail,
kw_target,
Index: lib/Transforms/IPO/GlobalOpt.cpp
===================================================================
--- lib/Transforms/IPO/GlobalOpt.cpp (revision 192719)
+++ lib/Transforms/IPO/GlobalOpt.cpp (working copy)
@@ -1922,6 +1922,8 @@
if (AnalyzeGlobal(GV, GS, PHIUsers))
return false;
+ GV->setNotAddressTaken(true);
+
if (!GS.isCompared && !GV->hasUnnamedAddr()) {
GV->setUnnamedAddr(true);
NumUnnamed++;
Index: lib/Bitcode/Reader/BitcodeReader.cpp
===================================================================
--- lib/Bitcode/Reader/BitcodeReader.cpp (revision 192719)
+++ lib/Bitcode/Reader/BitcodeReader.cpp (working copy)
@@ -1848,6 +1848,9 @@
new GlobalVariable(*TheModule, Ty, isConstant, Linkage, 0, "", 0,
TLM, AddressSpace, ExternallyInitialized);
NewGV->setAlignment(Alignment);
+ if (Record.size() > 10)
+ NewGV->setNotAddressTaken(Record[10]);
+
if (!Section.empty())
NewGV->setSection(Section);
NewGV->setVisibility(Visibility);
Index: lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- lib/Bitcode/Writer/BitcodeWriter.cpp (revision 192719)
+++ lib/Bitcode/Writer/BitcodeWriter.cpp (working copy)
@@ -616,11 +616,13 @@
Vals.push_back(GV->hasSection() ? SectionMap[GV->getSection()] : 0);
if (GV->isThreadLocal() ||
GV->getVisibility() != GlobalValue::DefaultVisibility ||
- GV->hasUnnamedAddr() || GV->isExternallyInitialized()) {
+ GV->hasUnnamedAddr() || GV->isExternallyInitialized() ||
+ GV->notAddressTaken()) {
Vals.push_back(getEncodedVisibility(GV));
Vals.push_back(getEncodedThreadLocalMode(GV));
Vals.push_back(GV->hasUnnamedAddr());
Vals.push_back(GV->isExternallyInitialized());
+ Vals.push_back(GV->notAddressTaken());
} else {
AbbrevToUse = SimpleGVarAbbrev;
}
Index: lib/IR/AsmWriter.cpp
===================================================================
--- lib/IR/AsmWriter.cpp (revision 192719)
+++ lib/IR/AsmWriter.cpp (working copy)
@@ -1459,6 +1459,7 @@
if (GV->hasUnnamedAddr()) Out << "unnamed_addr ";
if (GV->isExternallyInitialized()) Out << "externally_initialized ";
Out << (GV->isConstant() ? "constant " : "global ");
+ if (GV->notAddressTaken()) Out << "notaddrtaken ";
TypePrinter.print(GV->getType()->getElementType(), Out);
if (GV->hasInitializer()) {
Index: lib/IR/Globals.cpp
===================================================================
--- lib/IR/Globals.cpp (revision 192719)
+++ lib/IR/Globals.cpp (working copy)
@@ -99,6 +99,7 @@
}
LeakDetector::addGarbageObject(this);
+ setNotAddressTaken(false);
}
GlobalVariable::GlobalVariable(Module &M, Type *Ty, bool constant,
@@ -125,6 +126,7 @@
Before->getParent()->getGlobalList().insert(Before, this);
else
M.getGlobalList().push_back(this);
+ setNotAddressTaken(false);
}
void GlobalVariable::setParent(Module *parent) {
@@ -185,6 +187,7 @@
GlobalValue::copyAttributesFrom(Src);
const GlobalVariable *SrcVar = cast<GlobalVariable>(Src);
setThreadLocal(SrcVar->isThreadLocal());
+ setNotAddressTaken(SrcVar->notAddressTaken());
}
-------------- next part --------------
Index: test/Analysis/BasicAA/noaddrtaken.ll
===================================================================
--- test/Analysis/BasicAA/noaddrtaken.ll (revision 0)
+++ test/Analysis/BasicAA/noaddrtaken.ll (revision 0)
@@ -0,0 +1,29 @@
+; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info 2>&1 | FileCheck %s
+
+; ModuleID = 'b.c'
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.8.0"
+
+; CHECK: NoAlias: i32* %p, i32* @xyz
+
+;@xyz = global i32 12, align 4
+@xyz = internal unnamed_addr global notaddrtaken i32 12, align 4
+
+; Function Attrs: nounwind ssp uwtable
+define i32 @foo(i32* nocapture %p, i32* nocapture %q) #0 {
+entry:
+ %0 = load i32* @xyz, align 4, !tbaa !0
+ %inc = add nsw i32 %0, 1
+ store i32 %inc, i32* @xyz, align 4, !tbaa !0
+ store i32 1, i32* %p, align 4, !tbaa !0
+ %1 = load i32* @xyz, align 4, !tbaa !0
+ store i32 %1, i32* %q, align 4, !tbaa !0
+ ret i32 undef
+}
+
+attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
+!0 = metadata !{metadata !1, metadata !1, i64 0}
+!1 = metadata !{metadata !"int", metadata !2, i64 0}
+!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
+!3 = metadata !{metadata !"Simple C/C++ TBAA"}
Index: test/Transforms/GlobalOpt/atomic.ll
===================================================================
--- test/Transforms/GlobalOpt/atomic.ll (revision 192719)
+++ test/Transforms/GlobalOpt/atomic.ll (working copy)
@@ -3,8 +3,8 @@
@GV1 = internal global i64 1
@GV2 = internal global i32 0
-; CHECK: @GV1 = internal unnamed_addr constant i64 1
-; CHECK: @GV2 = internal unnamed_addr global i32 0
+; CHECK: @GV1 = internal unnamed_addr constant notaddrtaken i64 1
+; CHECK: @GV2 = internal unnamed_addr global notaddrtaken i32 0
define void @test1() {
entry:
Index: test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll
===================================================================
--- test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll (revision 192719)
+++ test/Transforms/GlobalOpt/2009-03-07-PromotePtrToBool.ll (working copy)
@@ -1,4 +1,4 @@
-; RUN: opt < %s -globalopt -S | grep "@X = internal unnamed_addr global i32"
+; RUN: opt < %s -globalopt -S | grep "@X = internal unnamed_addr global notaddrtaken i32"
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i386-apple-darwin7"
@X = internal global i32* null ; <i32**> [#uses=2]
Index: test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll
===================================================================
--- test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll (revision 192719)
+++ test/Transforms/GlobalOpt/2009-11-16-MallocSingleStoreToGlobalVar.ll (working copy)
@@ -8,7 +8,7 @@
target triple = "x86_64-apple-darwin10.0"
@TOP = internal global i64* null ; <i64**> [#uses=2]
-; CHECK: @TOP = internal unnamed_addr global i64* null
+; CHECK: @TOP = internal unnamed_addr global notaddrtaken i64* null
@channelColumns = internal global i64 0 ; <i64*> [#uses=2]
; Derived from @DescribeChannel() in yacr2
Index: test/Transforms/GlobalOpt/integer-bool.ll
===================================================================
--- test/Transforms/GlobalOpt/integer-bool.ll (revision 192719)
+++ test/Transforms/GlobalOpt/integer-bool.ll (working copy)
@@ -4,7 +4,7 @@
@G = internal addrspace(1) global i32 0
; CHECK: @G
; CHECK: addrspace(1)
-; CHECK: global i1 false
+; CHECK: global notaddrtaken i1 false
define void @set1() {
store i32 0, i32 addrspace(1)* @G
Index: test/Transforms/GlobalOpt/unnamed-addr.ll
===================================================================
--- test/Transforms/GlobalOpt/unnamed-addr.ll (revision 192719)
+++ test/Transforms/GlobalOpt/unnamed-addr.ll (working copy)
@@ -6,10 +6,10 @@
@d = internal constant [4 x i8] c"foo\00", align 1
@e = linkonce_odr global i32 0
-; CHECK: @a = internal global i32 0, align 4
+; CHECK: @a = internal global notaddrtaken i32 0, align 4
; CHECK: @b = internal global i32 0, align 4
-; CHECK: @c = internal unnamed_addr global i32 0, align 4
-; CHECK: @d = internal unnamed_addr constant [4 x i8] c"foo\00", align 1
+; CHECK: @c = internal unnamed_addr global notaddrtaken i32 0, align 4
+; CHECK: @d = internal unnamed_addr constant notaddrtaken [4 x i8] c"foo\00", align 1
; CHECK: @e = linkonce_odr global i32 0
define i32 @get_e() {
Index: test/Transforms/GlobalOpt/globalsra-unknown-index.ll
===================================================================
--- test/Transforms/GlobalOpt/globalsra-unknown-index.ll (revision 192719)
+++ test/Transforms/GlobalOpt/globalsra-unknown-index.ll (working copy)
@@ -1,5 +1,5 @@
; RUN: opt < %s -globalopt -S > %t
-; RUN: grep "@Y = internal unnamed_addr global \[3 x [%]struct.X\] zeroinitializer" %t
+; RUN: grep "@Y = internal unnamed_addr global notaddrtaken \[3 x [%]struct.X\] zeroinitializer" %t
; RUN: grep load %t | count 6
; RUN: grep "add i32 [%]a, [%]b" %t | count 3