[llvm-dev] GlobalAddress lowering strategy
Alex Bradbury via llvm-dev
llvm-dev at lists.llvm.org
Wed May 16 02:36:13 PDT 2018
I've been looking at GlobalAddress lowering strategy recently and was
wondering if anyone had any ideas, insights, or experience to share.
## Background
When lowering a global address, which is typically done in
FooTargetLowering::LowerGlobalAddress, you have the option of either
folding the offset into the global address or emitting the base
globaladdress and a separate ADD node for the offset. Which is best
depends on how the base GlobalAddress is referenced within the
function. AArch64 recently gained a DAGCombine for folding offsets
into addresses where all users are of the form (globaladdr + constant)
<https://reviews.llvm.org/rL330630>. We've been looking at the best
GlobalAddress lowering strategy for RISC-V here
<https://reviews.llvm.org/D45748> (thanks Sameer!), and that
discussion has prompted me to reach out here on llvm-dev.
For a RISC target I'd suggest that the ideal strategy would be:
1. If the global base has only a single reference across the whole
function, or every reference uses the same offset, then fold the
offset into the global.
2. If the global base has multiple references with different offsets,
then never fold the offset into the global; MachineCSE can remove the
redundant instructions that materialize the base.
It isn't straightforward to implement such a strategy due to the
basic-block granularity of the SelectionDAG and lack of use
information for GlobalAddress values.
I was wondering whether anybody has looked into this sort of issue for
an in-tree or out-of-tree backend, or had any thoughts on addressing
it. The numbers for introducing the offset-folding DAGCombine to the
AArch64 backend certainly indicate that performing the combine is a
net win (a 46KB reduction in the .text size of Chromium), but it would
be interesting to look at addressing cases where the combine is
counterproductive.
## Appendix: example 1
For the following code snippet, folding the offset into the global is
ideal and AArch64 will choose to do so:
@foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0],
align 2
define i32 @main() nounwind {
entry:
%0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]*
@foo, i32 0, i32 4), align 2
%cmp = icmp eq i16 %0, 140
br i1 %cmp, label %if.end, label %if.then
if.then:
tail call void @abort()
unreachable
if.end:
ret i32 0
}
declare void @abort()
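For reference, the folded AArch64 lowering of the load above looks
roughly like this (a hand-written sketch, not compiler output; exact
registers will differ):

```
// foo+8 is the address of element 4; with the combine, the offset
// is folded into both the adrp page and the :lo12: relocation.
adrp    x8, foo+8
ldrh    w8, [x8, :lo12:foo+8]
```

Without the combine you would instead form the address of foo and add
8 separately, costing an extra instruction for this single use.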
## Appendix: example 2
For this example, you produce fewer instructions if you don't fold the
offset into the global and instead rely on MachineCSE to remove the
redundant instructions that form the base global address. AArch64
will still fold in the offset in performGlobalAddressCombine.
@a = global [4 x i32] zeroinitializer, align 4
; Function Attrs: noreturn nounwind
define i32 @main() nounwind {
entry:
%0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 0), align 4
%cmp = icmp eq i32 %0, 0
br i1 %cmp, label %if.end, label %if.then
if.then: ; preds = %entry
tail call void @abort() #3
unreachable
if.end: ; preds = %entry
%1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 1), align 4
%cmp1 = icmp eq i32 %1, 3
br i1 %cmp1, label %if.end3, label %if.then2
if.then2: ; preds = %if.end
tail call void @abort() #3
unreachable
if.end3: ; preds = %if.end
%2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 2), align 4
%cmp4 = icmp eq i32 %2, 2
br i1 %cmp4, label %if.end6, label %if.then5
if.then5: ; preds = %if.end3
tail call void @abort() #3
unreachable
if.end6: ; preds = %if.end3
%3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 3), align 4
%cmp7 = icmp eq i32 %3, 1
br i1 %cmp7, label %if.end9, label %if.then8
if.then8: ; preds = %if.end6
tail call void @abort() #3
unreachable
if.end9: ; preds = %if.end6
tail call void @exit(i32 0) #3
unreachable
}
declare void @abort()
declare void @exit(i32)
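The difference for this example looks roughly like the following
RISC-V sketch (hand-written, assuming RV32 and the medlow code model;
not compiler output):

```
# Folding each offset: one %hi materialization per distinct offset.
lui   a0, %hi(a)
lw    a1, %lo(a)(a0)
lui   a0, %hi(a+4)
lw    a1, %lo(a+4)(a0)
# ... and likewise for a+8 and a+12.

# Not folding: materialize the base once, then use load immediates.
lui   a0, %hi(a)
addi  a0, a0, %lo(a)
lw    a1, 0(a0)
lw    a1, 4(a0)
# ... lw 8(a0), lw 12(a0).
```

Here the lui instructions for the folded form cannot be CSE'd because
each distinct offset produces a different %hi relocation, whereas the
shared base pair is trivially CSE'd.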