[llvm-dev] GlobalAddress lowering strategy
Alex Bradbury via llvm-dev
llvm-dev at lists.llvm.org
Wed May 16 02:36:13 PDT 2018
I've been looking at GlobalAddress lowering strategy recently and was
wondering if anyone had any ideas, insights, or experience to share.
## Background
When lowering a global address, which is typically done in
FooTargetLowering::LowerGlobalAddress, you have the option of either
folding the offset into the global address or emitting the base
globaladdress and a separate ADD node for the offset. Which is best
depends on how the base GlobalAddress is referenced within the
function. AArch64 recently gained a DAGCombine for folding offsets
into addresses where all users are of the form (globaladdr + constant)
<https://reviews.llvm.org/rL330630>. We've been looking at the best
GlobalAddress lowering strategy for RISC-V here
<https://reviews.llvm.org/D45748> (thanks Sameer!), and that
discussion has prompted me to reach out here on llvm-dev.
For a RISC target I'd suggest that the ideal strategy would be:
1. If the global base has only a single reference across the whole
function, or every reference uses the same offset, then fold the
offset into the global.
2. If the global base has multiple references with different offsets,
then never fold the offset into the global; MachineCSE can remove the
redundant instructions that materialize the base.
It isn't straightforward to implement such a strategy due to the
basic-block granularity of the SelectionDAG and lack of use
information for GlobalAddress values.
I was wondering whether anybody has looked into this sort of issue for
an in-tree or out-of-tree backend, or had any thoughts on addressing
it. The numbers for introducing the offset-folding DAGCombine to the
AArch64 backend certainly indicate that performing the combine is a
net win (a 46KB reduction in the .text size of Chromium), but it would
be interesting to look at addressing cases where the combine is
counterproductive.
## Appendix: example 1
For the following code snippet, folding the offset into the global is
ideal and AArch64 will choose to do so:
@foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0],
align 2
define i32 @main() nounwind {
entry:
%0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]*
@foo, i32 0, i32 4), align 2
%cmp = icmp eq i16 %0, 140
br i1 %cmp, label %if.end, label %if.then
if.then:
tail call void @abort()
unreachable
if.end:
ret i32 0
}
declare void @abort()
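For reference, the folded AArch64 lowering of the load above looks
roughly like this (a hand-written sketch, not compiler output; exact
registers will differ):

```
// foo+8 is the address of element 4; with the combine, the offset
// is folded into both the adrp page and the :lo12: relocation.
adrp    x8, foo+8
ldrh    w8, [x8, :lo12:foo+8]
```

Without the combine you would instead form the address of foo and add
8 separately, costing an extra instruction for this single use.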
## Appendix: example 2
For this example, you produce fewer instructions if you don't fold the
offset into the global and instead rely on MachineCSE to remove the
redundant instructions that form the base global address. AArch64
will still fold in the offset in performGlobalAddressCombine.
@a = global [4 x i32] zeroinitializer, align 4
; Function Attrs: noreturn nounwind
define i32 @main() nounwind {
entry:
%0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 0), align 4
%cmp = icmp eq i32 %0, 0
br i1 %cmp, label %if.end, label %if.then
if.then: ; preds = %entry
tail call void @abort() #3
unreachable
if.end: ; preds = %entry
%1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 1), align 4
%cmp1 = icmp eq i32 %1, 3
br i1 %cmp1, label %if.end3, label %if.then2
if.then2: ; preds = %if.end
tail call void @abort() #3
unreachable
if.end3: ; preds = %if.end
%2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 2), align 4
%cmp4 = icmp eq i32 %2, 2
br i1 %cmp4, label %if.end6, label %if.then5
if.then5: ; preds = %if.end3
tail call void @abort() #3
unreachable
if.end6: ; preds = %if.end3
%3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]*
@a, i32 0, i32 3), align 4
%cmp7 = icmp eq i32 %3, 1
br i1 %cmp7, label %if.end9, label %if.then8
if.then8: ; preds = %if.end6
tail call void @abort() #3
unreachable
if.end9: ; preds = %if.end6
tail call void @exit(i32 0) #3
unreachable
}
declare void @abort()
declare void @exit(i32)
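The difference for this example looks roughly like the following
RISC-V sketch (hand-written, assuming RV32 and the medlow code model;
not compiler output):

```
# Folding each offset: one %hi materialization per distinct offset.
lui   a0, %hi(a)
lw    a1, %lo(a)(a0)
lui   a0, %hi(a+4)
lw    a1, %lo(a+4)(a0)
# ... and likewise for a+8 and a+12.

# Not folding: materialize the base once, then use load immediates.
lui   a0, %hi(a)
addi  a0, a0, %lo(a)
lw    a1, 0(a0)
lw    a1, 4(a0)
# ... lw 8(a0), lw 12(a0).
```

Here the lui instructions for the folded form cannot be CSE'd because
each distinct offset produces a different %hi relocation, whereas the
shared base pair is trivially CSE'd.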