[PATCH] Implement ADRP CSE for global symbols

Thu Apr 24 02:58:09 PDT 2014

================
Comment at: lib/Transforms/Scalar/GlobalMerge.cpp:167
@@ -161,1 +166,3 @@
+
+    bool IsExternal = Globals[i]->hasExternalLinkage();
     for (j = i; j != e; ++j) {
----------------
Quentin Colombet wrote:
> Should be hoisted outside of the loop, since we assert this does not change within the loop.
Accept and fixed by the new version uploaded.

================
Comment at: lib/Transforms/Scalar/GlobalMerge.cpp:188
@@ +187,3 @@
+    // the symbol after merging.
+    GlobalValue::LinkageTypes Linkage = IsExternal ?
+                                          GlobalValue::ExternalLinkage :
----------------
Quentin Colombet wrote:
> Ditto.
Accept and fixed by the new version uploaded.

================
Comment at: lib/Transforms/Scalar/GlobalMerge.cpp:203
@@ +202,3 @@
+    if (EnableGlobalMergeOnExternal && IsExternal) {
+      // If the alignment is not a power of 2, round up to the next power of 2.
+      uint64_t Align = MergedSize;
----------------
Quentin Colombet wrote:
> This is the alignment thing I was talking about.
Hi Quentin,

Appreciate for your further feedback!

I understand your point. Actually I think we have three solutions to solve this ADRP issue, let me summarize them, and make sure our discussion is on the same page.

1) Materialize the base address of merged global data structure with a single register. MOVAddr in ARM64 just does this work. This is essentially to use "adrp + add" to compute the base of merged global data structure. This way, the merge global data structure needn't the structure size alignment, but the natural alignment. The disadvantage of this solution is, at compile time we don't really know if the offset in add instruction can really be propagated and combined into load/store instructions, because it might be larger than a page size. For this solution the instruction sequence could be like below, (Suppose we have three loads)

adrp r0, MergedGlobal at PAGE
add r1, r0, MergedGlobal at PAGE_OFF
load [r1, var1_offset_within_MergedGlobal]
load [r1, var2_offset_within_MergedGlobal]
load [r1, var3_offset_within_MergedGlobal]

Refer to the chart below as well,

            page 1                     page 2
|---------------------------------|---------------------------------|
r0
------------------------------r1-----------|   // merged global data structure
----------------------------s-------e
var1_offset_within_MergedGlobal = e - s

    (MergedGlobal at PAGE_OFF + var1_offset_within_MergedGlobal) > PageSize

2) We don't use a single register to describe the base address of merged global data structure at all. If we can guarantee the merged global data structure is always within a page, and doesn't cross page boundary, we would be able make sure the offset from the page boundary is always smaller than a page, so at compile time, the offset to page boundary would be able to be fused into load/store instruction directly. Setting the alignment to be the maximum merged data structure size could guarantee the merged data structure doesn't cross page boundary. The disadvantage of this solution is, we would probably increase run-time memory consumption, because there might be some "holes" in data section. With this solution, we would have the following instruction sequence finally,

adrp r0, MergedGlobal at PAGE
load [r0, var1_offset_within_PAGE]
load [r0, var2_offset_within_PAGE]
load [r0, var3_offset_within_PAGE]

Refer to the chart below as well,

            page 1                     page 2
|---------------------------------|---------------------------------|
r0
--------------------|----------|    // merged global data structure
s----------------------e
var1_offset_within_PAGE =  e - s

3) We use link-time solution to help the removal of ADRP and ADD. Link-time optimization might not be cost-less because the dead instruction can only be replaced with NOP instruction.

Both solution 1) and 2) can remove some ADRPs statically. And the ADD introduced in solution 1) could probably be reduced by solution 3) as well.

Do you think the disadvantage with solution 2) is unacceptable?

Thanks,
-Jiangning

http://reviews.llvm.org/D3432