[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM

Jiangning Liu liujiangning1 at gmail.com
Tue Apr 15 23:12:49 PDT 2014


Hi Quentin,

Thanks for your feedback!

> ARM64 generates pseudo instructions ARM64::MOVaddr and friends in ISEL
> stage, which intends to guarantee address serialization (page address +
> in-page address), and exposes adrp finally by pass ExpandPseudoInsts. The
> assumption of ARM64 solution is we don't know the in-page offset can be
> fused into load/store or not at compile time, and this assumption would
> turn to be not true any longer for the solution of using global merge as I
> proposed with the patch.
>
> I think this is orthogonal. If you happen to merge globals they will have
> the same base address (i.e., the same pseudo instruction) but different
> offsets.
> CSE and such will work like a charm for the pseudos.
>

This is probably not true. The global merge pass runs in the PreISel stage.
For my test case at http://reviews.llvm.org/D3223, after applying the patch,
we get the following LLVM IR,

  store i32 %a1, i32* getelementptr inbounds ({ i32, i32, i32 }* @_MergedGlobals_x, i32 0, i32 0), align 4
  store i32 %a2, i32* getelementptr inbounds ({ i32, i32, i32 }* @_MergedGlobals_x, i32 0, i32 1), align 4

and after ISEL stage, we can see different Machine Instructions generated
for AArch64 and ARM64.

AArch64:

        %vreg4<def> = ADRPxi <ga:@_MergedGlobals_x>; GPR64noxzr:%vreg4
        LS32_STR %vreg3, %vreg4, <ga:@_MergedGlobals_x>[TF=11]
        %vreg5<def> = ADDxxi_lsl0_s %vreg4, <ga:@_MergedGlobals_x>[TF=11]; GPR64noxzr:%vreg5,%vreg4
        LS32_STR %vreg2, %vreg5<kill>, 1

ARM64:

        %vreg2<def> = ADRP <ga:@_MergedGlobals_x>[TF=1]; GPR64common:%vreg2
        STRWui %vreg0, %vreg2<kill>, <ga:@_MergedGlobals_x>[TF=18]
        %vreg3<def> = MOVaddr <ga:@_MergedGlobals_x>[TF=1], <ga:@_MergedGlobals_x>[TF=18]; GPR64common:%vreg3
        STRWui %vreg1, %vreg3<kill>, 1

The problem is that the MOVaddr generated for ARM64 implies that the
ExpandPseudoInsts pass will introduce adrp again, although at this point we
don't yet see a redundant ADRP. AArch64 uses ADDxxi_lsl0_s instead, and it
will eventually be folded into LS32_STR.
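
To make the concern concrete, here is a toy Python model of the two
sequences above. It is only a sketch and not real LLVM code: the tuple
representation, the simplified opcode names ("STR", "ADDlo12") and the
helper functions are all assumptions made for illustration. It expands
MOVaddr into an ADRP plus an ADD of the low 12 bits, which is how I
understand the pseudo expansion, and then counts the ADRPs that reach the
final code.

```python
# Toy model only: instructions are (opcode, operands) tuples, and the
# opcode names are simplified stand-ins, not real LLVM opcodes.

def expand_pseudos(insts):
    """Expand each MOVaddr pseudo into ADRP + ADD of the low 12 bits."""
    out = []
    for op, args in insts:
        if op == "MOVaddr":
            dst, sym = args
            out.append(("ADRP", (dst, sym)))
            out.append(("ADDlo12", (dst, dst, sym)))
        else:
            out.append((op, args))
    return out

def count_adrp(insts):
    """Count the ADRP instructions in a sequence."""
    return sum(1 for op, _ in insts if op == "ADRP")

# AArch64 style: one ADRP, and a separate ADD that reuses its result.
aarch64 = [
    ("ADRP", ("vreg4", "_MergedGlobals_x")),
    ("STR", ("vreg3", "vreg4", "lo12:_MergedGlobals_x")),
    ("ADDlo12", ("vreg5", "vreg4", "_MergedGlobals_x")),
    ("STR", ("vreg2", "vreg5", 1)),
]

# ARM64 style: the second address is a MOVaddr pseudo hiding its own ADRP.
arm64 = [
    ("ADRP", ("vreg2", "_MergedGlobals_x")),
    ("STR", ("vreg0", "vreg2", "lo12:_MergedGlobals_x")),
    ("MOVaddr", ("vreg3", "_MergedGlobals_x")),
    ("STR", ("vreg1", "vreg3", 1)),
]

aarch64_adrps = count_adrp(expand_pseudos(aarch64))
arm64_adrps = count_adrp(expand_pseudos(arm64))
```

In this model the ARM64 sequence ends up with a second ADRP only after
pseudo expansion, i.e. too late for any earlier pass to have seen it as
redundant.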

> Assuming you emit the right instructions at isel time, you will create
> ADRP, LOADGot, or ADD with symbols. Since you do not know anything on the
> symbols, CSE will match only the ones that are identical.
>

This is correct.


> You will have a finer granularity to do CSE, but I am not sure it will
> help that much.
>

'CSE' here is used loosely rather than in the sense of traditional CSE.
Since the global variables are merged into one monolithic data structure, we
will be able to generate only one base address (page address) for all of
those global variables.
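
As a rough illustration of the single-base-address idea, the Python sketch
below splits an address into a 4 KiB page address (what adrp produces) and
the low 12 bits, then derives every field address of a merged structure from
that one page address. The base address 0x412340, the field offsets and the
function names are made up for the example.

```python
PAGE_SIZE = 4096  # adrp works at 4 KiB page granularity

def page_of(addr):
    """Page address of addr: the part adrp would materialize."""
    return addr & ~(PAGE_SIZE - 1)

def lo12(addr):
    """Low 12 bits of addr: the in-page offset."""
    return addr & (PAGE_SIZE - 1)

def field_addresses(base, field_offsets):
    """Compute each field address from a single page address."""
    page = page_of(base)  # computed once, shared by all fields
    return [page + lo12(base) + off for off in field_offsets]

# Hypothetical placement of a merged { i32, i32, i32 } structure.
base = 0x412340
addrs = field_addresses(base, [0, 4, 8])
```

Every field address here comes from the same page address plus a constant,
which is the property the merged-globals approach relies on.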


> On the other hand, you lose the rematerialization capability, because that
> feature can only handle one instruction at a time. So you will still be
> able to rematerialize ADRP but not the LOADGot and ADD with symbols.
>

Yes, but this depends on register pressure, and it's hard to say that
rematerialization is always beneficial.

> If simply apply the global merge solution to ARM64, probably we should
> avoid generating pseudo instruction MOVaddr and friends in ISEL stage, but
> I'm not sure if the LOH solution would still work or not, because,
> 1) ARM64 link-time optimization depends on LOH.
> 2) We don't see linker plug-in in LLVM trunk and it would be hard for me
> to verify any thoughts.
>
> The LOH solution is also orthogonal. You can see that as a last chance way
> to optimize those accesses.
> That said, if you CSE the ADRP and not the LOADGot, you will indeed create
> far less candidates for the LOHs because you will have ADRPs with several
> uses, which is not supported by LOHs.
>

Yes. This is exactly what I'm worried about. So essentially those two
optimizations conflict.
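
A small Python sketch of the conflict, under the simplifying assumption
(taken from the remark above that ADRPs with several uses are not supported
by LOHs) that an ADRP is a LOH candidate only when it has exactly one use.
The function and the register names are hypothetical, and the real LOH
rules are more detailed than this.

```python
from collections import Counter

def loh_candidates(adrp_defs, uses):
    """Return the ADRP results that have exactly one use (toy criterion)."""
    use_count = Counter(uses)
    return [d for d in adrp_defs if use_count[d] == 1]

# Before ADRP CSE: three ADRPs, each feeding a single access.
before = loh_candidates(["adrp0", "adrp1", "adrp2"],
                        ["adrp0", "adrp1", "adrp2"])

# After ADRP CSE: one ADRP shared by all three accesses.
after = loh_candidates(["adrp0"], ["adrp0", "adrp0", "adrp0"])
```

Under this toy criterion, merging the ADRPs at compile time leaves nothing
for the LOH-based optimization to work on.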


> FYI, the LOH optimization is not a link-time optimization in LLVM, this is
> really a link-time optimization: on the binary.
>

Yes. I see.

> Any concrete suggestion of combining those different ADRP CSE solutions
> and tests would be appreciated!
>
> The bottom line is whatever you are doing with merge globals, it is
> orthogonal with LOHs.
> That said I think it is best to keep the pseudo instructions.
>

Well, if we keep the pseudo instruction MOVaddr, we would have to keep the
adrp and expose it in the binary, so we would lose the opportunity to remove
redundant adrp at compile time.


> Of course I may be wrong and the best way to check would be to measure
> what happens if you get rid of the pseudo instructions. Do not be too
> concerned with the impact on the LOHs.
>

Since compile-time ADRP CSE is not as powerful as link-time ADRP removal, I
don't want to hurt the link-time solution.


>
> Thanks,
> -Quentin
>
>
> Thanks,
> -Jiangning
>