[PATCH] D57054: [MachineOutliner][ARM][RFC] Add Machine Outliner support for ARM
Yvan Roux via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 23 07:10:07 PST 2019
yroux marked 17 inline comments as done.
yroux added a comment.
Thanks Tim, Jessica and Eli for the quick review!
I agree with the overall comment that mixing ARM and Thumb makes things long and hard to follow I'll split some parts and add some helpers as suggested.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5127
+///
+/// * Call construction overhead: 3 (save + BL + restore)
+/// * Frame construction overhead: 1 (BX)
----------------
efriedma wrote:
> You probably want to describe the overhead in bytes, not instructions, since they're substantially different in Thumb mode.
It'd make more sense, indeed.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5298
+ unsigned FrameID = MachineOutlinerDefault;
+ unsigned NumBytesToCreateFrame = Thumb ? 2 : 4;
+
----------------
t.p.northover wrote:
> I think we should probably try to do something about these bare numbers as costs. Especially as things are pretty complicated with ARM/Thumb2/Thumb1. It's sometimes not quite clear just what cost we're talking about.
>
> Maybe a special struct type with named costs that can be referred to when needed?
>
>
> struct OutlineCosts {
> int TailCall;
> int Call;
> int SaveLRToStack;
> [...]
> };
I agree, I will give it a try.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5302
+ // tail call all of the candidates.
+ if (Subtarget.supportsTailCall()
+ && RepeatedSequenceLocs[0].back()->isTerminator()) {
----------------
efriedma wrote:
> supportsTailCall is more general than you need for MachineOutlinerTailCall; on Thumb1, you can outline a tail call if LR isn't live-in (if the terminator is a tPOP_RET etc.).
Indeed, I don't even understand why SupportsTailCall is defined as !isThumb() || hasV8MBaselineOps();
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5310
+ else if (Subtarget.supportsTailCall()
+ && (LastInstrOpcode == ARM::BL || LastInstrOpcode == ARM::BLX
+ || LastInstrOpcode == ARM::tBL)) {
----------------
efriedma wrote:
> supportsTailCall is a bit more general than you need for MachineOutlinerThunk; rewriting tBLX to tBX is legal in Thumb1. Also, missing check for tBLX (both here and elsewhere).
Oh, yes!
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5428
+bool ARMBaseInstrInfo::checkAndUpdateStackOffset(MachineInstr *MI,
+ int64_t fixup,
+ bool Updt) const {
----------------
paquette wrote:
> Capitalization
OK
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5528
+ case ARM::t2IT:
+ case ARM::tPICADD:
+ case ARM::PICADD:
----------------
efriedma wrote:
> This list isn't complete for Thumb; missing tLDRpci_pic, t2LDRpci_pic, t2MOVi16_ga_pcrel, t2MOVTi16_ga_pcrel.
OK, I'll add these.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5704-5706
+ if (!MI.mayLoadOrStore()) {
+ continue;
+ }
----------------
paquette wrote:
> Don't need brackets here.
OK
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5722
+ if (Thumb)
+ TC = BuildMI(MF, DebugLoc(), get(ARM::tTAILJMPdND))
+ .add(Call->getOperand(2))
----------------
efriedma wrote:
> t.p.northover wrote:
> > I think it's more idiomatic to use BuildMI with an MBB and an iterator rather than separate build & insert steps.
> There are two different Thumb direct tail-call opcodes: tTAILJMPdND and tTAILJMPd; the latter is used for MachO targets because the assembler doesn't support relaxing a Thumb1 jump instruction.
Tim, my intention is to stay close to AArch64 machine outliner code, such that improvements can be easily added to both targets, what about keeping it like that in a first step and then switch both targets to the idiomatic form ?
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5722
+ if (Thumb)
+ TC = BuildMI(MF, DebugLoc(), get(ARM::tTAILJMPdND))
+ .add(Call->getOperand(2))
----------------
yroux wrote:
> efriedma wrote:
> > t.p.northover wrote:
> > > I think it's more idiomatic to use BuildMI with an MBB and an iterator rather than separate build & insert steps.
> > There are two different Thumb direct tail-call opcodes: tTAILJMPdND and tTAILJMPd; the latter is used for MachO targets because the assembler doesn't support relaxing a Thumb1 jump instruction.
> Tim, my intention is to stay close to AArch64 machine outliner code, such that improvements can be easily added to both targets, what about keeping it like that in a first step and then switch both targets to the idiomatic form ?
OK, I'll add the MachO version
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5755
+
+ // Insert a save before the outlined region
+ MachineInstr *Save;
----------------
t.p.northover wrote:
> Comment doesn't quite fit here, and code should maybe be factored into a helper function.
hmm, maybe the comment can be more precise like "Create and Insert a save before outlined region." but I find it coherent with the code. Anyway, I agree with the overall comment that these functions are too long and a helper function would make it clearer.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5828
+ //current feature set.
+ if (!Subtarget.isThumb1Only()) {
+ MachineInstr *ret = BuildMI(MF, DebugLoc(),
----------------
efriedma wrote:
> Not sure why you aren't using tPOP_RET for Thumb2.
I'm not sure myself, I did it a while ago and need to re-think about it.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5844-5846
+MachineBasicBlock::iterator ARMBaseInstrInfo::insertOutlinedCall(
+ Module &M, MachineBasicBlock &MBB, MachineBasicBlock::iterator &It,
+ MachineFunction &MF, const outliner::Candidate &C) const {
----------------
paquette wrote:
> This function is pretty long. I think it'd be a bit easier to follow if it was split into the Thumb and non-Thumb cases if possible.
>
> Maybe something like a `insertOutliner[WhateverInstructionWe'reAdding]Inst` which returns the next value of `It` would work? E.g
>
> ```
> if (C.CallConstructionID == MachineOutlinerTailCall) {
> It = insertOutlinerTailCallInst(Thumb);
> return It;
> }
>
> if (C.CallConstructionID == MachineOutlinerNoLRSave ||
> C.CallConstructionID == MachineOutlinerThunk) {
> It = insertOutlinerCallInst(Thumb);
> return It;
> }
>
> ...
>
> ```
My first implementation for ARM mode only was made inside ARMInstrInfo.cpp, then I moved everything here to avoid duplication between ARM and Thumb, but I agree that splitting it a bit and adding some helpers as suggested by Tim, will make it easier to follow.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5893
+ if (Thumb) {
+ Save = BuildMI(MF, DebugLoc(), get(ARM::tMOVr), Reg)
+ .addReg(ARM::LR)
----------------
t.p.northover wrote:
> This probably ought to be a copyPhysReg, if you look in Thumb1InstrInfo are some really weird edge cases on old CPUs. As a bonus you don't even need to care about ARM vs Thumb if you do that.
OK will do.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5921-5923
+ Save = BuildMI(MF, DebugLoc(), get(ARM::tPUSH))
+ .add(predOps(ARMCC::AL))
+ .addReg(ARM::LR);
----------------
t.p.northover wrote:
> I think we need to talk about stack alignment here. This sequence misaligns the stack on ARM and Thumb2.
>
> If you take the view that the call to `OUTLINED_FUNCTION` is not a public interface (plausible, but by no means a definite) then the corresponding `push` for a nested call in `buildOutlinedFrame` will realign the stack except on Apple's watchOS ABI (armv7k, which has a 16-byte requirement).
>
> If it is public, then this definitely has to change.
>
> Either way, it should definitely be noted in comments.
Correct me Jessica if I'm wrong, my understanding of outlined functions is that they are not public interfaces and don't even respect the ABI but it is not an issue since we have the control on all the call sites, but I didn't thought of such cases of stack misalignment and targets requirement. I need to think more about it, but I agree at least a comment is needed.
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5966
+ && Subtarget.isThumb1Only()) {
+ It++;
+ It = MBB.insert(It, BuildMI(MF, DebugLoc(), get(ARM::tMOVr))
----------------
t.p.northover wrote:
> Indentation.
OK
================
Comment at: lib/Target/ARM/ARMBaseInstrInfo.cpp:5967-5970
+ It = MBB.insert(It, BuildMI(MF, DebugLoc(), get(ARM::tMOVr))
+ .addReg(ARM::LR, RegState::Define)
+ .addReg(ARM::R7, RegState::Kill)
+ .add(predOps(ARMCC::AL)));
----------------
t.p.northover wrote:
> Haven't we just popped what originally came from r7? And If there's no frame pointer couldn't r7 be a normal register that has changed inside the outlined function?
>
> Also, I think BuildMIs that insert at the same time would definitely be clearer in this function. In outline:
>
> // Do all kinds of save
> if (Thumb)
> BuildMI(...);
> else
> BuildMI(...);
>
> // Insert call
> BuildMI(....);
>
> // Do all kinds of restore
> if (Thumb)
> BuildMI(...);
> else
> BuildMI(...);
>
> I'd be pretty tempted to add virtual helper functions actually: `spillLRToSTack` and `restoreLRFromStack`.
Ah yes, as I said in the intro Thumb1 wasn't really tested and this might be a remaining of my experiments, I'll re-design this part as suggested, and pick-up a register which is not used inside the outlined region when there is no frame pointer.
================
Comment at: lib/Target/ARM/ARMTargetMachine.cpp:242
+ setMachineOutliner(true);
+ setSupportsDefaultOutlining(true);
}
----------------
paquette wrote:
> Hmmm, this is somewhat bold. I (conservatively) think that it'd be good to split turning it on by default into a separate patch.
Oh yes, that was just used for testing, turning it on by default and bundle it with -Oz will come later.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57054/new/
https://reviews.llvm.org/D57054
More information about the llvm-commits
mailing list