[PATCH] D63476: [ARM] DLS/LE low-overhead loop code generation
Sam Parker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 25 00:36:24 PDT 2019
samparker marked 6 inline comments as done.
samparker added a comment.
I will add some comments, but I really don't think this belongs in constant islands. This doesn't have to worry about iterative changes, the loop size may only vary by 8 bytes, which is nothing compared to the 4KB that we need to concern ourselves with. Plus this is a very specific pass, especially once we start having to handle the tail predicated loops!
================
Comment at: lib/Target/ARM/ARMInstrThumb2.td:5193
+ t2PseudoInst<(outs), (ins GPRlr:$elts, brtarget:$target),
+ 8, IIC_Br, []>, Sched<[WriteBr]>;
+
----------------
dmgreen wrote:
> 12 bytes for something that should usually take 4 seems overly pessimistic to me, and may pessimise other optimisations. Can we get this smaller somehow? I would expect the fallback to probably be (subs; bne) ideally.
12 is what this implementation will produce though in the fallback case. When we use subs, we can reduce it.
================
Comment at: lib/Target/ARM/ARMLowOverheadLoops.cpp:76
+ BBUtils = std::unique_ptr<ARMBasicBlockUtils>(new ARMBasicBlockUtils(MF));
+ BBUtils->computeAllBlockSizes();
+
----------------
dmgreen wrote:
> How expensive is this? We might as well not do it for cores that won't have low overhead loops.
Good point! I'll add a check and an exit.
================
Comment at: lib/Target/ARM/ARMLowOverheadLoops.cpp:96
+
+ auto IsLoopStart = [](MachineInstr &MI) {
+ return MI.getOpcode() == ARM::t2DoLoopStart;
----------------
dmgreen wrote:
> These don't really seem to add much to me :)
All in good time... while and the vector ones will follow. But yes, LoopDec can go.
================
Comment at: lib/Target/ARM/ARMLowOverheadLoops.cpp:135
+ // If we find that we load/store LR between LoopDec and LoopEnd, revert
+ // back to a 'normal' loop.
+ if (MI.mayLoad() || MI.mayStore())
----------------
dmgreen wrote:
> Can you explain this a little? Is it for correctness or performance? Because we can't merge the two instructions?
Sure, I'll add a comment. To summarise though, its because the decremented value of LR has probably been stored to the stack - but this decrement isn't really going to happen until the end of the block, where we can't spill. If we want the value of LR to be on the stack, we'd have to perform a manual sub. Added to this that we're probably reloading LR at the bottom of the loop for LE, we'd have to either perform an add before LE or use the other form of LE that doesn't perform the decrement.
================
Comment at: lib/Target/ARM/ARMLowOverheadLoops.cpp:148
+ if (Start || Dec || End)
+ assert((Start && Dec && End) && "Failed to find all loop components");
+ else {
----------------
dmgreen wrote:
> I think this should give a better error message in release builds. report_fatal_error or similar. Otherwise It may just miscompile. It should never happen, as far as I understand?
cheers!
================
Comment at: lib/Target/ARM/ARMLowOverheadLoops.cpp:201
+ MachineInstrBuilder MIB = InsertPt ?
+ BuildMI(*MBB, InsertPt, Start->getDebugLoc(), TII->get(ARM::t2DLS)) :
+ BuildMI(*MBB, Start, Start->getDebugLoc(), TII->get(ARM::t2DLS));
----------------
dmgreen wrote:
> This is just the same call, with one parameter different?
Yes, i know it looks awkward... I'll try to see if I can make it nicer.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D63476/new/
https://reviews.llvm.org/D63476
More information about the llvm-commits
mailing list