[PATCH] [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations
Bill Schmidt
wschmidt at linux.vnet.ibm.com
Fri Apr 10 11:49:05 PDT 2015
Hal, thanks for the review! Responses to your comments are inline. Please let me know if I'm off base on any of them. I'll work on a new version shortly.
REPOSITORY
rL LLVM
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:21
@@ +20,3 @@
+// The extra xxswapd instructions reduce performance. This can be
+// particularly bad for vectorized code. The purpose of this pass
+// is to reduce the number of xxswapd instructions required for
----------------
hfinkel wrote:
> What does "particularly bad for vectorized code" mean? Doesn't this exclusively apply to vectorized code?
>
I suppose I had meant "autovectorized code" at the time, because the user has no way to anticipate these swaps are going to show up out of nowhere and reduce performance in a performance-enhancing optimization. However, the sentence really adds nothing, so I'll remove it. ;)
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:126
@@ +125,3 @@
+ // Walk the machine instructions to gather vector usage information.
+ // Return true iff vector mentions are present.
+ bool gatherVectorMentions();
----------------
hfinkel wrote:
> Why don't we just call these "vector instructions"? mentions sounds stilted to me.
That's fine. "Mentions" is a more or less standard term for defs U kills U uses. The important thing is to get anything that mentions a vector register, including things that just copy into or out of them. But in our current ISA, everything that lives in the Category:Altivec and Category:VSX fits this description, so "vector instructions" is equivalent.
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:167
@@ +166,3 @@
+ // Return true iff the given register is a full vector register.
+ bool isVecReg(unsigned Reg) {
+ return (isRegInClass(Reg, &PPC::VSRCRegClass) ||
----------------
hfinkel wrote:
> As a general note, this pass does not handle the case where a register has a subregister index. This should be mostly irrelevant here (we probably don't generate them for these kinds of instructions), but we also don't want to mishandle them should they occur.
>
> (maybe also give up in lookThruCopyLike should you run into a register with a subreg index too).
I don't believe any special handling is necessary for the current patch. If a subregister mention is connected, directly or indirectly, to a full register load or store, it will have to be done via an EXTRACT_SUBREG, INSERT_SUBREG, or SUBREG_TO_REG. But these are operations that kill the optimization anyway (except SUBREG_TO_REG for full 128-bit copies).
We don't want to exclude them from the analysis, because we can handle EXTRACT_SUBREG and INSERT_SUBREG by adjusting the subregister number to account for the doubleword swap, and dealing with SUBREG_TO_REG accordingly as well. I plan to incorporate this into a future patch.
However, there may be subtleties in how subregs are handled in LLVM that I'm not familiar with, so please let me know if I'm missing something.
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:224
@@ +223,3 @@
+
+ for (MachineFunction::iterator I = MF->begin(); I != MF->end();) {
+ MachineBasicBlock &MBB = *I++;
----------------
hfinkel wrote:
> range-based for?
OK -- I'll rework this and the ones below.
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:284
@@ +283,3 @@
+ // handling for this in the future.
+ SwapVector[VecIdx].IsLoad = 1;
+ break;
----------------
hfinkel wrote:
> Where to we actually reject webs with non-swapping load/stores?
>
The fact that these are not marked as either IsSwap or IsSwappable causes them to be rejected. See lines 537-538.
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:307
@@ +306,3 @@
+ // These are fine provided they are moving between full vector
+ // register classes.
+ if (isVecReg(MI->getOperand(0).getReg()) &&
----------------
hfinkel wrote:
> We should add a comment here explaining that this happens because some of the 128-bit VSX registers have 128-bit Altivec sub-registers.
OK.
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:333
@@ +332,3 @@
+ // FIXME: Is there a way we could specify this as a flag on the
+ // instruction in the tblgen description? That would seem
+ // a better approach for future maintenance, but I don't
----------------
hfinkel wrote:
> Yes, that would be better. If nothing else, we should be able to use the InstrMapping facility to generate a lookup table (this is how we currently keep track of which instructions are record forms, and the mapping between record-form and the non-record-form variants).
>
I'll look at that, thanks. There didn't seem to be a way to just create a simple flag that I could see, which would have been nice...
================
Comment at: lib/Target/PowerPC/PPCVSXSwapRemoval.cpp:459
@@ +458,3 @@
+ CopySrcReg = MI->getOperand(1).getReg();
+ else // isSubregToReg()
+ CopySrcReg = MI->getOperand(2).getReg();
----------------
hfinkel wrote:
> add an assert that isSubregToReg() is true.
OK.
http://reviews.llvm.org/D8565
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list