[PATCH] D62301: Fold Address Computations into Load/Store instructions for AArch64
Tim Northover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 23 05:32:16 PDT 2019
t.p.northover added inline comments.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1388-1389
+// Both instructions opcodes should 'X' or 'W'
+static bool isAddrFoldableInst(MachineBasicBlock::iterator Update,
+ MachineBasicBlock::iterator I) {
+ unsigned IOpc = I->getOpcode();
----------------
I think these names could be improved. Maybe `AddrCalcI` and `MemI`?
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1401-1402
+ }
+ // will return true if both instructions are ADDWrs and (STRWui and LDRWui).
+ else if (Update->getOpcode() == AArch64::ADDWrs) {
+ switch(IOpc) {
----------------
I don't see why we're checking ADDWrs here. This is an address computation, so it'll always be 64-bit.
Even if it hypothetically wasn't (say, with 32-bit pointers), the size of the arithmetic has no bearing on the type being stored, which is what the W in STRWui and LDRWui refers to.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1419-1420
+ unsigned IOpc = I->getOpcode();
+ if(UpdateOpc == AArch64::ADDXrs ||
+ UpdateOpc == AArch64::ADDWrs) {
+ switch(IOpc) {
----------------
Are you planning to add more variants in the immediate future? If not, it's probably best to convert this to an assertion.
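Concretely, I'm imagining something like this sketch (the enumerators and helper name here are stand-ins, not the patch's actual code): once callers guarantee the update is a shifted-register ADD, the switch collapses to an assert.

```cpp
#include <cassert>

// Toy stand-ins for the real AArch64 opcode enumerators.
enum Opcode { ADDXrs, ADDWrs, LDRWui, STRWui, OTHER };

// Rather than re-switching over opcodes we already vetted earlier, assert
// the precondition: callers must only pass shifted-register ADDs here.
static Opcode getFoldedOpcode(Opcode UpdateOpc, Opcode MemOpc) {
  assert((UpdateOpc == ADDXrs || UpdateOpc == ADDWrs) &&
         "expected a shifted-register ADD as the update instruction");
  // ... map MemOpc to its register-offset variant here ...
  return MemOpc;
}
```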
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1437
+
+// here will merge the both ADD and STR/LDR instructions if able to merge both
+// and will create new instruction.
----------------
Can we start sentences with a capital letter, please.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1449
+ ++NextI;
+ // getting the immediate value of operand three
+ int Value = Update->getOperand(3).getImm();
----------------
This is not a useful comment.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1473
+ // Paired instruction.
+ MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(Opc))
+ .add(I->getOperand(0))
----------------
I don't believe this is reachable. It's certainly wrong: no ldp or stp instruction allows an address formed from the sum of two registers, let alone with a shift.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1499
+
+ if(I->getOperand(1).isOnRegUseListNext()) {
+ MachineBasicBlock *MBB = I->getParent();
----------------
What's the benefit of doing the transformation at all if we don't get to remove the update instruction? I think there should be an earlier check that the load/store is the only user before doing anything.
If it remains, the test should probably be whether the register is killed by its use in `I`.
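Roughly this shape — a toy model, not the real `llvm::MachineOperand` API, but the real class has the same `isKill()` query:

```cpp
#include <cassert>

// Toy model of a register operand. In LLVM proper this is
// llvm::MachineOperand, whose kill flag marks the last use of a register.
struct Operand {
  unsigned Reg;
  bool KillFlag;
  bool isKill() const { return KillFlag; }
};

// Only fold the address computation if the memory instruction is the last
// user of the computed address. Otherwise the ADD has to stay anyway, and
// folding it into the load/store saves nothing.
static bool shouldFoldAddrCalc(const Operand &BaseUse) {
  return BaseUse.isKill();
}
```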
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1511-1513
bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI,
MachineInstr &MI,
unsigned BaseReg, int Offset) {
----------------
I don't think this is the right place for your code. What you're doing involves a fundamentally different set of mergeability criteria, even if the two happen to overlap.
You could easily imagine someone extending this function to support `ADDXrs` with the intention of applying it in the **same** way as ADDXri, and that's what this function should be saved for.
Instead, I think we need a distinct pair of names for what you're doing and for the existing transformation. Perhaps `mergeUpdateToIndexedMemOp` and `mergeUpdateToSingleUser`, though I'm not entirely happy with those and I'll think some more.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1518-1519
+ case AArch64::ADDWrs:
+ case AArch64::ADDXrs:
+ {
+ if (!MI.getOperand(1).isReg() || !MI.getOperand(2).isReg() || !MI.getOperand(3).isImm())
----------------
LLVM style braces start on the same line as the case.
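i.e. something like this (toy opcodes, just to show the layout):

```cpp
#include <cassert>

// Sketch of the preferred layout: the brace opens on the case label's line.
static bool needsExtraOperandChecks(int Opc) {
  switch (Opc) {
  case 1:
  case 2: {
    // Per-case locals live inside braces opened on the label line.
    bool Checked = true;
    return Checked;
  }
  default:
    return false;
  }
}
```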
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1523-1524
+ int ShiftValue = MI.getOperand(3).getImm();
+ if (ShiftValue > 4)
+ break;
+ // The update instruction source and destination register must be the
----------------
I think this needs significantly better validation. The acceptable shifts depend strongly on which load or store this is being folded into: non-paired loads and stores only allow a shift of 0 or log2 of the access size, and paired ones don't allow a register offset (shifted or not) at all.
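Something along these lines (helper name invented here, and the access size would come from the memory opcode in the real patch):

```cpp
#include <cassert>

// Sketch of per-instruction shift validation. For non-paired
// register-offset loads/stores the only legal shift amounts are 0 and
// log2 of the access size in bytes (e.g. #2 for a 32-bit LDR). Paired
// loads/stores have no register-offset form at all.
static bool isLegalFoldedShift(unsigned AccessBytes, bool IsPaired,
                               int Shift) {
  if (IsPaired)
    return false; // ldp/stp cannot take a register offset at all.
  if (Shift == 0)
    return true;
  int Log2Size = 0;
  while ((1u << Log2Size) < AccessBytes)
    ++Log2Size;
  return Shift == Log2Size && (1u << Log2Size) == AccessBytes;
}
```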
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1785-1786
if (Update != E) {
- // Merge the update into the ld/st.
- MBBI = mergeUpdateInsn(MBBI, Update, /*IsPreIdx=*/false);
- return true;
+ if (Update->getOpcode() == AArch64::ADDXri ||
+ Update->getOpcode() == AArch64::SUBXri) {
+ // Merge the update into the ld/st.
----------------
I think this needs refactoring: we shouldn't hard-code what `mergeUpdateInsn` supports outside that function. Instead, the merge should be able to bail gracefully when it doesn't understand ADDWrs or whatever.
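The shape I have in mind is something like this (toy enumerators and a stand-in result type; the real function would return the iterator past the merged instruction):

```cpp
#include <cassert>
#include <optional>

// Toy sketch of letting the merge itself bail out, instead of
// pre-filtering the opcodes at every call site.
enum Opcode { ADDXri, SUBXri, ADDXrs, ADDWrs, UNKNOWN };

static std::optional<int> tryMergeUpdateInsn(Opcode UpdateOpc) {
  switch (UpdateOpc) {
  case ADDXri:
  case SUBXri:
    return 1; // A pre/post-indexed form exists: perform the merge.
  default:
    return std::nullopt; // Unrecognized update: bail, leave code untouched.
  }
}
```

Callers then just test the result (`if (auto R = tryMergeUpdateInsn(...))`) rather than duplicating the opcode list.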
================
Comment at: test/CodeGen/AArch64/fold_addressing_modes_aarch64.ll:32
+
+attributes #0 = { minsize norecurse nounwind optsize "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a53" "target-features"="+aes,+crc,+crypto,+fp-armv8,+neon,+sha2" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
You can prune all the metadata out of this, and I think we also need more than just one test. There are many different ways this transformation could potentially go wrong.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62301/new/
https://reviews.llvm.org/D62301