[PATCH] D70841: [ARM][MVE] Sink vector shift operand

Tue Dec 3 00:24:41 PST 2019

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:14746
+    case Instruction::LShr:
+    case Instruction::AShr:
       return Operand == 1;
----------------
This bit looks like a good change on its own, if you want to split that out ( with the tests) and commit it separately.

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:4072
+
+def splat32_negated : PatFrag<(ops node:$val),
+                              (sub (ARMvmovImm 0), (ARMvdup node:$val))>;
----------------
samparker wrote:
> dmgreen wrote:
> > I think all of this should be happening in a DAG combine, to give the negate a chance to fold into something else.
> > 
> > It's generally not a good idea to create multiple outputs in a tablegen pattern. Ideally thy should be one/multi inputs to one output, with the other optimisations happening in dag combine to allow extra folding to happen.
> > 
> > I guess ideally this would happen much earlier, like in instcombine to allow all that folding goodness. Not sure how to make that happen sensibly though.
> I thought that these negations were generated by us in lowering/combining? I haven't looked much into the lowering code, but I really would hope that we could remove some of the custom nodes... So I would be very hesitant to introduce yet more custom nodes, to get around the custom nodes that they've already introduced! I also don't see why more c++ would be better than having all the related patterns here, in a concise readable format.
I'm not sure which custom nodes you mean. There was a SPLATVECTOR added recently, which could replace ARMvdup. That would be good to use if it works enough yet (when I looked it seemed like it might only work for scalables, not sure). Shifts are a bit more awkward though, with the negated conditions. If you can make them work then that sounds useful, but I have a memory of trying that initially but eventually just using the custom nodes, the same as neon.

My understanding is that, at least in isel, tablegen is considered only as the last step. There is no folding that happens after it, on machine instructions. (Which would be useful in places, but sounds like a lot of work, having to reimplement it for all targets). So the graph going into lowering needs to already be optimised. Unlike with legalising you can't just make a mess and have it cleaned up after. 

If you generate a shift _and_ a rsb, that's what you end up with even if the rsb can be folded into something else. We just need to give it that chance to be folded and that can only happen in dag combine.

So I think there should be a (vdup(neg)) -> (neg(vdup)) combine. It shouldn't be too much code, as far as I understand. It might even be less than all this tablegen.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70841/new/

https://reviews.llvm.org/D70841