[PATCH] D68337: [ARM][MVE] Enable extending masked loads

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 7 03:12:52 PDT 2019


dmgreen added subscribers: RKSimon, craig.topper.
dmgreen added a comment.

Nice. I think this is looking good; just some details to sort out, like what to do about the target-independent parts.

We will presumably want to add pre- and post-increment forms of these in the future too, which will probably bring up the same kinds of questions.



================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:9283
+                                        ISD::NodeType ExtOpc) {
+  if (!TLI.isLoadExtLegal(ExtLoadType, VT, N0.getValueType()))
+    return SDValue();
----------------
samparker wrote:
> dmgreen wrote:
> > Is it true that whenever you have a legal extending load, you will also have the equivalent legal extending masked load? (For MVE we do, but is that true for all archs?)
> > 
> > Do we need to add an extra set of flags for this? Or is isVectorLoadExtDesirable good enough to handle these cases when there is an asymmetry?
> Yes, we can't expect that to be true for everything. I don't understand why the APIs generally like to pass lots of arguments instead of just passing, say, the load that you'd want to inspect... So hopefully both these calls will cover all cases, and I'd like to avoid adding another flag. Alternatively I could just change isLoadExtLegal to take the LoadSDNode, but I've assumed these calls are designed the way they are for a reason...
They refer back to the LoadExtActions table, which each target fills in via setLoadExtAction when it sets up its lowering. We may need more flags on there to specify the difference between masked extending loads and normal extending loads.
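For illustration, this is roughly how a target marks extending loads legal today (a sketch only; the exact type combinations are illustrative, not taken from this patch):

  // Excerpt-style sketch from a target's ISelLowering constructor. The
  // LoadExtActions table is keyed on (extension kind, result VT, memory VT);
  // note there is currently no separate entry for masked extending loads.
  for (auto Ext : {ISD::SEXTLOAD, ISD::ZEXTLOAD, ISD::EXTLOAD}) {
    setLoadExtAction(Ext, MVT::v8i16, MVT::v8i8,  Legal); // e.g. VLDRB.16
    setLoadExtAction(Ext, MVT::v4i32, MVT::v4i8,  Legal); // e.g. VLDRB.32
    setLoadExtAction(Ext, MVT::v4i32, MVT::v4i16, Legal); // e.g. VLDRH.32
  }

isLoadExtLegal is then just a lookup into that table, which is why it takes the types rather than the load node itself.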


================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:8887
   // zero too, and other values are lowered to a select.
   SDValue ZeroVec = DAG.getNode(ARMISD::VMOVIMM, dl, VT,
                                 DAG.getTargetConstant(0, dl, MVT::i32));
----------------
This is creating a zero vector of size VT, which is the size of what the masked load returns. Should it instead be the size of the memory being loaded (because the extend happens to the passthru as well)? What happens if that isn't a legal value type?
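Something like this is what I have in mind, as a sketch; the variable names (Op etc.) are assumptions about the surrounding code rather than quotes from the patch:

  // Build the zero passthru in the memory type and let the extend widen it,
  // instead of building it directly in the result type VT.
  MaskedLoadSDNode *MLd = cast<MaskedLoadSDNode>(Op.getNode());
  EVT MemVT = MLd->getMemoryVT(); // e.g. v8i8 for a v8i16 sign-extending load
  SDValue ZeroVec = DAG.getNode(ARMISD::VMOVIMM, dl, MemVT,
                                DAG.getTargetConstant(0, dl, MVT::i32));
  // The open question: MemVT (e.g. v8i8) may not be a legal MVE vector type,
  // in which case a node like this could not be selected as written.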


================
Comment at: lib/Target/ARM/ARMInstrMVE.td:5196
   def : MVE_vector_maskedload_typed<v4f32, MVE_VLDRWU32, alignedmaskedload32, 2>;
+  // Extending masked loads.
+  def : Pat<(v8i16 (sextmaskedload8 t2addrmode_imm7<0>:$addr, VCCR:$pred,
----------------
There likely needs to be an anyext pattern too. Can these be merged into the MVEExtLoad multiclass below, and would that be beneficial?
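A sketch of what the anyext case might look like, mirroring the sext patterns above; extmaskedload8 is hypothetical here, an any-extending PatFrag that would need defining alongside sextmaskedload8 and its zext counterpart:

  // Selecting the unsigned load for anyext is the usual choice, since the
  // high bits of the result are undefined.
  def : Pat<(v8i16 (extmaskedload8 t2addrmode_imm7<0>:$addr, VCCR:$pred,
                    (v8i16 NEONimmAllZerosV))),
            (v8i16 (MVE_VLDRBU16 t2addrmode_imm7<0>:$addr, (i32 1), VCCR:$pred))>;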


================
Comment at: lib/Target/ARM/ARMInstrMVE.td:5203
+            (v4i32 (MVE_VLDRBS32 t2addrmode_imm7<0>:$addr, (i32 1), VCCR:$pred))>;
+  def : Pat<(v4i32 (sextmaskedload16 t2addrmode_imm7<0>:$addr, VCCR:$pred,
+                    (v4i32 NEONimmAllZerosV))),
----------------
dmgreen wrote:
> t2addrmode_imm7<0> -> t2addrmode_imm7<1>, for a VLDRH. Same below.
Edit: You beat me to it.  Can you add some tests?


================
Comment at: lib/Target/ARM/ARMTargetTransformInfo.cpp:511
+      // Only support extending integers if the memory is aligned.
+      if ((EltWidth == 16 && Alignment < 2) ||
+          (EltWidth == 32 && Alignment < 4))
----------------
samparker wrote:
> dmgreen wrote:
> > If this is coming from codegen, can the alignment here be 0? I think in ISel it is always set (and clang will always set it), but it may not be guaranteed in llvm in general.
> I can't see anything in the spec guaranteeing the alignment of these intrinsics, but for normal loads it becomes defined by the target ABI. It's always safe for us to use an i8* accessor, so I don't see 0 being a problem here.
Yeah. An alignment of 0 means ABI alignment, which here means 8, not unaligned.

I think it may be better to just check that the alignment is always sufficient, getting rid of that weird "use i8s to load unaligned masked loads" thing. That was probably a bad idea, and more trouble than it's worth.

I think what will happen here at the moment is that the vectorizer will call isLegalMaskedLoad with a scalar type and an alignment (which, let's say, is under-aligned). That alignment won't be checked, so the masked loads and stores will be created. Then, when we get to the backend, the legalizer will call this with a vector type, we'll hit this check, and the masked load will be expanded into that very inefficient bunch of code. Which is probably something that we want to avoid.
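Roughly this kind of check is what I mean, as a sketch only; the names around it (ST, EnableMaskedLoadStores) and the exact signature are assumptions about this patch's context rather than quotes from it:

  bool ARMTTIImpl::isLegalMaskedLoad(Type *DataTy, unsigned Alignment) {
    if (!ST->hasMVEIntegerOps() || !EnableMaskedLoadStores)
      return false;
    // Reject under-aligned element types up front, so the vectorizer never
    // creates a masked load that the backend would have to expand. An
    // Alignment of 0 means ABI alignment, which is sufficient here.
    unsigned EltWidth = DataTy->getScalarSizeInBits();
    return (EltWidth == 32 && (!Alignment || Alignment >= 4)) ||
           (EltWidth == 16 && (!Alignment || Alignment >= 2)) ||
           EltWidth == 8;
  }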


================
Comment at: test/CodeGen/Thumb2/mve-masked-load.ll:903
 ; CHECK-LE:       @ %bb.0: @ %entry
-; CHECK-LE-NEXT:    vmov.i32 q1, #0x0
 ; CHECK-LE-NEXT:    vpt.s8 gt, q0, zr
 ; CHECK-LE-NEXT:    vldrbt.u8 q0, [r0]
----------------
Nice :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68337/new/

https://reviews.llvm.org/D68337




