[PATCH] D68337: [ARM][MVE] Enable extending masked loads

Fri Oct 4 06:36:25 PDT 2019

samparker marked 3 inline comments as done.
samparker added inline comments.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:9283
+                                        ISD::NodeType ExtOpc) {
+  if (!TLI.isLoadExtLegal(ExtLoadType, VT, N0.getValueType()))
+    return SDValue();
----------------
dmgreen wrote:
> Is it true that whenever you have a legal extending load, you will also have the equivalent legal extending masked load? (For MVE we do, but is that true for all archs?)
> 
> Do we need to add an extra set of flags for this? Or is isVectorLoadExtDesirable good enough to handle these cases when there is an asymmetry?
Agreed, we can't expect that to hold for every target. I don't understand why these APIs generally take lots of arguments instead of just taking, say, the load that you'd want to inspect... So hopefully these two calls will cover all cases, and I'd like to avoid adding another flag. Alternatively, I could change isLoadExtLegal to take the LoadSDNode, but I've assumed these calls are designed the way they are for a reason...

================
Comment at: lib/Target/ARM/ARMTargetTransformInfo.cpp:511
+      // Only support extending integers if the memory is aligned.
+      if ((EltWidth == 16 && Alignment < 2) ||
+          (EltWidth == 32 && Alignment < 4))
----------------
dmgreen wrote:
> If this is coming from codegen, can the alignment here be 0? I think in ISel it is always set (and clang will always set it), but it may not be guaranteed in llvm in general.
I can't see any guarantees in the spec for these intrinsics, but for normal loads the alignment is then defined by the target ABI. It's always safe for us to use an i8* accessor, so I don't see 0 being a problem here.
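
To make the alignment argument concrete, here is a minimal standalone sketch of the quoted condition. The function name is hypothetical, and it assumes the condition guards an early "not legal" return; it is not the actual ARMTargetTransformInfo code. An alignment of 0 fails both comparisons, so the 16-bit and 32-bit cases are rejected, while 8-bit elements are never rejected on alignment grounds.

```cpp
#include <cassert>

// Hypothetical model of the quoted check (name and return convention are
// assumptions). Alignment is in bytes; 0 stands for "not specified".
constexpr bool isExtendingMaskedLoadAligned(unsigned EltWidth,
                                            unsigned Alignment) {
  // Mirrors the quoted condition: 16-bit elements need at least 2-byte
  // alignment, 32-bit elements need at least 4-byte alignment.
  return !((EltWidth == 16 && Alignment < 2) ||
           (EltWidth == 32 && Alignment < 4));
}

inline void demo() {
  assert(isExtendingMaskedLoadAligned(8, 0));   // bytes: always accepted
  assert(!isExtendingMaskedLoadAligned(16, 0)); // unspecified: rejected
  assert(isExtendingMaskedLoadAligned(32, 4));  // naturally aligned: accepted
}
```

Under this model an unspecified alignment only makes the check more conservative; it never lets an under-aligned wide access through.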

================
Comment at: test/CodeGen/Thumb2/mve-masked-load.ll:551
+; CHECK-LE-NEXT:    vldrbt.s16 q0, [r0]
+; CHECK-LE-NEXT:    vpsel q0, q0, q1
+; CHECK-LE-NEXT:    bx lr
----------------
dmgreen wrote:
> I don't think this vpsel should be here (it's not wrong, just inefficient; the instruction will already do this, setting predicated-off lanes to 0).
> 
> I'm guessing that the LowerMLOAD is creating a zero vector (that is potentially the wrong type?), so when it is called on the newly created maskedload it doesn't recognise it as 0 and we end up with the vselect being added too.
I'll have a look.
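
For reference, a rough scalar model of why the select is redundant when the passthru is zero. All names here are hypothetical, and the lane semantics are simplified from the comment above: the predicated load is assumed to zero inactive lanes, and the select picks its first operand on active lanes and its second on inactive ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t Lanes = 8;
using Mem = std::array<std::int8_t, Lanes>;   // bytes in memory
using Vec = std::array<std::int16_t, Lanes>;  // eight 16-bit lanes
using Mask = std::array<bool, Lanes>;         // per-lane predicate

// Models a predicated sign-extending byte load (assumption: inactive lanes
// are written as 0, as described in the review comment).
Vec maskedSExtLoadZeroing(const Mem &M, const Mask &P) {
  Vec R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    R[I] = P[I] ? static_cast<std::int16_t>(M[I]) : std::int16_t{0};
  return R;
}

// Models a per-lane select: A on active lanes, B on inactive lanes.
Vec laneSelect(const Mask &P, const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    R[I] = P[I] ? A[I] : B[I];
  return R;
}

inline void demo() {
  Mem M = {1, -2, 3, -4, 5, -6, 7, -8};
  Mask P = {true, false, true, false, true, true, false, false};
  Vec Loaded = maskedSExtLoadZeroing(M, P);
  // With an all-zero passthru the select changes nothing.
  assert(laneSelect(P, Loaded, Vec{}) == Loaded);
}
```

In this model the select only matters for a non-zero passthru, so the combine needs to recognise the zero passthru on the newly created masked load to avoid emitting it.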

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68337/new/

https://reviews.llvm.org/D68337