[PATCH] D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine.

Sat Dec 29 13:27:50 PST 2018

efriedma added a comment.

Please upload patches with full context (see https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface).

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:7465
+        KnownBits topKnown = DAG.computeKnownBits(top);
+        KnownBits botKnown = DAG.computeKnownBits(bot);
+
----------------
Alternatively, we could try to handle simplifications by adding a DAGCombine for VSHRu and VMULLu nodes with constant operands, instead of trying to avoid emitting them.  The end result is roughly the same (it's probably unlikely to construct a trivial VSHRu or VMULLu any other way), but it might be easier to understand.  But I guess it would get tricky if we tried to emit the form using "vrev64".  Probably okay as-is.

Needs more test coverage for various cases where the upper/lower half an operand is zero.

================
Comment at: lib/Target/ARM/ARMTargetTransformInfo.cpp:528
+    { ISD::UMULO, MVT::v2i64, 11},
+    { ISD::SMULO, MVT::v2i64, 11},
   };
----------------
Did you really mean UMULO (llvm.umul.with.overflow)?  I mean, I guess it's theoretically possible to implement, but I think we'll just scalarize it at the moment.

(The cost model changes need test coverage in test/Analysis/CostModel/ARM/ .)

================
Comment at: test/CodeGen/ARM/vmul.ll:69
+;CHECK: vld1.64         {d20, d21}, [r0]
+;CHECK; vand            q8, q10, q8
+;CHECK: vmovn.i64       d18, q9
----------------
DAGCombine should be able to catch the redundant AND... but it looks like DAGCombiner::visitTRUNCATE doesn't try to handle demanded bits for vectors.  (I guess it didn't get updated when other operations got support for vector operands?)

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56118/new/

https://reviews.llvm.org/D56118