[PATCH] D56118: [ARM]: Add optimized NEON uint64x2_t multiply routine.
easyaspi314 (Devin) via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Dec 29 17:55:32 PST 2018
easyaspi314 planned changes to this revision.
easyaspi314 marked 3 inline comments as done.
easyaspi314 added a comment.
I am going to add constant interleaving, more tests, and call it done. I think that `twomul` and interleaved loads are much smaller and lower priority optimizations, and something I am not comfortable enough to do myself.
================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:7465
+ KnownBits topKnown = DAG.computeKnownBits(top);
+ KnownBits botKnown = DAG.computeKnownBits(bot);
+
----------------
efriedma wrote:
> Alternatively, we could try to handle simplifications by adding a DAGCombine for VSHRu and VMULLu nodes with constant operands, instead of trying to avoid emitting them. The end result is roughly the same (it's probably unlikely to construct a trivial VSHRu or VMULLu any other way), but it might be easier to understand. But I guess it would get tricky if we tried to emit the form using "vrev64". Probably okay as-is.
>
> Needs more test coverage for various cases where the upper/lower half an operand is zero.
This is the exact same method used in X86ISelLowering.cpp/LowerMUL, except for the early return to work around missing optimizations later on.
If we use the `twomul`/`vrev64` version, we would only use it if none of them are zero and none could be pre-interleaved. Once things get interleaved and/or we eliminate one of the multiplies, we lose the benefit.
================
Comment at: lib/Target/ARM/ARMTargetTransformInfo.cpp:528
+ { ISD::UMULO, MVT::v2i64, 11},
+ { ISD::SMULO, MVT::v2i64, 11},
};
----------------
efriedma wrote:
> Did you really mean UMULO (llvm.umul.with.overflow)? I mean, I guess it's theoretically possible to implement, but I think we'll just scalarize it at the moment.
>
> (The cost model changes need test coverage in test/Analysis/CostModel/ARM/ .)
Oh yeah, I did that wrong. Should be normal multiply.
================
Comment at: test/CodeGen/ARM/vmul.ll:69
+;CHECK: vld1.64 {d20, d21}, [r0]
+;CHECK; vand q8, q10, q8
+;CHECK: vmovn.i64 d18, q9
----------------
craig.topper wrote:
> efriedma wrote:
> > DAGCombine should be able to catch the redundant AND... but it looks like DAGCombiner::visitTRUNCATE doesn't try to handle demanded bits for vectors. (I guess it didn't get updated when other operations got support for vector operands?)
> PR39689 mentions this is disabled for vectors. Maybe @rksimon or @spatel are working on it?
That would explain why it was choking on this when X86 does not.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D56118/new/
https://reviews.llvm.org/D56118
More information about the llvm-commits
mailing list