[PATCH] D39510: [PPC] Use xxbrd to speed up bswap64

Thu Nov 2 00:31:22 PDT 2017

nemanjai accepted this revision.
nemanjai added a comment.
This revision is now accepted and ready to land.

This is a great idea considering direct moves are so fast on Power9. I guess we just didn't think of this use when we implemented the vector byte reversal. Thanks for doing this. Other than the rather obvious change to generate the faster `mfvsrd` instruction, this LGTM.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8571
+  Op = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::i64, Op,
+                   DAG.getTargetConstant(0, dl, MVT::i32));
+  return Op;
----------------
Extracting LE doubleword 1 is probably better: it'll produce `mfvsrd` rather than `mfvsrld` on LE systems. The latter uses the permute pipeline and is potentially a higher-latency instruction. It shouldn't make a functional difference, since you're populating both doublewords with the same value.
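
To illustrate why the element index shouldn't matter functionally, here is a minimal standalone sketch. It is a plain C++ model, not the code in the patch: `V2I64`, `splat`, `reverseEachDoubleword` and `bswapViaVector` are hypothetical names, and it assumes both doublewords are populated with the same scalar before the per-doubleword byte reversal.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector register as two 64-bit doublewords.
using V2I64 = std::array<std::uint64_t, 2>;

// Reference bswap64: reverse the eight bytes of a 64-bit value.
std::uint64_t byteSwap64(std::uint64_t X) {
  std::uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (X & 0xFF); // Move the lowest byte of X to the bottom of R.
    X >>= 8;
  }
  return R;
}

// Assumption: the scalar is copied into both doublewords.
V2I64 splat(std::uint64_t X) { return {X, X}; }

// Model of a per-doubleword byte reversal (what xxbrd does to each lane).
V2I64 reverseEachDoubleword(const V2I64 &V) {
  return {byteSwap64(V[0]), byteSwap64(V[1])};
}

// bswap64 through the vector path, extracting the chosen element.
std::uint64_t bswapViaVector(std::uint64_t X, unsigned Elt) {
  assert(Elt < 2 && "a v2i64 has only two elements");
  return reverseEachDoubleword(splat(X))[Elt];
}
```

In this model `bswapViaVector(X, 0)` and `bswapViaVector(X, 1)` always agree, so the choice of element can be made purely on which extraction instruction it maps to.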

https://reviews.llvm.org/D39510