[PATCH] [x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507)

Wed May 6 08:58:07 PDT 2015

In http://reviews.llvm.org/D9504#166604, @RKSimon wrote:

> Thanks Sanjay, in http://reviews.llvm.org/D9095 I added some tests to sse-scalar-fp-arith.ll for llvm.sqrt.f32 / llvm.sqrt.f64 tests - these don't appear to be optimized by this patch - is this something that could be easily added? Feel free to transfer them to sse-scalar-fp-arith-unary.ll.

Ah, the sqrt IR intrinsics as opposed to the sqrt SSE intrinsics. I didn't consider them specifically, but I did look at patterns that would match the X86frcp and X86frsqrt SDNodes followed by a blend/move. The conditions needed to produce that pattern seemed so far-fetched (-ffast-math + reciprocal estimates enabled + Newton-Raphson (NR) refinement turned off + a scalar op in the middle of vector code?) that it didn't seem worth the effort.
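To illustrate why NR refinement matters here, below is a hypothetical C sketch using SSE intrinsics; it is not taken from the patch. The function name `rsqrt_lane0` and the `nr_steps` parameter are made up for illustration. This hand-written version only mirrors the shape of the DAG; it is not what the backend actually generates. With `nr_steps == 0`, lane 0 is just the raw `rsqrtss` estimate followed by a move into the original vector, which is the "estimate + blend/move" shape. With refinement on, the estimate feeds a mul/sub sequence, so that bare shape no longer appears.

```c
#include <xmmintrin.h>

/* Hypothetical helper: lane 0 of the result approximates 1/sqrt(x[0]);
 * lanes 1-3 are copied from x, mirroring a scalar op + move pattern. */
static __m128 rsqrt_lane0(__m128 x, int nr_steps) {
    __m128 y = _mm_rsqrt_ss(x);                 /* ~12-bit estimate in lane 0 */
    const __m128 half = _mm_set_ss(0.5f);
    const __m128 three_halves = _mm_set_ss(1.5f);
    for (int i = 0; i < nr_steps; ++i) {
        /* One Newton-Raphson step on lane 0: y = y * (1.5 - 0.5 * x * y * y) */
        __m128 t = _mm_mul_ss(_mm_mul_ss(half, x), _mm_mul_ss(y, y));
        y = _mm_mul_ss(y, _mm_sub_ss(three_halves, t));
    }
    /* Blend/move: lane 0 from y, lanes 1-3 from the original x. */
    return _mm_move_ss(x, y);
}
```

With `nr_steps == 0` the final move is redundant, because `_mm_rsqrt_ss` already passes lanes 1-3 through. It is kept so that both variants end in the same explicit blend.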

I'm probably not being imaginative enough, but here's my thinking: to get a scalar sqrt *IR* intrinsic that reads from and writes back to vector operands from C source, a coder would have to be using SSE intrinsics explicitly and then throw a libm sqrt() call into the mix. If the coder used _mm_sqrt_ss(), we wouldn't see a sqrt IR intrinsic. If the code was auto-vectorized, we wouldn't see a scalar sqrt intrinsic either; it would be a vector intrinsic, so we wouldn't see this insert/extract pattern that operates only on the scalar lane. Am I missing a case?

That said, if this is common enough, I'd prefer to just add more defm lines rather than duplicate the multiclass so that its patterns match SDNodes instead of Intrinsics, e.g.:

  defm : scalar_unary_math_patterns<fsqrt, "SQRTSD", X86Movsd, v2f64, UseSSE2>;

...but I'm not sure how to do that in tablegen. Any suggestions?

http://reviews.llvm.org/D9504
