[PATCH] [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)

Wed Mar 4 11:01:18 PST 2015

Hi chandlerc, qcolombet, mkuper,

This patch reduces code size for all AVX targets and increases speed for some chips. 

SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the "packed" float/double flavors. Scalar alias mnemonics would have cost so much...paper. But they distinguished between floats and doubles, so we should be thankful. Wait...

AVX subsequently made the instruction useful by adding a 4-operand form (VBLENDVPS/VBLENDVPD) with a separate destination and an explicit mask register.
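As a rough illustration (not code from the patch), the lane-level model below contrasts the two encodings. Per the Intel instruction reference, the SSE4.1 BLENDVPS form overwrites its first operand and reads the mask from the fixed register xmm0. The AVX VBLENDVPS form writes a separate destination and takes the mask as an explicit register operand. In both, a lane comes from the second source when the sign bit of the matching mask lane is set. The type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane-level model: four float lanes plus a raw 32-bit mask per lane.
using Lanes = std::array<float, 4>;
using MaskLanes = std::array<std::uint32_t, 4>;

// Only bit 31 (the sign bit) of each mask lane is consulted.
constexpr std::uint32_t kSignBit = 0x80000000u;

// SSE4.1: BLENDVPS xmm1, xmm2/m128, <XMM0>
// Destructive: xmm1 is both the first source and the destination, and the
// mask must be in the fixed register xmm0.
void blendvpsSse41(Lanes &xmm1, const Lanes &xmm2, const MaskLanes &xmm0) {
  for (std::size_t i = 0; i < xmm1.size(); ++i)
    if (xmm0[i] & kSignBit)
      xmm1[i] = xmm2[i];
}

// AVX: VBLENDVPS xmm1, xmm2, xmm3/m128, xmm4
// Non-destructive: the result goes to a separate destination and the mask
// can live in any register.
Lanes vblendvpsAvx(const Lanes &xmm2, const Lanes &xmm3, const MaskLanes &xmm4) {
  Lanes xmm1{};
  for (std::size_t i = 0; i < xmm1.size(); ++i)
    xmm1[i] = (xmm4[i] & kSignBit) ? xmm3[i] : xmm2[i];
  return xmm1;
}
```

Both forms compute the same lanes; the difference is in the operand constraints. The fixed xmm0 mask and the destructive destination can force extra register copies around the SSE4.1 form, while the AVX form has neither constraint.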

So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars, since scalar FP values live in xmm registers anyway.
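To make the scalar trick concrete, here is a minimal portable sketch, assuming a select of the form `r = (a < b) ? x : y` on doubles held in the low lane of an xmm register. It models the compare with CMPLTSD semantics, then either a single VBLENDVPD or the three-instruction logic sequence (ANDPD, ANDNPD, ORPD). Only lane 0 matters for the scalar result. All names (Xmm, cmpltsd, vblendvpd, selectViaLogic, selectBlend, selectLogic) are hypothetical and are not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: an xmm register as two raw 64-bit lanes.
using Xmm = std::array<std::uint64_t, 2>;

// Place a double in lane 0 and zero lane 1 (like a MOVSD load).
Xmm fromScalar(double d) {
  Xmm r{};
  std::memcpy(&r[0], &d, sizeof d);
  return r;
}

// Read the scalar double back out of lane 0.
double lowScalar(const Xmm &r) {
  double d;
  std::memcpy(&d, &r[0], sizeof d);
  return d;
}

// CMPLTSD-style compare: lane 0 becomes all-ones if a < b (false for NaN),
// otherwise all-zeros; lane 1 is copied from the first operand.
Xmm cmpltsd(const Xmm &a, const Xmm &b) {
  Xmm r = a;
  r[0] = lowScalar(a) < lowScalar(b) ? ~std::uint64_t{0} : std::uint64_t{0};
  return r;
}

// VBLENDVPD: each lane comes from src2 if the mask lane's sign bit is set.
Xmm vblendvpd(const Xmm &src1, const Xmm &src2, const Xmm &mask) {
  Xmm r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] >> 63) ? src2[i] : src1[i];
  return r;
}

// AND/ANDN/OR sequence: (mask & x) | (~mask & y). This matches a blend only
// when the mask lane is all-ones or all-zeros, as a compare guarantees.
Xmm selectViaLogic(const Xmm &mask, const Xmm &x, const Xmm &y) {
  assert(mask[0] == 0 || mask[0] == ~std::uint64_t{0});
  Xmm r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & x[i]) | (~mask[i] & y[i]);
  return r;
}

// r = (a < b) ? x : y via compare + one blend. The blend takes src2 where
// the mask is set, so x goes in the src2 slot.
double selectBlend(double a, double b, double x, double y) {
  Xmm mask = cmpltsd(fromScalar(a), fromScalar(b));
  return lowScalar(vblendvpd(fromScalar(y), fromScalar(x), mask));
}

// The same select via compare + three logic ops.
double selectLogic(double a, double b, double x, double y) {
  Xmm mask = cmpltsd(fromScalar(a), fromScalar(b));
  return lowScalar(selectViaLogic(mask, fromScalar(x), fromScalar(y)));
}
```

Because a compare produces an all-ones or all-zeros lane, the blend (which looks only at the sign bit) and the logic sequence agree on lane 0; the blend just does it in one instruction instead of three.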

This gives us roughly a 45% speedup (43.15 / 29.73 ≈ 1.45x) for a blendv microbenchmark sequence on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic  : 43.15 cycles/iter

I'm not adding any new test cases because:
1. fast-isel-select-sse.ll covers the positive cases for both regular X86 lowering and fast-isel.
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX.
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.

http://llvm.org/bugs/show_bug.cgi?id=22483

http://reviews.llvm.org/D8063

Files:
  lib/Target/X86/X86FastISel.cpp
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/fast-isel-select-sse.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D8063.21215.patch
Type: text/x-patch
Size: 17004 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150304/8bf73bea/attachment.bin>