[PATCH] D29489: Optimize SETCC + VSEL of incompatible or illegal types

Tue Feb 28 05:51:55 PST 2017

jonpa updated this revision to Diff 90017.
jonpa added a comment.
Herald added a subscriber: jholewinski.

OK, I reworked this so that it can also handle logical combinations (and/or/xor) of two SETCCs.

I again considered doing this in the DAGCombiner run before type legalization, but it has to wait because it depends on widening, which DAGCombiner does not do. So this is now done in three places: when a VSELECT is handled for result widening, argument promotion, or result splitting.

There was previously some handling of VSELECT in DAGCombiner, which also aimed to avoid scalarization. It did not handle any operand other than a SETCC (e.g. an AND), so I started experimenting and found that I could remove the splitting in DAGCombiner entirely, even with improved results. The new method avoids the scalarization instead.

The new WidenVSELECTAndMask() method minimizes the number of conversions between the two SETCCs and the logical op, and between the logical op and the VSELECT.

convertMask() is called to convert e.g. a SETCC or an AND to the right VT.
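
As a rough illustration of what converting a mask to another VT can involve, here is a minimal standalone sketch. It is not the code from the patch: it assumes each mask lane is either all-ones or all-zeros, and it only changes the lane width (sign-extend when widening, truncate when narrowing). The names allOnes and convertLanes are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Value with the low Bits bits set (Bits is assumed to be in 1..64).
uint64_t allOnes(unsigned Bits) {
  return Bits >= 64 ? ~0ULL : ((1ULL << Bits) - 1);
}

// Hypothetical model: each lane is stored in the low FromBits bits of a
// uint64_t. Widening sign-extends the lane and narrowing truncates it, so
// an all-ones or all-zeros lane keeps its meaning at the new width.
std::vector<uint64_t> convertLanes(const std::vector<uint64_t> &Mask,
                                   unsigned FromBits, unsigned ToBits) {
  std::vector<uint64_t> Out;
  Out.reserve(Mask.size());
  for (uint64_t Lane : Mask) {
    bool SignBit = (Lane >> (FromBits - 1)) & 1;
    uint64_t V;
    if (ToBits > FromBits) // sign-extend: fill the new high bits
      V = SignBit ? (Lane | (allOnes(ToBits) & ~allOnes(FromBits))) : Lane;
    else // truncate: keep only the low ToBits bits
      V = Lane & allOnes(ToBits);
    Out.push_back(V);
  }
  return Out;
}
```

In this model a compare on 32-bit elements feeding a select on 64-bit elements needs one widening step, and applying it once after the logical op rather than once per SETCC is the kind of saving the new method aims for.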

New SystemZ tests for the AND / OR / XORs of two SETCCs:
Testing all combinations would have taken 480 tests per opcode, so I instead tried to find a mix that tests:

1. For each vector element type, at least one compare each of Widen / Legal / Split.
2. Selects of either smaller, same, in-between or greater vector type.

This is now 180 tests without (systematic) commutation, for all three opcodes.
Is commutation needed? Should I instead test everything (~1500 tests)?

The vec-cmpsel test, with just cmp/select, has 109 tests.

Other tests:
AMDGPU/fmax_legacy.ll/@test_fmax_legacy_ogt_v3f32
Crashed, because the new code didn't work with the v3f32 type.
Added a check in the patch to avoid vector types whose size is not a power of 2.
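
The kind of guard described above can be sketched as follows. This is a minimal standalone sketch, not the code from the patch: VecTy, isPow2 and hasPow2Size are hypothetical names, and it assumes "sized" refers to the total size of the vector in bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector value type.
struct VecTy {
  unsigned NumElts; // number of elements
  unsigned EltBits; // width of each element in bits
};

// True if N is a nonzero power of two.
constexpr bool isPow2(uint64_t N) { return N != 0 && (N & (N - 1)) == 0; }

// True if the total size of the vector in bits is a power of two.
constexpr bool hasPow2Size(VecTy VT) {
  return isPow2(uint64_t(VT.NumElts) * VT.EltBits);
}
```

Under this assumption a v3f32 (3 x 32 = 96 bits) is rejected, while a v4f32 (128 bits) is accepted.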

AMDGPU/vselect64.ll
Crashed, because the VSELECT should be scalarized, and the type legalizer couldn't handle the output of the new method.
Added a check in the patch, VTWillScalarize(), which checks whether a VT will be split all the way down to 1 element. If so, the new method gives up.
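
The idea behind such a check can be sketched as below. This is only an illustration under stated assumptions, not the VTWillScalarize() from the patch: the type is modeled as an element count plus element width, splitting is modeled as halving the element count, and both willScalarize and the legality rule isLegalExample are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector value type.
struct VecTy {
  unsigned NumElts; // number of elements
  unsigned EltBits; // width of each element in bits
};

// Hypothetical legality rule for illustration: only <4 x 32-bit> is legal.
bool isLegalExample(VecTy VT) { return VT.NumElts == 4 && VT.EltBits == 32; }

// Returns true if halving VT repeatedly reaches a single element
// without first reaching a type that IsLegal accepts.
bool willScalarize(VecTy VT, bool (*IsLegal)(VecTy)) {
  // Halving only makes sense for a power-of-2 element count.
  assert(VT.NumElts != 0 && (VT.NumElts & (VT.NumElts - 1)) == 0);
  while (VT.NumElts > 1) {
    if (IsLegal(VT))
      return false; // splitting stops at a legal vector type
    VT.NumElts /= 2;
  }
  return true; // split all the way down to one element
}
```

With this example rule, an <8 x 32-bit> vector stops at the legal <4 x 32-bit> type, whereas a <2 x 64-bit> vector has no legal vector type to stop at and ends up as a single element.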

NVPTX/f16x2-instructions.ll
improved and solved a TODO vectorization problem :-)

X86/2011-10-19-widen_vselect.ll:
one instruction changed place (pshufd)

X86/avx512-mask-op.ll
X86/psubus.ll
Don't know - big diff - need help.

X86/vselect-pcmp.ll
two instructions removed - legal?

https://reviews.llvm.org/D29489

Files:
  lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  lib/CodeGen/SelectionDAG/LegalizeTypes.h
  lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
  lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
  test/CodeGen/ARM/vuzp.ll
  test/CodeGen/NVPTX/f16x2-instructions.ll
  test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll
  test/CodeGen/SystemZ/vec-cmpsel.ll
  test/CodeGen/X86/2011-10-19-widen_vselect.ll
  test/CodeGen/X86/2011-10-21-widen-cmp.ll
  test/CodeGen/X86/avx512-mask-op.ll
  test/CodeGen/X86/psubus.ll
  test/CodeGen/X86/vselect-pcmp.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D29489.90017.patch
Type: text/x-patch
Size: 198746 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170228/2b1edee3/attachment-0001.bin>