[PATCH] D44585: [AMDGPU] Scalarize when scalar code cheaper than vector code.

Fri Mar 16 14:19:22 PDT 2018

FarhanaAleen created this revision.
FarhanaAleen added a reviewer: arsenm.
FarhanaAleen created this object with visibility "All Users".
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, kzhuravl.

Vector code that consumes the result of a shuffle can need more instructions than the equivalent scalar code, which in most cases lets the shuffle be optimized away.

Here is an example of such a vector pattern:

  vec2 = shuffle();
  add = vadd vec1, vec2
  res = extract_vector_elt(add, idx)

Depending on the shuffle mask, the shuffle itself can need 1-3 instructions.
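Why the shuffle can disappear in scalar code: extracting one lane of a shuffle result is the same as reading a single element of one of the shuffle's inputs. A minimal C++ sketch of that fold, assuming a two-operand shuffle and using a toy 4-lane `int` vector in place of SelectionDAG values; `shuffle`, `pickSource` and `extractFromShuffle` are hypothetical names, not LLVM APIs, and undef (-1) mask entries are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model (hypothetical): a 4-lane vector of ints stands in for a DAG vector.
constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<int, N>;

// Element m of the concatenation a:b. Indices 0..N-1 select from a,
// N..2N-1 select from b (the usual two-operand shuffle convention).
int pickSource(const Vec &a, const Vec &b, int m) {
  assert(m >= 0 && m < static_cast<int>(2 * N) && "mask index out of range");
  return m < static_cast<int>(N) ? a[static_cast<std::size_t>(m)]
                                 : b[static_cast<std::size_t>(m) - N];
}

// Materializes the whole shuffled vector: lane i takes element mask[i] of a:b.
Vec shuffle(const Vec &a, const Vec &b, const Mask &mask) {
  Vec out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = pickSource(a, b, mask[i]);
  return out;
}

// extract_vector_elt(shuffle(a, b, mask), idx) folded to a single element
// read: no shuffled vector is ever built.
int extractFromShuffle(const Vec &a, const Vec &b, const Mask &mask,
                       std::size_t idx) {
  return pickSource(a, b, mask[idx]);
}
```

For example, with mask `{5, 0, 7, 2}`, lane 0 of the shuffle is element 1 of `b`, so extracting lane 0 needs no shuffle instructions at all.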

For patterns like the one above, the scalar equivalent can need the same number of instructions as the vector code, or fewer.
Scalar code:

  elt1 = extract_vector_elt(vec1, idx)
  elt2 = extract_vector_elt(vec2, idx)
  res = add elt1, elt2
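The two patterns can be compared with a small C++ model that checks they compute the same value and counts their nodes. This is a hedged sketch, not code from the patch: it assumes a two-operand shuffle and counts one unit per pseudocode node (with the shuffle taking the 1-3 instructions quoted above); `lane`, `vectorForm`, `scalarForm`, `vectorNodes` and `kScalarNodes` are hypothetical names, and the count is not the AMDGPU cost model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<int, N>;

// Element m of the concatenation a:b, as selected by one shuffle mask entry.
int lane(const Vec &a, const Vec &b, int m) {
  assert(m >= 0 && m < static_cast<int>(2 * N));
  return m < static_cast<int>(N) ? a[static_cast<std::size_t>(m)]
                                 : b[static_cast<std::size_t>(m) - N];
}

// Vector form: vec2 = shuffle(a, b, mask); add = vadd vec1, vec2;
// res = extract_vector_elt(add, idx).
int vectorForm(const Vec &vec1, const Vec &a, const Vec &b, const Mask &mask,
               std::size_t idx) {
  Vec vec2{};
  for (std::size_t i = 0; i < N; ++i) vec2[i] = lane(a, b, mask[i]);
  Vec add{};
  for (std::size_t i = 0; i < N; ++i) add[i] = vec1[i] + vec2[i];
  return add[idx];
}

// Scalar form: two extracts and one scalar add. The extract from vec2 reads
// straight from a or b, so the shuffle itself is never emitted.
int scalarForm(const Vec &vec1, const Vec &a, const Vec &b, const Mask &mask,
               std::size_t idx) {
  int elt1 = vec1[idx];
  int elt2 = lane(a, b, mask[idx]);
  return elt1 + elt2;
}

// Naive node counts (hypothetical, one unit per node):
// vector = shuffle (1-3) + vadd + extract; scalar = 2 extracts + add.
int vectorNodes(int shuffleInsts) { return shuffleInsts + 2; }
constexpr int kScalarNodes = 3;
```

Under this count the vector form costs 3-5 units and the scalar form 3, which matches the "same or fewer" claim; a real heuristic would weigh each node by what it actually lowers to on the target.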

https://reviews.llvm.org/D44585

Files:
  lib/Target/AMDGPU/SIISelLowering.cpp
  test/CodeGen/AMDGPU/scalarize.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D44585.138767.patch
Type: text/x-patch
Size: 5312 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180316/4d3c2f57/attachment.bin>