[llvm-dev] IR canonicalization: shufflevector or vector trunc?

Sat Jan 21 11:30:16 PST 2017

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com>
wrote:

> Hi Sanjay,
>
>
>
> I agree we should also discuss **if** this canonicalization is beneficial.
>
> For starters, do we have a concrete case where we would benefit from
> canonicalizing shuffles <-> truncates in LLVM IR?
>
> IMO, we should not count benefits for codegen because that alone does not
> justify transforming the IR; we could always do this on the SelectionDAG.
>
>
Agreed. If we're just talking about IR benefits, then it's easy to
demonstrate a win for trunc/zext based on value tracking:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" ; little-endian

define <4 x i32> @shuffle(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32> ; low half of each elt is zero
  %bc = bitcast <4 x i64> %y to <8 x i32> ; even index elements are all zero
  %trunc = shufflevector <8 x i32> %bc, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %trunc
}

define <4 x i32> @trunc(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32> ; low half of each elt is zero
  %trunc = trunc <4 x i64> %y to <4 x i32> ; so this must be zero...
  ret <4 x i32> %trunc
}

$ ./opt -instsimplify 31551.ll -S
...
define <4 x i32> @shuffle(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32>
  %bc = bitcast <4 x i64> %y to <8 x i32>
  %trunc = shufflevector <8 x i32> %bc, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %trunc
}

define <4 x i32> @trunc(<4 x i64> %x) {
  ret <4 x i32> zeroinitializer
}

Of course, this is something I invented as an example, but AFAIK we have
better value tracking for trunc/zext than shuffle, so we'll have an easier
time folding the IR if that is possible.
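
To make the reasoning above concrete, here is a small Python model of the two
functions. It is only a sketch of the arithmetic, not of how value tracking is
implemented in LLVM: the helper names (shl32, trunc_form, shuffle_form) are
made up for illustration, and the bitcast is modeled as a store followed by a
reload in the given byte order, which is how the LangRef describes vector
bitcasts. It shows that both forms are all-zero on little-endian, and that the
same shuffle mask picks the other half of each element on big-endian.

```python
import struct

MASK64 = (1 << 64) - 1

def shl32(x):
    """Model `shl <4 x i64> %x, 32` per lane, wrapping at 64 bits."""
    return [(v << 32) & MASK64 for v in x]

def trunc_form(x):
    """Model `trunc <4 x i64> %y to <4 x i32>`: keep the low 32 bits."""
    return [v & 0xFFFFFFFF for v in shl32(x)]

def shuffle_form(x, byteorder="<"):
    """Model the bitcast to <8 x i32> plus the <0, 2, 4, 6> shuffle.

    byteorder is a struct prefix: "<" for little-endian, ">" for big-endian.
    """
    raw = struct.pack(byteorder + "4Q", *shl32(x))   # store <4 x i64>
    lanes32 = struct.unpack(byteorder + "8I", raw)   # reload as <8 x i32>
    return [lanes32[i] for i in (0, 2, 4, 6)]

sample = [0x0123456789ABCDEF, 1, MASK64, 0]
print(trunc_form(sample))         # low halves after the shift
print(shuffle_form(sample, "<"))  # little-endian: same lanes as the trunc
print(shuffle_form(sample, ">"))  # big-endian: the mask selects the high halves
```

In the trunc form the "low bits are zero" fact follows directly from the shl;
in the shuffle form the same conclusion needs reasoning through the bitcast
and the mask, with the byte order taken into account.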

>
>
> --Zvi
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Tuesday, January 17, 2017 18:38
>
> *To:* Rackover, Zvi <zvi.rackover at intel.com>
> *Cc:* Friedman, Eli <efriedma at codeaurora.org>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector
> trunc?
>
>
>
> We use InstCombiner::ShouldChangeType() to prevent transforms to illegal
> integer types, but I'm not sure how that would apply to vector types.
>
> I.e., let's say v256 is a legal type in your example. DataLayout doesn't
> appear to specify what configurations of a 256-bit vector are legal, so I
> don't think we can currently use that to say v2i128 should be treated
> differently than v16i16.
>
> Is this a valid argument to not canonicalize the IR?
>
>
>
> On Mon, Jan 16, 2017 at 10:16 AM, Rackover, Zvi <zvi.rackover at intel.com>
> wrote:
>
> Suppose we prefer the ‘trunc’ form, then what about cases such as:
>
> define <2 x i16> @shuffle(<16 x i16> %x) {
>
>   %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <2 x i32> <i32 0, i32 8>
>
>   ret <2 x i16> %shuf
>
> }
>
>
>
> Will the ‘shufflevector’ be canonicalized to a ‘trunc’ of a vector of i128?
>
> define <2 x i16> @trunc(<16 x i16> %x) {
>
>   %bc = bitcast <16 x i16> %x to <2 x i128>
>
>   %tr = trunc <2 x i128> %bc to <2 x i16>
>
>   ret <2 x i16> %tr
>
> }
>
> This may challenge the Legalizer downstream.
>
>
>
> --Zvi
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Friday, January 13, 2017 18:19
> *To:* Rackover, Zvi <zvi.rackover at intel.com>
> *Cc:* Friedman, Eli <efriedma at codeaurora.org>; llvm-dev <
> llvm-dev at lists.llvm.org>
>
>
> *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector
> trunc?
>
>
>
> Right - I think that case looks like this for little endian:
>
> define <2 x i32> @zextshuffle(<2 x i16> %x) {
>   %zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2>
>   %bc = bitcast <4 x i16> %zext_shuffle to <2 x i32>
>   ret <2 x i32> %bc
> }
>
> define <2 x i32> @zextvec(<2 x i16> %x) {
>   %zext = zext <2 x i16> %x to <2 x i32>
>   ret <2 x i32> %zext
> }
>
> IMO, the fact that we have to take endianness into account with the
> shuffles makes the trunc/zext forms the better choice. That way, we limit
> the endian dependency to one place in InstCombine, and other transforms
> don't have to worry about it. We also have lots of existing folds for
> trunc/zext and hardly any for shuffles.
>
>
>
>
>
> On Thu, Jan 12, 2017 at 1:14 PM, Rackover, Zvi <zvi.rackover at intel.com>
> wrote:
>
> Just to add, there is also the ‘zext’ – ‘shuffle with zero’ duality which
> can broaden the discussion.
>
>
>
> --Zvi
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Thursday, January 12, 2017 20:19
> *To:* Friedman, Eli <efriedma at codeaurora.org>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Rackover, Zvi <
> zvi.rackover at intel.com>
> *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector
> trunc?
>
>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 11:06 AM, Friedman, Eli <efriedma at codeaurora.org>
> wrote:
>
> On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote:
>
> It's time for another round of "What is the canonical IR?"
>
> Credit for this episode to Zvi and PR31551. :)
> https://llvm.org/bugs/show_bug.cgi?id=31551
>
> define <4 x i16> @shuffle(<16 x i16> %x) {
>
>   %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
>
>   ret <4 x i16> %shuf
>
> }
>
>
>
> define <4 x i16> @trunc(<16 x i16> %x) {
>
>   %bc = bitcast <16 x i16> %x to <4 x i64>
>
>   %tr = trunc <4 x i64> %bc to <4 x i16>
>
>   ret <4 x i16> %tr
>
> }
>
>
>
> Potential reasons to prefer one or the other:
> 1. Shuffle is the most compact.
> 2. Trunc is easier to read.
> 3. One of these is easier for value tracking.
> 4. Compatibility with existing IR transforms (e.g., InterleavedAccess
> recognizes the shuffle form).
>
> 5. We don't create arbitrary shuffle masks in IR because that's bad for a
> lot of targets (but maybe this mask pattern should always be recognized as
> special?).
>
>
>
> Hmm... not sure what the right answer is, but a couple more observations:
> 1. If we're going to canonicalize, we should probably canonicalize the
> same way independent of the original argument type (so we would introduce
> bitcasts either way).
>
>
>
> Ah, right - kill #1 in my list.
>
>
>
> 2. Those two functions are only equivalent on little-endian platforms.
>
>
>
> I was wondering about that. So yes, if we do want to canonicalize (until
> the recent compile-time complaints, I always thought this was the objective
> of InstCombine...maybe it still is), then the masks we're matching or
> generating will differ based on endianness.
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.