[PATCH] D52747: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization

Tue Oct 2 07:24:18 PDT 2018

spatel added a comment.

In https://reviews.llvm.org/D52747#1252143, @lebedev.ri wrote:

> > Ideally, I think we'd do the same for scalars, but I'm afraid of unintended consequences.
>
> I *think* this originates from https://reviews.llvm.org/rL67635.

Thanks for digging that up! As the codegen examples here show, the icmp variant is not always better for vector codegen at least (we could just fix the backend, but since we can reduce the IR, I figured that's the better option).

My bigger worry is that we're going to expose IR-level holes for trunc patterns with scalar code (and-of-icmps and similar or the example with shift that's now shown here). Those seem less likely for vector code.

================
Comment at: lib/Transforms/InstCombine/InstCombineCompares.cpp:1712
+  // For vectors: icmp ne (and X, 1), 0 --> trunc X to N x i1
+  // TODO: We canonicalize to the longer form for scalars. Why?
+  if (Cmp.getPredicate() == CmpInst::ICMP_NE && Cmp.getType()->isVectorTy() &&
----------------
craig.topper wrote:
> Should this be in foldICmpAndConstConst? And it should use the APInts we already extracted?
There are 2 independent problems here, and I should have put this in a code comment:
1. If we use the already extracted APInt values, we won't handle vectors with undefs because m_APInt doesn't match those (yet). So in cases like this, I've been using the more specific matcher even if it looks redundant. I will add a test that includes undefs in the constant vector values.
2. Depending on where we position this fold, it exposes another canonicalization question because it will affect patterns with shifts like this:

```
  %shr = ashr <2 x i84> %X, <i84 4, i84 4>
  %and = and <2 x i84> %shr, <i84 1, i84 1>
  %cmp = icmp ne <2 x i84> %and, zeroinitializer

```

Should that become:

```
%m = and <2 x i84> %X, <i84 16, i84 16>
%cmp = icmp ne <2 x i84> %m, zeroinitializer

```

or:

```
%sh = lshr <2 x i84> %X, <i84 4, i84 4>
%cmp = trunc <2 x i84> %sh to <2 x i1>

```

This patch sidesteps that question by allowing the larger pattern to match first, but we could make that a prerequisite step for this patch.

https://reviews.llvm.org/D52747