[llvm-commits] [PATCH] Bug 14664: InstCombine missing canonicalization of sub-and into a select

Mon Jan 21 08:24:53 PST 2013

Thanks a lot for the detailed feedback.

It does sound like a lot of work and I might not be able to do it. Maybe you could submit bug(s) to llvm bugzilla and if it's decided that I should do it, then I will surely pick it up.

I don't understand a few things though:
I don't fully understand what you mean by "make sure that codegen produces assembly that isn't any worse than it does now". I mean do I need to change the code in CodeGen to produce assembly "as good" as is currently being produces for this canonical form? What I mean is, once we agree on this as the canonical form, then codegen should be changed if the 
code isn't optimal right?

Opinion: Also, I honestly don't know why one form (select vs sext) is better or worse than the other. In the end, for this particular pattern, it is going to end up as a select anyway and then one would optimize the codegen for this specific select pattern because the IR doesn't matter as long as it is canonical and the codegen produces optimized assembly for the canonical forms.

Thanks.

-----Original Message-----
From: Nick Lewycky [mailto:nicholas at mxc.ca] 
Sent: Saturday, January 19, 2013 6:26 PM
To: Ahmad, Muhammad T
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] [PATCH] Bug 14664: InstCombine missing canonicalization of sub-and into a select

Ahmad, Muhammad T wrote:
> Buzilla URL: http://llvm.org/bugs/show_bug.cgi?id=14664
>
> This patch adds the InstCombine that converts any expression of the form '(and (sub 0, (zext A to B)), C)' -->   'select A, C, 0' where type of A is i1.
>
> Please have a look (diff attached).

Thanks. First off, I want to lead off with a review of your patch, but I want you to add this optimization in a fundamentally different way. 
Let's start with the patch:

+  if (BinaryOperator *subI = dyn_cast<BinaryOperator>(Op0))
+    if (subI->getOpcode() == Instruction::Sub)
+      if (ConstantInt *CI = dyn_cast<ConstantInt>(subI->getOperand(0)))
+        if (CastInst *zextI = dyn_cast<CastInst>(subI->getOperand(1))) {
+          if (CI->isZero() &&
+              zextI->getSrcTy()->isIntegerTy(1)) {

This is a verbose way of writing:

   Value *V;
   if (match(Op0, m_Sub(m_Zero(), m_ZExt(m_Value(V))) &&
       V->getType()->isIntegerTy(1)) {

See llvm/Support/PatternMatch.h.

+            Value* A = zextI->getOperand(0);
+            Value* C = Op1;
+            Value* zero = Constant::getNullValue(C->getType());

The rest of the file (and part of your own patch) uses "Value *" style, with the * next to the variable, not the type. Please use that style consistently.

+            return SelectInst::Create(A, C, zero);

InstCombine keeps a convenience IRBuilder around. It's a bit more than that because it does some work trying to preserve the right debug info on instructions that are transformed. I'd suggest writing:

   return Builder->CreateSelect(V, C, zero);

Also, thanks for including tests!

So I mentioned at the top that I think this should be done differently. 
This missed optimization is a specific symptom of a general problem. 
Fixing the general problem will be a lot more work, but I think you could make it manageable by doing it in two stages.

The neat thing about i1 is that any instruction that takes an i1 and produces a non-i1 can be replaced with a select: zext, sext, uitofp, sitofp, inttoptr. Audit instcombine to make sure that each of those with an i1 operand produces a select. At the same time, make sure that the actual x86 assembly produced for the construct with select is as efficient as the code produced without. That means that the first step in instcombining your example will be to take:
   (and (sub 0, (zext i1 A to iN)), B)
and produce:
   (and (sub 0, (select i1 A, iN 1, iN 0)), B) which is progress, and the first stage is patch just for this part across the five instructions I mentioned.

The second part is making sure that 'sub 0 (select X, C1, C2)' becomes 'select X, -C1, -C2', not just for sub, but for all sorts of instructions with one operand being a select between two constants:
   binary operators (other operand is constant): add, fadd, sub, fsub, mul, fmul, udiv, sdiv, fdiv, urem, srem, frem, shl, lshr, ashr, and, or, xor
   conversion operators: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast.
Again, the time consuming part isn't necessarily adding these optimizations to instcombine, but making sure that codegen produces assembly which isn't any worse than it does now.

I realize that's a lot of instruction combinations to look through, but the good news is that the code is nearly the same for each one. If you want to start by sending in a patch that does two transforms:
   zext i1 X to iN --> select i1 X, iN 1, iN 0
   sub C1, (select X, C2, C3) --> select X, C2-C1, C3-C1 I'd be happy to review that.

Nick

> Note: This is my first patch submission to this list and although I am trying to follow protocol as best as I can, if I am missing something, please let me know.